
Vietnamese-English Cross Language Search Information Retrieval (CLIR) - Discovering Noun Phrases for Translation

Pages: 23
File size: 1.29 MB


Page 1

Vietnamese-English Cross Language Search

Information Retrieval (CLIR) Discovering Noun Phrases for Translation

-CSC 177 Presentation

Page 2

Motivations

Crosslingual Query

Noun phrase translation extraction

Experiments and results

Conclusion and next steps

Page 3

Motivations – Unknown Translations

• Brand names, Place names, Personal names

• Titles (music, book, video)

• Terminology (science, computer, medical, space, farming, etc.)

• Meaning might not be inferable from individual components

• Might require expert knowledge for translation

• Might have multiple correct translations

• Cross-language Information Retrieval (CLIR)

• Machine Translation (MT)

• Machine-Readable Dictionary (MRD)

Page 4

software)

Page 5

Quang Dung)

Page 6

Searching the web for translation?

• Parallel Data on the Web:

Vietnamese to English Translation

Page 7

Searching the web for translation?

• Comparable corpus on the web:

Page 8

Searching the web for translation?

• Mixed language web pages:

English Translation

Page 9

Our Approach

• Extensions to CMU’s Ying Zhang 2005 paper (Credit)

• Addresses issues specific to Vietnamese-English out-of-vocabulary (OOV) translation

• Proper name translation uses a pattern recognition technique rather than phonetic similarity and string alignment

• Detection of borrowed English words

• Improving translation suggestions by utilizing contextual information

Page 10

Crosslingual Query to Obtain Mixed-Language Web Pages

• Extend the source query, VS, with extended words/phrases, VEX, that tend to co-occur with it frequently

Page 11

How to Find This VEX ?

• Find co-occurring terms in a web log

• Use co-occurring terms in the search query (in CLIR)

• Search Google with VS and select the Vietnamese words, VEX, that occur with high frequency

Overture Search Log
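The third option above (search with VS, then keep frequent Vietnamese words as VEX) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `select_extension_words`, the `top_k` parameter, and the rule "a token containing a non-ASCII character is Vietnamese" are hypothetical and not taken from the slides, and the snippets are passed in as plain strings rather than fetched from a search engine.

```python
import re
import unicodedata
from collections import Counter
from typing import List


def select_extension_words(query: str, snippets: List[str], top_k: int = 3) -> List[str]:
    """Pick candidate VEX words: frequent Vietnamese tokens that co-occur with the query."""
    # NFC keeps each accented Vietnamese letter as one code point, so \w+ matches whole words.
    query_words = set(unicodedata.normalize("NFC", query).lower().split())
    counts = Counter()
    for snippet in snippets:
        text = unicodedata.normalize("NFC", snippet).lower()
        for word in re.findall(r"\w+", text):
            if word in query_words:
                continue  # the query itself is not an extension word
            if word.isascii():
                continue  # assumption: only tokens with diacritics count as Vietnamese
            counts[word] += 1
    return [word for word, _ in counts.most_common(top_k)]


# Hypothetical snippets for the query "dân ca"
snippets = [
    "Dân ca Việt Nam và nhạc truyền thống",
    "Nghe nhạc dân ca",
    "nhạc dân ca folk music",
]
best = select_extension_words("dân ca", snippets, top_k=1)
```

The diacritic test is a deliberate simplification: it misses unaccented Vietnamese words such as "trong", so a dictionary lookup would be a more faithful filter.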

Page 12

Original Source Query

Page 13

Crosslingual Query

Page 14

Our Approach: Noun Phrase Translation

Page 15

Yahoo Search API - XML Data Returning

Snippet

Page 16

Proper name recognition & Transliteration

• Extract and concatenate Title, Summary, and URL

• Recognize that a proper name is likely to appear with its first letter capitalized

• Compute the likelihood that the query text V is a proper name

$$\text{Likelihood}(V) = \frac{\text{Occurrences of } \text{First\_Letter\_In\_Cap}(V) \text{ in TextSnippet}}{\text{All occurrences of } V \text{ in TextSnippet}}$$

• Suggest a translation candidate VN: Quang Dũng → Eng: Quang Dung

• Compute and assign a weight to a translation candidate
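As a rough sketch of the steps above, the capitalization ratio and a diacritic-free candidate could be computed as below. The function names are hypothetical. The slides do not say how the English candidate is produced, so stripping the Vietnamese diacritics is an assumption made here for illustration.

```python
import re
import unicodedata


def proper_name_likelihood(query: str, snippet_text: str) -> float:
    """Share of the query's occurrences in which every word starts with a capital letter."""
    words = unicodedata.normalize("NFC", query).split()
    pattern = r"\s+".join(re.escape(word) for word in words)
    text = unicodedata.normalize("NFC", snippet_text)
    matches = re.findall(pattern, text, flags=re.IGNORECASE)
    if not matches:
        return 0.0
    capitalized = sum(
        1 for match in matches if all(word[0].isupper() for word in match.split())
    )
    return capitalized / len(matches)


def strip_diacritics(text: str) -> str:
    """Assumed transliteration step: drop tone and vowel marks, map đ/Đ to d/D."""
    decomposed = unicodedata.normalize("NFD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # đ and Đ have no Unicode decomposition, so they are replaced explicitly.
    return no_marks.replace("đ", "d").replace("Đ", "D")


# Hypothetical snippet text for the query "Quang Dũng"
snippet = "Ca sĩ Quang Dũng hát. quang dũng live. Quang Dũng mới"
likelihood = proper_name_likelihood("Quang Dũng", snippet)
candidate = strip_diacritics("Quang Dũng")
```

Here two of the three occurrences are capitalized, so the likelihood is 2/3. How that value is turned into a candidate weight is not specified in the slides.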

Page 17

Preprocessing (Query: Thuật toán genetic)

– Extract and concatenate Title, Summary, and URL

Thuật toán-Cấu trúc dữ liệu (Reserve Polish Notation – RPN), một thuật toán "kinh điển" trong lĩnh vực trình biên dịch THUẬT GIẢI DI TRUYỀN – GENETIC ALGORITHM -

Kỳ 2 ity.vnuit.edu.vn/thuattoan/index.htm

– Mark query, normalize text, remove noise text

~123456789 cấu trúc dữ liệu reserve polish notation – rpn một ~123456789 kinh điển trong lĩnh vực trình biên dịch thuẬt giẢi di truyỀn – ~987654321 algorithm kỳ 2 ity vnuit edu vn thuattoan index htm

– Mark recognized Vietnamese text with VNW tag

~123456789 VNW VNW VNW VNW reserve polish notation VNW rpn ~123456789 VNW VNW trong VNW VNW VNW VNW VNW VNW di VNW VNW ~987654321 algorithm VNW ity vnuit edu vn thuattoan index htm

– Group contiguous English words and build the word list

['~ 123456789 ', 'VNW', 'VNW', 'VNW', 'VNW', '', '', 'reserve_polish_notation', 'VNW', 'rpn', '~ 123456789 ', 'VNW', 'VNW', 'trong', 'VNW', 'VNW', 'VNW', 'VNW', 'VNW', 'VNW', 'di', 'VNW', 'VNW', '~ 987654321 ', 'algorithm', 'VNW', 'ity', 'vnuit', 'edu', 'vn', 'thuattoan', 'index',
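The preprocessing steps above can be approximated with the short sketch below. It is not the presenters' program. The function name `preprocess`, the marker dictionary, and the rule that any token containing a non-ASCII character is tagged VNW are assumptions for illustration. The slide output suggests the original used a different word test, since some unaccented words were tagged as well.

```python
import re
import unicodedata
from typing import Dict, List

VNW = "VNW"  # tag used on the slide for recognized Vietnamese words


def preprocess(snippet: str, query_markers: Dict[str, str]) -> List[str]:
    """Mark query parts, normalize the text, tag Vietnamese words, group English runs."""
    text = unicodedata.normalize("NFC", snippet).lower()
    # Mark each query part with its numeric placeholder (plain substring replacement).
    for phrase, marker in query_markers.items():
        text = text.replace(unicodedata.normalize("NFC", phrase).lower(), f" {marker} ")
    # Remove noise: everything except word characters and the marker prefix "~".
    tokens = re.sub(r"[^\w~]+", " ", text).split()

    result: List[str] = []
    english_run: List[str] = []
    for token in tokens:
        if token.isascii() and token.isalpha():
            english_run.append(token)  # extend the current run of English words
            continue
        if english_run:
            result.append("_".join(english_run))
            english_run = []
        if token.startswith("~"):
            result.append(token)  # query marker, kept as is
        elif not token.isascii():
            result.append(VNW)  # assumption: non-ASCII token means Vietnamese
        else:
            result.append(token)  # digits and other ASCII tokens
    if english_run:
        result.append("_".join(english_run))
    return result


markers = {"thuật toán": "~123456789", "genetic": "~987654321"}
words = preprocess("Thuật toán genetic - Genetic Algorithm, cấu trúc dữ liệu", markers)
```

Grouping contiguous English words with underscores keeps multi-word phrases such as `reverse_polish_notation` together as a single translation candidate.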

Page 18


• Example: Thuật toán genetic

Page 19

Contextual Ordering Model & Result Ranking

• Estimate Closeness Probability

• Overall Score for each candidate

• Sort by score and present the top 5 suggestions

[Formulas for the closeness probability and the overall score: not recoverable from the extracted text]
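The final ranking step (sort by score, keep the top 5) can be illustrated with a minimal sketch. The candidate strings and scores below are invented placeholders, and `top_suggestions` is a hypothetical helper. How the overall score itself is computed is not shown here.

```python
import heapq
from typing import Dict, List, Tuple


def top_suggestions(scores: Dict[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-scoring translation candidates, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Hypothetical overall scores for six candidates
scores = {
    "genetic_algorithm": 0.42,
    "algorithm": 0.21,
    "reverse_polish_notation": 0.09,
    "rpn": 0.05,
    "index": 0.02,
    "htm": 0.01,
}
suggestions = top_suggestions(scores)
```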

Page 20

Sample Program Output # 1

(dân ca -> folk or traditional music)

Page 21

Sample Program Output # 2

(Quang Dũng -> Quang Dung)

Page 22

Sample of Translation Results

Category | Vietnamese Phrase/Word | Vietnamese-English Web-mining Translation | Vdict (Machine Translation) | Vietdict (Online Dictionary)
Organization Name | WTO là gì? | What is world trade organization? | What is WTO? | No definition found
Science & Tech | thuật toán di truyền | Genetic algorithms | Heredity algorism | No definition found
Location Name | Thừa Thiên Huế | Thua Thien Hue | Partial Excess Hue | No definition found
Person Name | ca sĩ Quang Dũng | Singer Quang Dung | Optical singer Dũng | N/A
Medical Term | viêm màng não | Meningitis brain infection | meningitis | No definition found
Geographical name | Đại dương Bắc Băng Dương | Arctic ocean | Đạtôi glacial ocean Boreal Yang | No definition found
Education | học vị Tiến sỹ | Phd degree | Advance academical degree sỹ | No definition found
Music | dân ca | Folk music | folk-song | folk-song
Music | nhạc hip hop | Hip hop music or Rap music | music cây hu-blông hông | No definition found
Space | phi hành gia Sally Ride | Former astronaut Sally Ride | air-man sự Phá vây cưỡi | No definition found
Plant | cây kiểng vườn Nhật | Bonsai Japanese garden | Japanese garden plant kiểng | No definition found
Farming | những nghề cá thủy sản | Aquaculture fisheries | seafood fisheries | No definition found
Laws | cư trú thường trực | permanent resident | populate permanent | No definition found
Astrological | Thuật chiêm tinh phong thủy | feng shui astrology | Geomancy astrology | Geomancy

Page 23

Conclusion and Next Steps

• Contributions

– Recognize and translate important phrases

– Translate: persons, locations, concepts

– Low cost of implementation with reasonable performance

• Future work

– Experiment with a larger set of test data

– Integrate with Vietnamese-English CLIR work

– Automate the generation of extended words/phrases to derive English extended words

– Experiment on “Refine Result” concept for

Date posted: 27/08/2017, 00:19
