
DEALING WITH INCOMPLETENESS OF LINGUISTIC KNOWLEDGE IN LANGUAGE TRANSLATION: TRANSFER AND GENERATION STAGE OF MU MACHINE TRANSLATION PROJECT


Makoto Nagao, Toyoaki Nishida and Jun-ichi Tsujii
Department of Electrical Engineering
Kyoto University
Sakyo-ku, Kyoto 606, JAPAN

1 INTRODUCTION

Linguistic knowledge usable for machine translation is always imperfect. We cannot be free from the uncertainty of the knowledge we have for machine translation. Especially at the transfer stage of machine translation, the selection of a target language expression is rather subjective and optional. Therefore the linguistic contents of a machine translation system always fluctuate and make gradual progress. The system should be designed to allow such constant change and improvement. This paper explains the details of the transfer and generation stages of the Japanese-to-English system of the machine translation project by the Japanese Government, with emphasis on the ideas for dealing with the incompleteness of linguistic knowledge for machine translation.

2 DESIGN STRATEGIES

2.1 Annotated Dependency Structure

The intermediate representation we adopted as the result of analysis in our machine translation system is the annotated dependency structure. Each node has an arbitrary number of features, as shown in Fig. 1. This makes it possible to access the constituents by more than one linguistic cue. This representation is therefore powerful and flexible for sophisticated grammatical and semantic checking, especially when the completeness of semantic analysis is not assured and trial-and-error improvements are required at the transfer and generation stages.

2.2 Multiple Layer Grammar

We have three conceptual levels for grammar rules.

lowest level: default grammar, which guarantees the output of the translation process. The quality of the translation is not assured. Rules of this level apply to those inputs for which no higher layer grammar rules are applicable.

kernel level: main grammar, which chooses and generates the target language structure according to the semantic relations among constituents which are determined in the analysis stage.

topmost level: heuristic grammar, which attempts to get an elegant translation for the input. Each rule bears a heuristic nature in the sense that it is word specific and applicable only to some restricted classes of inputs.
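The three levels can be read as an ordered fallback. The following is a minimal Python sketch of that reading, not the project's GRADE implementation; the rule functions, the dictionary-based node and the romanized sample words are all hypothetical.

```python
from typing import Callable, Optional

Node = dict  # a dependency node as a plain feature dictionary (assumption)
Rule = Callable[[Node], Optional[str]]  # returns a translation or None


def heuristic_rule(node: Node) -> Optional[str]:
    # Topmost level: word specific, fires only for a restricted class of inputs.
    if node.get("lex") == "okonau" and node.get("object") == "experiment":
        return "conduct an experiment"
    return None


def kernel_rule(node: Node) -> Optional[str]:
    # Kernel level: relies on the semantic relation found by the analysis.
    if node.get("deep_case") == "OBJect" and "object" in node:
        return f"{node['gloss']} {node['object']}"
    return None


def default_rule(node: Node) -> str:
    # Lowest level: always yields some output; quality is not assured.
    return node.get("gloss", node["lex"])


def translate(node: Node) -> str:
    # Try the higher layers first and fall through to the default grammar.
    higher_layers: list[Rule] = [heuristic_rule, kernel_rule]
    for rule in higher_layers:
        result = rule(node)
        if result is not None:
            return result
    return default_rule(node)
```

Because the default layer never returns `None`, `translate` always produces some output, which is the guarantee the lowest level is meant to give.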

2.3 Multiple Relation Structure

In principle, we use a deep case dependency structure as a semantic representation. Theoretically we can assign a unique case dependency structure to each input sentence. In practice, however, the analysis phase may fail or may assign a wrong structure. Therefore we use as an intermediate representation a structure which makes it possible to annotate multiple possibilities as well as multiple levels of representation. An example is shown in Fig. 2. Properties at a node are represented as a vector, so that this complex dependency structure is flexible in the sense that different interpretation rules can be applied to the structure.
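One way to picture a node that keeps several candidate relations is sketched below in Python. The `Node` class, the `POSSESSIVE` lexicon and the `render_modifier` rule are illustrative assumptions; only the "agent OR possess" annotation on "he" comes from Fig. 2.

```python
from dataclasses import dataclass, field

POSSESSIVE = {"he": "his", "she": "her", "it": "its"}  # tiny assumed lexicon


@dataclass
class Node:
    lex: str
    # Each feature maps to a list of candidate values, so an unresolved
    # analysis can keep several readings side by side.
    features: dict[str, list[str]] = field(default_factory=dict)
    children: list["Node"] = field(default_factory=list)

    def may_be(self, feature: str, value: str) -> bool:
        return value in self.features.get(feature, [])


def render_modifier(dep: Node, head: Node) -> str:
    # One possible interpretation rule: use the possessive reading when it
    # is among the candidates, otherwise fall back to an agent reading.
    if dep.may_be("DEEP-CASE", "possess"):
        determiner = POSSESSIVE.get(dep.lex, dep.lex + "'s")
        return f"{determiner} {head.lex}"
    return f"{head.lex} by {dep.lex}"


he = Node("he", {"DEEP-CASE": ["agent", "possess"]})
work = Node("work", children=[he])
```

A different interpretation rule could consult the same node and choose the agent reading instead; the structure itself does not commit to either.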

2.4 Lexicon Driven Feature

Besides the transfer and generation rules which involve semantic checking functions, the grammar allows reference to a lexical item in the dictionary. A lexical item contains its special grammatical usages and idiomatic expressions. During the transfer and generation stages, these rules are activated with the highest priority. This feature makes the system very flexible in dealing with exceptional cases. The translation quality can be improved progressively by adding linguistic information and word usages to the dictionary entries.

2.5 Format-Oriented Description of Dictionary Entries

The quality of a machine translation system heavily depends on the quality of the dictionary. In order to build a machine translation dictionary, we collaborate with expert translators. We developed a format-oriented language to allow computer-naive human translators to encode their expertise without any conscious programming effort. Although the format-oriented language we developed lacks full expressive power for highly sophisticated linguistic phenomena, it can cover most of the common lexical information translators may want to describe. The formatted description is automatically converted into statements in GRADE, a programming language developed by the Mu-Project. We prepared a manual according to which a person can fill in the dictionary format with the linguistic data of items. The manual guarantees a certain level of quality of the dictionary, which is important when many people have to work in parallel.

[Figure: dependency tree of an analyzed Japanese sentence in which every node carries a feature list. Legible features of the root node: J-CAT = Verb; J-LEX = (increase); J-DEEP-CASE = MAIN; J-GAP = '(SOUrce GOAl)'; J-SENTENCE-CONNECTOR = DECLARATIVE; J-SENTENCE-RELATION = NIL; J-SENTENCE-END = NIL; J-DEEP-TENSE = PRESENT; J-DEEP-ASPECT = BeyondTime; J-DEEP-MODE = NIL; J-VERB-ASPECT = TRANSITIVE; J-NEG = NIL. Dependent noun nodes: J-LEX = (advance) with J-DEEP-CASE = CAUse; J-LEX = (electronic instrumentation) with DEEP-CASE = SUBject; J-LEX = (automated ship) with J-DEEP-CASE = SUBject; and dummy nodes with J-LEX = NIL, one of them with J-DEEP-CASE = SOUrce.]

Fig. 1 Representation of analysis result by features

[Figure: a node "work" with a dependent node J-LEX = he, whose relation is annotated J-DEEP-CASE = agent OR possess.]

Fig. 2 An example of complex dependency structure

3 ORGANIZATION OF GRAMMAR RULES FOR TRANSFER AND GENERATION STAGES

3.1 Heuristic Rule First

Grammar rules are organized along the principle that "if a better rule exists then the system uses it; otherwise the system attempts to use a standard rule; if it fails, the system will use a default rule". The grammar involves a number of stages for applying heuristic rules. Fig. 3 shows a processing flow for the transfer and generation stages.

Heuristic rules are word specific. GRADE makes it possible to define word specific rules. Such rules can be invoked in many ways. For example, we can associate a word selection rule for an ordinary verb with a dictionary entry for a noun, as shown in Fig. 4.
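The idea of storing a verb selection rule in a noun's dictionary entry can be sketched as follows. This is a hedged Python illustration rather than GRADE: the dictionary layout, the romanized entries `kouka` and `ataeru`, and the choice of "have" as the selected verb are assumptions made for the example.

```python
# Hypothetical dictionary: a noun entry may carry a rule that picks the
# English verb when the noun is governed by a particular Japanese verb.
DICTIONARY = {
    "kouka": {  # glossed "effect"
        "english": "effect",
        "verb_rules": {"ataeru": "have"},  # ataeru is normally "give"
    },
    "hon": {"english": "book"},
}
DEFAULT_VERBS = {"ataeru": "give"}


def select_verb(verb_lex: str, object_lex: str) -> str:
    entry = DICTIONARY.get(object_lex, {})
    # The word specific rule stored with the noun takes priority.
    specific = entry.get("verb_rules", {}).get(verb_lex)
    return specific if specific is not None else DEFAULT_VERBS[verb_lex]
```

With these entries, the governing verb asks its object noun for a rule first and only then falls back to its own standard translation.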

[Fig. 3: processing flow for the transfer and generation stages. Legible stage labels: internal, pre-transfer, post-transfer, phrase structure transformation, morphological synthesis.]

[Fig. 4: (a) Activating a lexical rule for a noun (effect) from a governing verb (give). (b) Form-oriented description of a transfer rule for a noun (effect).]

Some heuristic rules are activated just after the standard analysis of a Japanese sentence is finished, to obtain a more neutral (or target language oriented [...] such invocation the pre- [...] pragmatic interpretation are done in the pre-transfer [...] rules are applied in this loop, the better result will [...] show some examples.

[...] Target Language by Using Semantic Markers

Word selection in the target language is a big problem in machine translation [...] of choices of translation for a word in the source [...] adopted in our system are:

(1) Area restriction by using a field code, such as electrical engineering, nuclear science, medicine, and so on.

(2) The semantic code attached to a word in the analysis phase is used for the selection of a proper target language word or phrase.

(3) The sentential structure of the vicinity of a word to be translated is sometimes effective for the determination of a [...] in the target language.

Table 1 shows examples of a part of the verb trans- [...] of English verb is done by the semantic categories of nouns related to the verb. The number i attached to verbs like form-1, produce-2 is the i-th usage of the [...] information of nouns is not available, the column indicated by ~ is applied to [...] invocation of grammar rules
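Selecting an English structure from the semantic markers of a related noun, with a fallback when no semantic information is available, might look like the Python sketch below. The marker names are taken from the legible entries of Table 1, but their pairing with English patterns and the choice of default are assumptions, since the table layout is not recoverable here.

```python
# Illustrative rule table for one Japanese verb. Each rule pairs a set of
# semantic markers for the noun X with an English pattern (assumed pairing).
RULES = [
    ({"action", "deed", "movement"}, "{X} take place"),
    ({"non-living substance"}, "produce {X}"),
]
DEFAULT = "{X} occur"  # assumed default when no semantic marker matches


def select_structure(x: str, markers: set[str]) -> str:
    for required, pattern in RULES:
        if markers & required:  # at least one marker matches the rule
            return pattern.format(X=x)
    return DEFAULT.format(X=x)
```

The default branch plays the role the text gives to the column that applies when the semantic information of nouns is not available.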

[Figure: a verb node (J-CAT = Verb, J-LEX = (do not have)) governing a noun node (J-CAT = Noun, J-LEX = (sense)) is rewritten as a single adjective node (J-CAT = Adjective, J-LEX = (meaningless)); "expression which does not have sense" becomes "meaningless expression".]

Fig. 5 An example of a heuristic rule used in the pre-transfer loop
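The rewriting shown in Fig. 5 can be sketched as a small tree transformation. The Python below is only an illustration under assumed data structures: nodes are plain dictionaries with `cat`, `gloss` and `modifiers` keys, which are not the feature names used by the system.

```python
def rewrite_no_sense(node: dict) -> dict:
    """Collapse a '(do not have) + (sense)' modifier into an adjective."""
    kept = []
    for mod in node.get("modifiers", []):
        is_target = (
            mod.get("cat") == "Verb"
            and mod.get("gloss") == "do not have"
            and any(c.get("gloss") == "sense" for c in mod.get("modifiers", []))
        )
        if is_target:
            kept.append({"cat": "Adjective", "gloss": "meaningless"})
        else:
            kept.append(mod)
    # Return a new node so the analysis result itself stays untouched.
    return {**node, "modifiers": kept}


before = {
    "cat": "Noun",
    "gloss": "expression",
    "modifiers": [
        {"cat": "Verb", "gloss": "do not have",
         "modifiers": [{"cat": "Noun", "gloss": "sense"}]},
    ],
}
after = rewrite_no_sense(before)
```

Returning a new node rather than editing in place keeps the original analysis available if a later rule needs it.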

[Figure fragments, legible labels only: "few", "(to be determined", "(+tend to)"]

In most cases, we can use a fixed format for describing a translation rule [...] number of dictionary formats specially designed for the ease of dictionary input by computer-naive expert translators. The expressive power of the format-oriented description is, however, insufficient for a number of common verbs such as "する" (make, do, perform) and "なる" (become, consist of, provide, [...] of usages are to be listed up with their corresponding English sentential structures and semantic conditions.

The transfer stage bridges the gap between Japanese and English expressions. There are still many odd structures after this stage, and we have to adjust further the English internal representation [...] call this part the post-transfer loop.

An example is given in Fig. 8, where a Japanese factitive verb is first transferred to English "make", and then a structural change is made to eliminate it and to obtain a more direct expression.

[...] Postpositions

Postpositions in Japanese generally [...] postposition, however, has different usages, and the determination of English prepositions for each postposition is [...] verb which governs the noun phrase having that postposition.

Table 2 illustrates a part of a default table for determining deep and surface case labels when no higher level [...] this way, we confirm at least one trans- [...] particular usage of a preposition for a particular English verb is written in the lexical entry of the verb.
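A default table with a lexical override can be sketched in Python as below. The contents of Table 2 are not available in this text, so every mapping here (the deep case labels, the prepositions and the sample verb entry) is an assumption chosen only to show the lookup order.

```python
# Assumed default table for the postposition "ni": deep case -> preposition.
NI_DEFAULT = {"GOAL": "to", "TIME": "at", "PLACE": "in"}
# Assumed lexical entries: a verb may demand a particular preposition.
VERB_PREPOSITIONS = {"depend": {"GOAL": "on"}}


def preposition_for_ni(deep_case: str, english_verb: str) -> str:
    lexical = VERB_PREPOSITIONS.get(english_verb, {})
    if deep_case in lexical:
        # The usage written in the verb's lexical entry takes priority.
        return lexical[deep_case]
    # The default table guarantees that some preposition is always chosen.
    return NI_DEFAULT.get(deep_case, "to")
```

The final fallback value "to" is likewise an assumption; it stands in for whatever the default rule assigns when nothing more specific applies.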

Structures in Target Language

Fig. 6 Examples of pre-transfer rules

[Table flattened in extraction. Legible entries, in order of appearance. Semantic marker for X/Y: action, deed, movement; reaction; standard, property; non-living substance; property. English verb usages: occur-1, form-1, produce-2, improve-1. English structures: form X(obj), X take place, X occur, produce X, X produce Y, X improve Y, X increase Y.]

Table 1 Word selection in target language by using semantic markers

[Figure fragments: a Japanese verb labelled (NARU) with dependents (SUB) and (GOAL), and the English candidates provide, reach, turn, become; "(3) dictionary rules of popular verbs".]

Global sentential structures of Japanese and English are quite different, and correspondingly the internal structure of a Japanese sentence is [...] difference from Japanese internal representation to that of English is absorbed at the (pre-, post [...] generation, some structural transformations are still required in such cases as (a) embedded sentential structure, (b) complex sentential structure.

We classified four kinds of embedded sentential structures.

(i) A case slot of an embedded sentence is vacant, and the noun modified by the embedded sentence comes to fill the slot.

[...] the semantic properties like parts, attributes, and action.

(iii) The third and the fourth classes are particular embedded expressions in Japanese, which have [...] (in that), and so on.

An example of the structural transformation [...] is generated after the structural transformation. Connection of two sentences in the compound and complex sentences is done according to Table [...]

After the transfer is done from the Japanese deep dependency structure to the English one, conversion is done to a phrase structure tree with [...] processes explained in 4.1 and 4.2 are involved at [...] ed top-down from the root node of the dependency [...] demands a noun phrase expression or a to-infinitive expression to its dependent phrase, the structural [...] verb transformation, and noun to adjective

[Figure: the structure "make" + C, where C is an intransitive verb (consultation to lexical item C), is rewritten to C', where C' is the transitive verb derived from C. Example: "A make B rotate" becomes "A rotate B".]

Fig. 8 An example of post-transfer rule application
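The post-transfer rule of Fig. 8 can be sketched as follows. This Python version assumes a flat clause dictionary and a small hypothetical lexicon `TRANSITIVE_OF` that stands in for the "consultation to lexical item C"; only the rotate example comes from the figure.

```python
# Assumed lexicon: intransitive verbs that have a transitive counterpart.
TRANSITIVE_OF = {"rotate": "rotate", "rise": "raise", "fall": "drop"}


def eliminate_make(clause: dict) -> dict:
    """Rewrite 'A make B C' as 'A C' B' when the lexicon supplies C'."""
    if clause.get("verb") != "make":
        return clause
    transitive = TRANSITIVE_OF.get(clause.get("complement_verb"))
    if transitive is None:
        return clause  # no derived transitive verb known: keep "make"
    return {
        "subject": clause["subject"],
        "verb": transitive,
        "object": clause["object"],
    }
```

When the lexicon has no transitive counterpart, the clause is left as it is, so the "make" construction produced by the transfer stage still survives as a usable output.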

Table 2 Default rule for assigning a case label of English to a Japanese postposition "に" (ni)

[Table flattened in extraction; the entries of each column are listed in their order of appearance.]

JAPANESE SENTENTIAL CONNECTIVE: RENYO, (-SHI)TE, RENYO, (-SHI)TE, -TAME, -NODE, -KARA, -TO, -TOKI, -TE, -TAME, -NONI, -YOU, -YOU, -KOTONAKU, -NAGARA, -BA

DEEP-CASE: TOOL, TOOL, CAUSE, TIME, PURPOSE, (ditto), MANNER, ACCOMPANY, CIRCUMSTANCE

ENGLISH SENTENTIAL CONNECTIVE: BY-ING, BY-ING, BECAUSE, (ditto), (ditto), WHEN, SO-THAT-MAY, to, AS-IF, WITHOUT-ING, WHILE-ING, WHEN, ...

Table 3 Correspondence of sentential connectives
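Since the same Japanese connective (for example -TAME) appears with more than one deep case, a lookup keyed on the pair (connective, deep case) is one natural reading of Table 3. The Python sketch below assumes a few row alignments that seem linguistically safe; the alignments, the English templates and the fallback are assumptions, not entries confirmed by the table.

```python
# Assumed rows: (Japanese connective, deep case) -> English connective label.
TABLE = {
    ("-TAME", "CAUSE"): "BECAUSE",
    ("-TAME", "PURPOSE"): "SO-THAT-MAY",
    ("-TOKI", "TIME"): "WHEN",
}
# Assumed surface templates for each label.
TEMPLATES = {
    "BECAUSE": "{main} because {sub}",
    "SO-THAT-MAY": "{main} so that {sub}",
    "WHEN": "{main} when {sub}",
}


def connect(main: str, sub: str, connective: str, deep_case: str) -> str:
    label = TABLE.get((connective, deep_case))
    if label is None:
        return f"{main}, and {sub}"  # assumed fallback: plain coordination
    return TEMPLATES[label].format(main=main, sub=sub)
```

Keying on the deep case as well as the connective lets the two readings of -TAME come out as "because" and "so that" respectively.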

[Figure: a Japanese noun phrase glossed "he school resign reason" (N1 N2 V N3). [ANALYSIS] reason(N3) is modified by resign(V), which governs he(N1) and school(N2). [TRANSFER] legible labels: PROP, CAUSE. [GENERATION] an NP consisting of N3 and a relative clause (RELCL) introduced by the relative adverb "why".]

Fig. 9 Structural transformation of an embedded sentence of type 3

Fig. 10 Structural transformation of an embedded sentence

transformation are often required due to the differ- [...] process goes down from the root node to all the leaf nodes.

After this process of phrase structure generation, some sentential transformations are performed, such as the following.

(i) When an agent is absent, passive transformation is applied.

(ii) When the agent and object are both missing, the predicative verb is nominalized and placed as the subject, and such verb phrases as "is made" and "is performed" are supplemented.

(iii) When a subject phrase is a big tree, the anticipatory subject "it" is introduced.

(iv) Pronominalization of the same subject nouns is done in compound and complex sentences.

(v) Duplication of a head noun in the conjunctive noun phrase is eliminated, such as, "uniform [...] "uniform and non-uniform components".

(vi) Others.
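Transformations (i), (ii) and (v) in the list above can be illustrated with a short Python sketch. It works on pre-inflected strings supplied by the caller (the keys `past`, `participle` and `nominal` are assumptions), so it shows only the decision logic, not morphological synthesis.

```python
def realize(clause: dict) -> str:
    agent, obj = clause.get("agent"), clause.get("object")
    if agent and obj:
        return f"{agent} {clause['past']} {obj}"
    if obj:
        # (i) agent absent: apply the passive transformation
        return f"{obj} is {clause['participle']}"
    if not agent:
        # (ii) agent and object both missing: nominalize the verb and
        # supplement a verb phrase such as "is performed"
        return f"{clause['nominal']} is performed"
    return f"{agent} {clause['past']}"


def merge_head_noun(left: str, right: str) -> str:
    # (v) drop the first copy of a head noun shared by both conjuncts
    l_words, r_words = left.split(), right.split()
    if len(l_words) > 1 and l_words[-1] == r_words[-1]:
        left = " ".join(l_words[:-1])
    return f"{left} and {right}"
```

The input "uniform components" / "non-uniform components" used to exercise `merge_head_noun` is an assumed source phrase; only the output form "uniform and non-uniform components" is given in the text.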

Another big structural transformation required comes from the essential difference between DO- [...] English the case slots such as tools, cause/reason, and some others come to the subject position very often, while in Japanese such expressions are never [...] rated in the generation grammar such as shown in [...] This stylistic transformation part is still very [...] tic knowledge and lexical data to have more satisfactory English expressions.

[Figure fragment: "[...] destroyed the buildings"]

Fig. 11 An example of structural transformation in the generation phase

This paper described a number of strategies we employed in the transfer and generation stages of our Mu system to make the system both powerful [...] system has many advantages such as the flexibility of the generation process, the utilization of [...] course of development in collaboration with a number of computer scientists from computer industries [...] results are attached in the last, which show the [...] sive improvement is expected in the next two years.

ACKNOWLEDGEMENTS

We acknowledge the members of the Mu-Project, especially Mr. S. Takai (JCS), Mr. Y. Fukumochi (Sharp Co.), Mr. T. Ishioka (JCS), Miss M. Kume (JCS), Mr. H. Sakamoto (Oki Co.), Mr. A. Kosaka (NEC Co.), Mr. H. Adachi (Toshiba Co.), Miss A. [...] who contributed greatly to the implementation of the system.


Sample outputs as of April 1984 are attached on the next page.
