IN LANGUAGE TRANSLATION TRANSFER AND GENERATION STAGE OF MU MACHINE TRANSLATION PROJECT
Makoto Nagao, Toyoaki Nishida and Jun-ichi Tsujii
Department of Electrical Engineering
Kyoto University
Sakyo-ku, Kyoto 606, JAPAN
1 INTRODUCTION
Linguistic knowledge usable for machine translation is always imperfect. We cannot be free from the uncertainty of the knowledge we have for machine translation. Especially at the transfer stage of machine translation, the selection of target language expression is rather subjective and optional. Therefore the linguistic contents of a machine translation system always fluctuate, and make gradual progress. The system should be designed to allow such constant change and improvement. This paper explains the details of the transfer and generation stages of the Japanese-to-English system of the machine translation project by the Japanese Government, with the emphasis on the ideas to deal with the incompleteness of linguistic knowledge for machine translation.
2 DESIGN STRATEGIES
2.1 Annotated Dependency Structure
The intermediate representation we adopted as the result of analysis in our machine translation is the annotated dependency structure. Each node has an arbitrary number of features, as shown in Fig. 1. This makes it possible to access the constituents by more than one linguistic cue. This representation is therefore powerful and flexible for sophisticated grammatical and semantic checking, especially when the completeness of semantic analysis is not assured and trial-and-error improvements are required at the transfer and generation stages.
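As a rough illustration of a node that carries an open-ended set of features and can be reached through more than one cue, consider the following minimal Python sketch. It is not the Mu system's implementation: the class `Node`, its methods, and the use of underscores in feature names are assumptions made for this example; only the feature names themselves are modelled on Fig. 1.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    """A dependency node carrying an open-ended set of feature-value pairs."""
    features: dict[str, object] = field(default_factory=dict)
    children: list["Node"] = field(default_factory=list)

    def walk(self) -> Iterator["Node"]:
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()

    def find(self, **cues: object) -> list["Node"]:
        """Return every node in the subtree whose features match all given cues."""
        return [n for n in self.walk()
                if all(n.features.get(k) == v for k, v in cues.items())]


# Feature names are modelled on Fig. 1; the values are illustrative only.
root = Node({"J_CAT": "Verb", "J_DEEP_CASE": "MAIN", "J_DEEP_TENSE": "PRESENT"}, [
    Node({"J_CAT": "Noun", "J_DEEP_CASE": "CAUSE"}),
    Node({"J_CAT": "Noun", "J_DEEP_CASE": "SUBJECT"}),
])

nouns = root.find(J_CAT="Noun")                            # access by category
subject = root.find(J_CAT="Noun", J_DEEP_CASE="SUBJECT")   # access by category and deep case
```

Because a rule can name any combination of features, a new cue can be added to the nodes without changing rules that do not mention it.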
2.2 Multiple Layer Grammar
We have three conceptual levels for grammar rules.
lowest level: default grammar which guarantees the output of the translation process. The quality of the translation is not assured. Rules of this level apply to those inputs for which no higher layer grammar rules are applicable.
kernel level: main grammar which chooses and generates target language structure according to semantic relations among constituents which are determined in the analysis stage.
topmost level: heuristic grammar which attempts to get elegant translation for the input. Each rule bears a heuristic nature in the sense that it is word specific and it is applicable only to some restricted classes of inputs.
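The three levels can be read as a fallback chain. The sketch below is only an illustration under stated assumptions: the rule functions, the dictionary-based node, and the sample conditions are hypothetical and are not taken from the Mu grammar (the "sense" / "not-have" condition merely echoes the flavour of a word-specific rule).

```python
from typing import Callable, Optional

# A rule returns a target expression, or None when it is not applicable.
Rule = Callable[[dict], Optional[str]]


def heuristic_rule(node: dict) -> Optional[str]:
    # Topmost level: word specific, fires only for a restricted class of inputs.
    if node.get("lex") == "sense" and node.get("governor") == "not-have":
        return "meaningless"
    return None


def kernel_rule(node: dict) -> Optional[str]:
    # Kernel level: driven by the semantic relation found in the analysis stage.
    if node.get("deep_case") == "TOOL":
        return f"by {node['lex']}"
    return None


def default_rule(node: dict) -> str:
    # Lowest level: always produces some output, with no quality guarantee.
    return node["lex"]


def translate(node: dict) -> str:
    """Try the higher layers first; fall back to the default layer."""
    for rule in (heuristic_rule, kernel_rule):
        result = rule(node)
        if result is not None:
            return result
    return default_rule(node)
```

Since the default layer never declines, `translate` always returns something, which mirrors the guarantee of output given by the lowest level.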
2.3 Multiple Relation Structure
In principle, we use deep case dependency structure as a semantic representation. Theoretically we can assign a unique case dependency structure to each input sentence. In practice, however, the analysis phase may fail or may assign a wrong structure. Therefore we use as an intermediate representation a structure which makes it possible to annotate multiple possibilities as well as multiple level representation. An example is shown in Fig. 2. Properties at a node are represented as a vector, so that this complex dependency structure is flexible in the sense that different interpretation rules can be applied to the structure.
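A minimal sketch of a node whose properties are vectors of candidates, in the spirit of the "agent OR possess" annotation of Fig. 2, might look as follows. The dictionary layout and the functions `is_ambiguous` and `interpret` are assumptions made for this example, not the representation actually used in the project.

```python
# Each property holds a list (vector) of candidate values rather than a single value.
node = {"lex": ["he"], "deep_case": ["agent", "possess"]}  # agent OR possess


def is_ambiguous(node: dict, prop: str) -> bool:
    """A property is ambiguous when analysis left more than one candidate."""
    return len(node.get(prop, [])) > 1


def interpret(node: dict, prop: str, preferred: list[str]) -> str:
    """One possible interpretation rule: take the first preferred value that is
    among the candidates, else the first candidate recorded by analysis."""
    candidates = node[prop]
    for value in preferred:
        if value in candidates:
            return value
    return candidates[0]
```

Different rules can pass different `preferred` lists to the same structure, so the choice among the recorded possibilities is postponed until a rule has enough context to make it.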
2.4 Lexicon Driven Feature
Besides the transfer and generation rules which involve semantic checking functions, the grammar allows reference to a lexical item in the dictionary. A lexical item contains its special grammatical usages and idiomatic expressions. During the transfer and generation stages, these rules are activated with the highest priority. This feature makes the system very flexible for dealing with exceptional cases. The improvement of translation quality can be achieved progressively by adding linguistic information and word usages in the dictionary entries.
2.5 Format-Oriented Description of Dictionary Entries
The quality of a machine translation system heavily depends on the quality of the dictionary. In order to build a machine translation dictionary, we collaborate with expert translators. We developed a format-oriented language to allow computer-naive human translators to encode their expertise without any conscious effort on programming. Although the format-oriented language we developed lacks full expressive power for highly sophisticated linguistic phenomena, it can cover most of the common lexical information translators may want to describe. The formatted description is automatically converted into statements in GRADE, a programming language developed by the Mu-Project. We prepared a manual according to which one can fill in the dictionary format with linguistic data of items. The manual guarantees a certain level of quality of the dictionary, which is important when many people have to work in parallel.
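The idea of converting a filled-in format into executable rules can be sketched as below. This is not GRADE and does not reproduce the project's dictionary format: the "FIELD = value" layout, the field name `E-LEX`, the required-field check, and the sample entry (the romanized form "kouka" is only an illustrative stand-in) are all assumptions made for this example. A Python closure stands in for the generated GRADE statements.

```python
REQUIRED_FIELDS = ("J-LEX", "J-CAT", "E-LEX")  # hypothetical field names


def parse_entry(form: str) -> dict[str, str]:
    """Turn a filled-in form ("FIELD = value" per line) into a dictionary entry."""
    entry = {}
    for line in form.strip().splitlines():
        if not line.strip():
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a 'FIELD = value' line: {line!r}")
        entry[name.strip()] = value.strip()
    missing = [f for f in REQUIRED_FIELDS if f not in entry]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return entry


def to_rule(entry: dict[str, str]):
    """Compile an entry into a callable transfer rule."""
    def rule(node: dict) -> dict:
        if node.get("J-LEX") == entry["J-LEX"] and node.get("J-CAT") == entry["J-CAT"]:
            return {**node, "E-LEX": entry["E-LEX"]}
        return node  # entry does not apply; leave the node unchanged
    return rule


form = """
J-LEX = kouka
J-CAT = Noun
E-LEX = effect
"""
rule = to_rule(parse_entry(form))
```

Checking required fields at parse time is one simple way to enforce a minimum level of quality when many people fill in entries in parallel.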
[Fig. 1: Representation of analysis result by features. Each node of the dependency tree carries feature-value pairs such as J-CAT = Verb, J-LEX = (increase), J-DEEP-CASE = MAIN, J-SENTENCE-CONNECTOR = DECLARATIVE, J-DEEP-TENSE = PRESENT, J-NEG = NIL; dependent noun nodes carry J-CAT = Noun together with J-LEX, J-DEEP-CASE (CAUse, SUBject, SOUrce) and J-SURFACE-CASE values. Nodes with J-LEX = NIL are dummy nodes.]
[Fig. 2: An example of complex dependency structure. A node with J-LEX = he depends on "work" with J-DEEP-CASE = agent OR possess.]
3 ORGANIZATION OF GRAMMAR RULES FOR TRANSFER AND GENERATION STAGES
3.1 Heuristic Rule First
Grammar rules are organized along the principle that "if a better rule exists then the system uses it; otherwise the system attempts to use a standard rule; if it fails, the system will use a default rule". The grammar rule involves a number of stages for applying heuristic rules. Fig. 3 shows a processing flow for the transfer and generation stages.
Heuristic rules are word specific. GRADE makes it possible to define word specific rules. Such rules can be invoked in many ways. For example, we can associate a word selection rule for an ordinary verb in a dictionary entry for a noun, as shown in Fig. 4.
[Fig. 3: Processing flow for the transfer and generation stages. Recoverable labels: pre-transfer, TRANSFER, post-transfer, phrase structure transformation, MORPHOLOGICAL SYNTHESIS.]

[Fig. 4: (a) Activating a lexical rule for a noun (effect) from a governing verb (give). (b) Form-oriented description of a transfer rule for the noun (effect).]
Some heuristic rules are activated just after the standard analysis of a Japanese sentence is finished, to obtain a more neutral (or target language oriented) [...] such invocation the pre-[...] pragmatic interpretation are done in the pre-transfer [...] rules are applied in this loop, the better result will [...] show some examples.
[...] Target Language by Using Semantic Markers
Word selection in the target language is a big problem in machine transla[...] of choices of translation for a word in the source [...] adopted in our system are,
(1) Area restriction by using field code, such as electrical engineering, nuclear science, medicine, and so on.
(2) Semantic code attached to a word in the analysis phase is used for the selection of a proper target language word or a phrase.
(3) Sentential structure of the vicinity of a word to be translated is sometimes effective for the determination of a [...] in the target language.
Table 1 shows examples of a part of the verb trans[...] of English verb is done by the semantic categories of nouns related to the verb. The number i attached to verbs like form-1, produce-2 is the i-th usage of the [...] information of nouns is not available, the column indicated by [...] is applied to [...] invocation of grammar rules
[Fig. 5: An example of a heuristic rule used in the pre-transfer loop. A structure in which a verb (do not have) governs the noun (sense) is rewritten as an adjective (meaningless): "expression which does not have sense" becomes "meaningless expression".]
In most cases, we can use a fixed format for describing a translation rule [...] number of dictionary formats specially designed for the ease of dictionary input by computer-naive expert translators. The expressive power of format-oriented description is, however, insufficient for a number of common verbs such as "する" (make, do, perform, ...) and "なる" (become, consist of, provide, ...) [...] of usages are to be listed up with their corresponding English sentential structures and semantic conditions.
The transfer stage bridges the gap between Japanese and English expressions. There are still many odd structures after this stage, and we have to adjust further the English internal repre[...] call this part the post-transfer loop. An example is given in Fig. 8, where a Japanese factitive verb is first transferred to English "make", and then a structural change is made to eliminate it, and to have a more direct expression.
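The elimination of "make" described for Fig. 8 ("A make B rotate" becoming "A rotate B") can be sketched as a small rewrite over a clause record. This is an illustrative sketch only: the clause layout, the function `eliminate_make`, and the contents of the `TRANSITIVE_FORM` table are assumptions for this example; in the system the transitive counterpart comes from consulting the lexical item of the verb.

```python
# Hypothetical lexicon: intransitive verb -> transitive verb derived from it.
TRANSITIVE_FORM = {"rotate": "rotate", "increase": "increase", "rise": "raise"}


def eliminate_make(clause: dict) -> dict:
    """Rewrite {A make [B C]} as {A C' B} when the lexicon gives a transitive C' for C."""
    if clause.get("verb") != "make" or "complement" not in clause:
        return clause
    inner = clause["complement"]                 # the embedded clause "B C"
    transitive = TRANSITIVE_FORM.get(inner["verb"])
    if transitive is None:
        return clause                            # no transitive counterpart: keep "make"
    return {"subject": clause["subject"], "verb": transitive, "object": inner["subject"]}
```

When the lexicon offers no transitive form, the clause is returned unchanged, so the rule never makes the output worse than the direct transfer result.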
[...] Postpositions
Postpositions in Japanese general[...] postposition, however, has different usages, and the determination of English prepositions for each postposition is [...] verb which governs the noun phrase having that postposition.
Table 2 illustrates a part of a default table for determining deep and surface case labels when no higher level [...] this way, we confirm at least one trans[...] particular usage of preposition for a particular English verb is written in the lexical entry of the verb.
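The interplay between a default table and verb-specific lexical entries can be sketched as a two-step lookup. The contents of both tables below are hypothetical (the entries of Table 2 are not reproduced here), as are the names `DEFAULT_CASE`, `VERB_LEXICON`, and `choose_preposition`.

```python
# Default table (hypothetical contents): postposition -> (deep case, English preposition).
DEFAULT_CASE = {
    "ni": ("GOAL", "to"),
    "de": ("TOOL", "with"),
    "kara": ("SOURCE", "from"),
}

# Preposition usages written in the lexical entry of an English verb (illustrative).
VERB_LEXICON = {
    "depend": {"ni": "on"},
    "consist": {"kara": "of"},
}


def choose_preposition(verb: str, postposition: str) -> str:
    """Prefer the verb's own lexical entry; otherwise use the default table."""
    specific = VERB_LEXICON.get(verb, {}).get(postposition)
    if specific is not None:
        return specific
    return DEFAULT_CASE[postposition][1]
```

The default table ensures that every known postposition receives some preposition, while a verb entry can override it for that verb alone.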
[Fig. 6: Examples of pre-transfer rules.]

[...] Structures in Target Language
[Table 1: Word selection in target language by using semantic markers. Recoverable entries: semantic markers for X/Y such as action/deed/movement, reaction, standard/property, non-living substance, structure; English verb usages such as occur-1, form-1, produce-2, improve-1; sentence patterns such as "X take place", "X occur", "form X (obj)", "produce X", "X produce Y", "X improve Y", "X increase Y".]
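Word selection driven by the semantic markers of related nouns, with a fallback column for the case where no semantic information is available, can be sketched as a table lookup. The pairing of markers and verb usages below is illustrative and does not reproduce the rows of Table 1; the `"*"` key and the function `select_verb` are likewise assumptions for this example.

```python
DEFAULT = "*"  # stands for the column used when no semantic information is available

# Hypothetical selection table for one Japanese verb:
# semantic marker of the related noun -> English verb usage.
SELECTION_TABLE = {
    "action": "occur-1",
    "structure": "form-1",
    "property": "improve-1",
    DEFAULT: "produce-2",
}


def select_verb(noun_markers: list[str], table: dict[str, str] = SELECTION_TABLE) -> str:
    """Return the usage for the first marker found in the table, else the default column."""
    for marker in noun_markers:
        if marker in table:
            return table[marker]
    return table[DEFAULT]
```

A field code restriction could be layered on top by keeping one such table per field, but that is left out of the sketch.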
[Figure: dictionary rules of popular verbs, illustrated for the verb labelled (NARO); depending on its (SUB) and (GOAL) dependents the English candidates include provide, reach, turn and become.]
Global sentential structures of Japanese and English are quite different, and correspondingly the internal structure of a Japanese sentence is [...] difference from Japanese internal representation to that of English is absorbed at the (pre-, post-) [...] generation, some structural transformations are still required in such cases as (a) embedded sentential structure, (b) complex sentential structure.
We classified four kinds of embedded sentential structures.
(i) A case slot of an embedded sentence is vacant, and the noun modified by the embedded sentence comes to fill the slot.
[...] the semantic properties like parts, attributes, and action.
(iii) The third and the fourth classes are particular embedded expressions in Japanese, which have [...] (in that), and so on.
An example of the structural transformation [...] is generated after the structural transformation. Connection of two sentences in the compound and complex sentences is done according to Table [...]
After the transfer is done from the Japanese deep dependency structure to the English one, conversion is done to a phrase structure tree with [...] processes explained in 4.1 and 4.2 are involved at [...]ed top-down from the root node of the dependency [...] demands a noun phrase expression or a to-infinitive expression to its dependent phrase, the structural [...] verb transformation, and noun to adjective
[Fig. 8: An example of post-transfer rule application. A structure in which "make" governs an intransitive verb C is replaced, after consultation of the lexical item of C, by a transitive verb C' derived from C: "A make B rotate" becomes "A rotate B".]
Table 2 Default rule for assigning a case label of English to a Japanese postposition "に" (ni)
Table 3 Correspondence of sentential connectives (column entries listed in their original top-to-bottom order)
JAPANESE SENTENTIAL CONNECTIVE: RENYO, (-SHI)TE, RENYO, (-SHI)TE, -TAME, -NODE, -KARA, -TO, -TOKI, -TE, -TAME, -NONI, -YOU, -YOU, -KOTONAKU, -NAGARA, -BA
DEEP-CASE: TOOL, TOOL, CAUSE, TIME, PURPOSE, (ditto), MANNER, ACCOMPANY, CIRCUMSTANCE
ENGLISH SENTENTIAL CONNECTIVE: BY-ING, BY-ING, BECAUSE, (ditto), (ditto), WHEN, SO-THAT-MAY, to, AS-IF, WITHOUT-ING, WHILE-ING, WHEN, ...
[Fig. 9: Structural transformation of an embedded sentence of type 3. The example uses he (N1), school (N2), resign (V) and reason (N3). In the analysis result resign(V), with he and school as dependents, modifies reason(N3); after transfer and generation a noun phrase is produced in which N3 is followed by a relative clause introduced by "why".]
[Fig. 10: Structural transformation of an embedded sentence.]
transformation are often required due to the differ[...] process goes down from the root node to all the leaf nodes.
After this process of phrase structure generation, some sentential transformations are performed such as follows.
(i) When an agent is absent, passive transformation is applied.
(ii) When the agent and object are both missing, the predicative verb is nominalized and placed as the subject, and such verb phrases as "is made" and "is performed" are supplemented.
(iii) When a subject phrase is a big tree, the anticipatory subject "it" is introduced.
(iv) Pronominalization of the same subject nouns is done in compound and complex sentences.
(v) Duplication of a head noun in the conjunctive noun phrase is eliminated, such as, "uniform [...] "uniform and non-uniform components".
(vi) Others.
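Transformations (i) to (iii) above can be read as a decision over which slots of the clause are filled. The following sketch is an assumption-laden illustration: the clause record, the returned labels, the `subject_size` field, and the size threshold for a "big" subject are all hypothetical.

```python
def choose_sentence_form(clause: dict, big_subject_size: int = 8) -> str:
    """Pick a surface form following transformations (i)-(iii).

    The threshold for a "big" subject tree is an assumption of this sketch.
    """
    agent, obj = clause.get("agent"), clause.get("object")
    if agent is None and obj is None:
        return "nominalize-verb-as-subject"   # (ii): supplement "is made", "is performed"
    if agent is None:
        return "passive"                      # (i)
    if clause.get("subject_size", 0) > big_subject_size:
        return "anticipatory-it"              # (iii)
    return "active"
```

Case (ii) is tested before case (i) because it is the more specific condition: a clause with neither agent nor object would otherwise be caught by the passive rule.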
Another big structural transformation required comes from the essential difference between DO-[...] English the case slots such as tools, cause/reason, and some others come to the subject position very often, while in Japanese such expressions are never [...]rated in the generation grammar such as shown in [...] This stylistic transformation part is still very [...]tic knowledge and lexical data to have more satisfactory English expressions.

[Fig. 11: An example of structural transformation in the generation phase. Surviving fragment of the example: "... destroyed the buildings".]
This paper described a number of strategies we employed in the transfer and generation stages of our Mu system to make the system both powerful [...] system has many advantages such as the flexibility of the generation process, the utilization of [...] course of development in collaboration with a number of computer scientists from computer industries [...] results are attached in the last, which show the [...]sive improvement is expected in the next two years.
ACKNOWLEDGEMENTS
We acknowledge the members of the Mu-Project, especially Mr. S. Takai (JCS), Mr. Y. Fukumochi (Sharp Co.), Mr. T. Ishioka (JCS), Miss M. Kume (JCS), Mr. H. Sakamoto (Oki Co.), Mr. A. Kosaka (NEC Co.), Mr. H. Adachi (Toshiba Co.), Miss A. [...] who contributed greatly to the implementation of the system.
Sample outputs as of April, 1984 are attached on the next page.