An Experiment in Machine Translation
INTRODUCTION
Although funding for Machine Translation (MT) research virtually ended in the U.S. with the release of the ALPAC report [1] in 1966, there has been a continuing interest in this field. Rapid evolution of science and technology, coupled with increased worldwide exposure of their products, demands ever greater speed in translation (e.g., in the case of operation and maintenance manuals). Unfortunately, this rapid evolution has made translation an even more difficult and time-consuming task. The large surplus of (presumably qualified) translators cited by the ALPAC report simply does not exist in many technical areas; the current state of affairs finds instead a critical shortage. In addition, the proportion of scientific and technical literature published in English is diminishing. As qualified human translators become scarcer and the costs of human translation rise, while the costs of purchasing and operating powerful computer systems fall, there must come a time when, if MT is feasible at all, it will be cost-effective. It is appropriate, then, to investigate the state of the art in MT with respect to two central questions: is high-quality MT feasible (and in what sense); and, if feasible, is it cost-effective?
This paper reports the results of an experiment in highly automatic, high-quality machine translation. The Linguistics Research Center's (LRC's) MT system, METAL (for Mechanical Translation and Analysis of Languages), is an advanced, 'third generation' system incorporating proven Natural Language Processing (NLP) techniques, both syntactic and semantic, and stands at the forefront of MT research. In the experiment, METAL was employed in the translation of a 50-page text from German into English in order to determine whether the system as it exists can be effectively applied to current translation needs, effectiveness to be determined by some objective measure of the quality and cost of machine (i.e., METAL) vs. human translation.
EARLIER MT EFFORTS
Since Bruderer [2] has recently published a complete survey of MT projects, and Hutchins [3] reviews the most important developments through 1977, we will mention only a few of the major efforts. The first popular demonstration of the possibilities in MT was provided by IBM and the Georgetown University group in 1954 [4]. With a vocabulary of about 250 words and a grammar comprising some six rules in what was called an "operational syntax", the system demonstrated some rudimentary capability in Russian-to-English translation. This instigated a massive government funding effort over the next decade, and some 20 million dollars was invested in 17 different projects. By 1965 the Mark II Russian-English system [5] had been installed at the Foreign Technology Division of the U.S. Air Force at Wright-Patterson AFB, and the Georgetown system had been delivered to the Atomic Energy Commission at Oak Ridge National Laboratory and to EURATOM in Ispra, Italy. Reviewing MT systems such as these at the request of the National Science Foundation, the Automatic Language Processing Advisory Committee (ALPAC) reported in 1966 that MT was slower, less accurate, and more expensive than human translation; further, that there was no predictable prospect of improvement in MT capability. Though strongly and perhaps justifiably criticized [6], this report soon resulted in the virtual elimination of MT funding in the U.S., and a sizeable reduction in foreign efforts as well.
Jonathan Slocum
Linguistics Research Center, The University of Texas
Peter Toma, who was responsible for the installations at Oak Ridge and Ispra cited above, soon began private efforts at improving the Georgetown system. This culminated in SYSTRAN [7], which replaced Mark II at WPAFB in 1970 and the Georgetown system at EURATOM in 1976. SYSTRAN was also used by NASA during the Apollo-Soyuz mission. In 1976 the Commission of the European Communities adopted SYSTRAN for English-to-French translation; however, an evaluation of its translations by the EEC post-editors in Brussels found the results to be far from satisfactory: "all the revisors had exhausted their patience before the end" [8]. Despite its generally low translation quality, SYSTRAN is the most widely used MT system to date. Its chief commercial competitor, LOGOS [9], is another example of a "direct" MT system. As in SYSTRAN, the analysis and synthesis components are separated, but the linguistic procedures are designed for a specific source-language (SL) and target-language (TL) pair. In an evaluation by Sinaiko and Klare [10], LOGOS did not fare well. Bruderer [2] reports further development for translation into Russian, and experiments on French, German and Spanish, but provides few details.
In an effort to correct the obvious inadequacies of these and other 'first generation' systems, which essentially translate word-for-word with no attempt at a unified analysis at the sentence level, and which were developed ab initio for a specific SL-TL pair, researchers began to investigate methods of analyzing sentences into structures from which, in theory, any TL could be generated. There are two broad types of such 'second generation' systems. One type produces analyses in a "neutral" structure, or 'interlingua'; the other produces SL syntactic structures which are transformed via a process called 'transfer' into a syntactic structure for the TL sentence. One example of the former approach is the system produced by the Centre d'Études pour la Traduction Automatique (CETA) at the University of Grenoble [11]. During the period from 1961 to 1971 this group developed a Russian-to-French MT system. An evaluation at the end of that period revealed that only 42% of the sentences were being correctly translated. Some failures were due to errors in the input, but the majority were due to programming errors, failure to produce a lexical analysis of a word or a syntactic analysis of a sentence, inefficiencies in the parser causing it to apply too many rules, etc. The Traduction Automatique de l'Université de Montréal (TAUM) project [12] is an example of the transfer approach. There are five grammars, called "Q-systems", to effect morphological and syntactic analysis of English, then transfer, then syntactic and morphological synthesis of French. Each such stage consists of a series of generalized tree-structure transformations. The significance of TAUM is that, of the second-generation systems, it is the nearest to operational implementation: it is to be applied to the translation of aircraft maintenance manuals.
In 1978 the European project EUROTRA was initiated, apparently adopting the newer Grenoble system ARIANE, in order to produce an advanced, second-generation MT system for the eventual replacement of the first-generation system (SYSTRAN) currently in use [8]. The Grenoble group, now titled Groupe d'Études pour la Traduction Automatique (GETA), abandoned their earlier approach in light of its deficiencies and produced a system that translates in six passes: morphological analysis, multi-level (syntactic and semantic) analysis, lexical transfer, structural transfer, syntactic generation, and morphological generation. Multi-level analysis, structural transfer, and syntactic generation are all effected via a general tree-to-tree transducer program, somewhat less powerful but perhaps more efficient than the Q-systems transducer in TAUM; the other components have special programs suited to their function. The emphasis in this project is apparently twofold: increased efficiency and reliability through adoption of components with the minimum necessary power, and decreased sensitivity to failure in individual stages through the expedient of ensuring that every component has some output, even if such output is nothing more than the original input. If we have interpreted the Vauquois mimeo [8] properly, this must be the largest and most comprehensive MT project yet undertaken.
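The fail-soft principle attributed to GETA above, in which every stage yields some output even if that output is only its own input, can be sketched as a simple pipeline driver. This is an illustrative sketch only: the stage names follow the six passes listed in the text, but the stage functions are hypothetical placeholders, not GETA's actual programs, and "failure" is assumed to mean that a stage raises an exception.

```python
from typing import Callable, List, Tuple

# A stage is a (name, function) pair; the function raises on failure.
Stage = Tuple[str, Callable[[object], object]]


def run_fail_soft(stages: List[Stage], text: object) -> Tuple[object, List[str]]:
    """Run stages in order; a failing stage passes its input through unchanged.

    Returns the final representation and the names of the stages that failed.
    """
    current = text
    failed: List[str] = []
    for name, stage in stages:
        try:
            current = stage(current)
        except Exception:
            # Fail-soft: keep this stage's input as its output and continue.
            failed.append(name)
    return current, failed


# Hypothetical placeholder stages standing in for the six passes.
def morphological_analysis(s):
    return s.split()


def multilevel_analysis(tokens):
    raise ValueError("no analysis found")  # simulate a failing stage


def identity(x):
    return x


def morphological_generation(tokens):
    return " ".join(tokens)


stages = [
    ("morphological analysis", morphological_analysis),
    ("multi-level analysis", multilevel_analysis),
    ("lexical transfer", identity),
    ("structural transfer", identity),
    ("syntactic generation", identity),
    ("morphological generation", morphological_generation),
]

output, failed = run_fail_soft(stages, "das Buch")
print(output, failed)  # das Buch ['multi-level analysis']
```

Even though the (simulated) analysis stage fails, the pipeline still produces an output; a real system would presumably also record which stages fell back, as `failed` does here.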
DESCRIPTION OF METAL
There are two different classifications of "generations" in MT systems. The first posits three generations (currently) according to the following criteria: (1) translation is word-for-word, with no significant syntactic analysis; (2) translation proceeds after obtaining a complete syntactic analysis of an input, with no significant semantic analysis; (3) translation proceeds after obtaining a complete semantic analysis of an input. The definition of 'third generation' says nothing about extra-sentential information, and one might posit a 'fourth generation' which employs such information. The other classification proceeds according to the following criteria: (1) translation proceeds "directly" from the SL to the TL, and the SL is analyzed only to the minimum extent necessary to generate TL equivalents; (2) translation proceeds "indirectly" by deriving a more-or-less standard analysis of the input, independent of the TL involved (but not necessarily of the SL), and then generating TL output based on the standard analysis. Within this definition of 'second generation', as noted above, there are the 'transfer' vs. 'interlingua' approaches.
We prefer to characterize METAL as a 'third generation' system according to the first classification given above, because this makes it clear that METAL derives a substantial semantic analysis, whereas the second classification's definition of 'second generation' does not necessarily imply that semantic analysis of any kind is performed.
METAL comprises two distinct components: the linguistic and the computational. The linguistic component consists of lexicons, phrase-structure grammar rules, case frames, and transformations. SL and TL lexical entries include feature-value pairs encoding syntactic and semantic information such as grammatical category, inflectional class, semantic type, and case information (see Figure 1). Transfer lexical entries indicate how and under what conditions words or idioms in one language translate into words or idioms in another (see Figure 2). The phrase-structure rules may be augmented with procedures to determine their application via feature/value tests, to add or copy features and values in the interpretation being constructed, to invoke case-frame routines, and to invoke specific or general transformations. Case-frame routines determine semantic case relationships between verbs and nouns on the basis of syntactic and semantic features, and produce their output in the form of propositional trees. Transformations are pattern-pairs that specify old and new tree structures; when invoked, a transformation attempts to match its "old" side against the current structural descriptor, and if successful converts it into one matching its "new" side. In the process, features and values may be tested and set arbitrarily. This provides the grammar with virtually unlimited context sensitivity, but since no interpretation can affect the operation of the parser, it still enjoys the advantages of context-free operation. Finally, there is a method for scoring, or rating, interpretations; this allows the system to determine the "best" interpretation for translation, and also provides another mechanism for rejecting the application of any rule, viz., a score below cutoff. Figure 3 illustrates a typical grammar rule.
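Two of the mechanisms just described, transformations as old/new pattern-pairs with feature tests, and selection of the "best" interpretation with a score cutoff, can be sketched as follows. This is not METAL's LISP code: the tree representation, the "?x" variable notation, the example rule, and the scoring function are all hypothetical, chosen only to illustrate the general technique.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Union


@dataclass
class Node:
    """A tree node: a category label, child nodes, and feature-value pairs."""
    label: str
    children: List["Node"] = field(default_factory=list)
    features: Dict[str, str] = field(default_factory=dict)


# A pattern is a variable such as "?x" (matches any subtree) or a
# (label, [subpatterns]) pair matching a node with that label and exactly
# that many children. Variables are assumed to be distinct.
Pattern = Union[str, tuple]
Bindings = Dict[str, Node]


def match(pattern: Pattern, node: Node, bindings: Bindings) -> bool:
    if isinstance(pattern, str):
        bindings[pattern] = node
        return True
    label, subpatterns = pattern
    if node.label != label or len(node.children) != len(subpatterns):
        return False
    return all(match(p, c, bindings) for p, c in zip(subpatterns, node.children))


def build(template: Pattern, bindings: Bindings) -> Node:
    if isinstance(template, str):
        return bindings[template]
    label, subtemplates = template
    return Node(label, [build(t, bindings) for t in subtemplates])


def transform(old: Pattern, new: Pattern, node: Node,
              test: Optional[Callable[[Bindings], bool]] = None,
              set_features: Optional[Dict[str, str]] = None) -> Optional[Node]:
    """Match the "old" side; if it matches and the feature test passes,
    build the "new" side, keep the root's features, and set new ones."""
    bindings: Bindings = {}
    if not match(old, node, bindings):
        return None
    if test is not None and not test(bindings):
        return None
    result = build(new, bindings)
    result.features.update(node.features)
    result.features.update(set_features or {})
    return result


def best_interpretation(interps: Sequence, score: Callable, cutoff: float):
    """Discard interpretations scoring below the cutoff; return the best one."""
    kept = [i for i in interps if score(i) >= cutoff]
    return max(kept, key=score) if kept else None


# Hypothetical rule: reorder a verb-final German VP (object, verb) into
# English order (verb, object), but only if the verb is marked finite.
old = ("VP", ["?obj", "?verb"])
new = ("VP", ["?verb", "?obj"])
vp = Node("VP", [Node("NP", features={"text": "das Buch"}),
                 Node("V", features={"text": "liest", "fin": "yes"})])
out = transform(old, new, vp,
                test=lambda b: b["?verb"].features.get("fin") == "yes",
                set_features={"order": "VO"})
print([c.label for c in out.children])  # ['V', 'NP']
```

Returning `None` when either the match or the feature test fails mirrors the idea that a rule may simply decline to apply; `best_interpretation` treats a below-cutoff score as one more way of rejecting a candidate.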
ALO (in) (i)
GC (A D) (D)
PLC (WI) (WI WF)
RO (TMP TOP LOC DST TAR EQU))
ALO (in)
RO (DST LOC)
PO (PRE)
ON (VO))
ALO (into)
RO (DST LOC)
PO (PRE)
ON (VO))

Figure 1. German Preposition "in" and Two Corresponding English Prepositions

CAT - grammatical category
PREP - preposition
ALO - allomorph
'in' - the string "in"
'i' - (as in the string "im")
GC - grammatical case
A - accusative
D - dative
CN - contracted [with]
S - (as in "ins")
M - (as in "im")
PLC - placement
WI - word-initial
WF - word-final
RO - semantic role
TMP - temporal
TOP - topic
LOC - locative
DST - destination
TAR - target
EQU - equative
PO - position
PRE - pre-posed
ON - onset sound
VO - vocalic
Figure 2. Transfer Entries for the German Preposition "in"
The German PREPosition "in" (in parentheses) may translate into the English PREPosition "into" if the Grammatical Case of the German PP is 'Accusative'; it may translate into the English PREPosition "in" if the Grammatical Case of the German PP is 'Dative'. Arbitrary numbers and types of conditions may be specified in transfer entries.
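The conditional transfer just described can be sketched as a lookup in which each candidate target word carries a list of conditions that must all hold. The table layout and the `GC` feature key are assumptions made for illustration (the key borrows the Figure 1 abbreviation); METAL's actual transfer-entry format is the one shown in Figure 2, not this one.

```python
from typing import Callable, Dict, List, Optional, Tuple

Features = Dict[str, str]
Condition = Callable[[Features], bool]

# Hypothetical transfer table: each entry pairs a target-language word
# with a list of conditions on the source PP; all of them must hold.
TRANSFER: Dict[str, List[Tuple[str, List[Condition]]]] = {
    "in": [
        ("into", [lambda pp: pp.get("GC") == "A"]),  # accusative PP
        ("in", [lambda pp: pp.get("GC") == "D"]),    # dative PP
    ],
}


def transfer(word: str, pp_features: Features) -> Optional[str]:
    """Return the first target word whose conditions all hold, else None."""
    for target, conditions in TRANSFER.get(word, []):
        if all(cond(pp_features) for cond in conditions):
            return target
    return None


print(transfer("in", {"GC": "A"}))  # into  (cf. "in das Haus", "into the house")
print(transfer("in", {"GC": "D"}))  # in    (cf. "in dem Haus", "in the house")
```

Because each entry holds a list of predicates, adding further conditions (for example on semantic role) only means appending to that list, which loosely reflects the statement that arbitrary numbers and types of conditions may be specified.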
The computational component, written in LISP, consists of the parser, the case-frame routines, the transformation pattern-matcher, the transfer program, the generator, and other procedures needed to drive and support the translation process. The parser is a highly efficient implementation of the Cocke-Kasami-Younger algo-