Improving Translation through Contextual Information
Maite Taboada*
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213
taboada+@cmu.edu
Abstract
This paper proposes a two-layered model of dialogue structure for task-oriented dialogues that processes contextual information and disambiguates speech acts. The final goal is to improve translation quality in a speech-to-speech translation system.
1 Ambiguity in Speech Translation
For any given utterance taken out of what we can loosely call context, there is usually more than one possible interpretation. A speaker's utterance of an elliptical expression, like the figure "twelve fifteen", might have a different meaning depending on the context of situation, the way the conversation has evolved until that point, and the previous speaker's utterance. "Twelve fifteen" could be the time "a quarter after twelve", the price "one thousand two hundred and fifteen", the room number "one two one five", and so on. Although English can conflate all those possible meanings into one expression, the translation into other languages usually requires more specificity.
If this is a problem for any human listener, the problem grows considerably when it is a parser doing the disambiguation. In this paper, I explain how we can use discourse knowledge to help a parser disambiguate among the different possible parses for an input sentence, with the final goal of improving the translation in an end-to-end speech translation system.
The work described was conducted within the JANUS multi-lingual speech-to-speech translation system, designed to translate spontaneous dialogue in a limited domain (Lavie et al., 1996). The machine translation component of JANUS handles these problems using two different approaches: the Generalized Left-to-Right parser GLR* (Lavie and Tomita, 1993) and Phoenix, the latter being the focus of this paper.
*The author gratefully acknowledges support from the "la Caixa" Fellowship Program, ATR Interpreting Laboratories, and Project Enthusiast.
2 Disambiguation through Contextual Information
This project addresses the problem of choosing the most appropriate semantic parse for any given input. The approach is to combine discourse information with the set of possible parses provided by the Phoenix parser for an input string. The discourse module selects one of these possibilities. The decision is based on:
1. The domain of the dialogue. JANUS deals with dialogues restricted to a domain, such as scheduling an appointment or making travel arrangements. The general topic provides some information about what types of exchanges, and therefore speech acts, can be expected.
2. The macro-structure of the dialogue up to that point. We can divide a dialogue into smaller, self-contained units that provide information on what phases are over or yet to be covered: Are we past the greeting phase? If a flight was reserved, should we expect a payment phase at some point in the rest of the conversation?
3. The structure of adjacency pairs (Schegloff and Sacks, 1973), together with the responses to speech functions (Halliday, 1994; Martin, 1992). If one speaker has uttered a request for information, we expect some sort of response to that: an answer, a disclaimer or a clarification.
The domain of the dialogues, named the travel planning domain, consists of dialogues where a customer makes travel arrangements with a travel agent or a hotel clerk to book hotel rooms, flights or other forms of transportation. They are task-oriented dialogues, in which the speakers have the specific goal of carrying out a task that involves the exchange of both information and services.
Discourse processing is structured in two different levels: the context module keeps a global history of the conversation, from which it will be able to estimate, for instance, the likelihood of a greeting once the opening phase of the conversation is over. A more local history predicts the expected response in any adjacency pair, such as a question-answer sequence. The model adopted here is that of a two-layered finite state machine (henceforth FSM), and the approach is that of late-stage disambiguation, where as much information as possible is collected before proceeding on to disambiguation, rather than restricting the parser's search earlier on.
3 Representation of Speech Acts in Phoenix
Writing the appropriate grammars and deciding on the set of speech acts for this domain is also an important part of this project. The selected speech acts are encoded in the grammar (in the Phoenix case, a semantic grammar), the tokens of which are concepts that the segment in question represents. Any utterance is divided into SDUs (Semantic Dialogue Units), which are fed to the parser one at a time. SDUs represent a full concept, expression, or thought, but not necessarily a complete grammatical sentence. Let us take an example input, and a possible parse for it:
(1) Could you tell me the prices at the Holiday Inn?
([request] (COULD YOU
  ([request-info] (TELL ME
    ([price-info] (THE PRICES
      ([establishment] (AT THE
        ([establishment-name] (HOLIDAY INN))))))))))
The top-level concepts of the grammar are the speech acts themselves, the ones immediately after are further refinements of the speech act, and the lower-level concepts capture the specifics of the utterance, such as the name of the hotel in the above example.
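As an illustration of this nested representation, the following Python sketch reads a bracketed parse like the one in (1) into a small tree and lists the concepts from the top-level speech act down to the most specific one. The `Concept` class, the `parse_concept` and `concept_path` helpers, and the assumption that every concept is written as `([name] (words ...))` are introduced here for illustration only; they are not part of Phoenix.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Tokens: parentheses, bracketed concept names, or plain words.
TOKEN = re.compile(r"\(|\)|\[[^\]]+\]|[^\s()\[\]]+")


@dataclass
class Concept:
    """One node of a parse: a concept name, its words, and nested concepts."""
    name: str
    words: List[str] = field(default_factory=list)
    children: List["Concept"] = field(default_factory=list)


def parse_concept(text: str) -> Concept:
    """Parse '([name] (WORDS ... nested concepts ...))' into a Concept tree."""
    tokens = TOKEN.findall(text)
    pos = 0

    def expect(tok: str) -> None:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != tok:
            raise ValueError(f"expected {tok!r} at token {pos}")
        pos += 1

    def concept() -> Concept:
        nonlocal pos
        expect("(")
        if pos >= len(tokens) or not tokens[pos].startswith("["):
            raise ValueError(f"expected a [concept] at token {pos}")
        node = Concept(tokens[pos][1:-1])  # strip the square brackets
        pos += 1
        expect("(")
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(concept())  # nested refinement
            else:
                node.words.append(tokens[pos])   # surface word
                pos += 1
        expect(")")
        expect(")")
        return node

    root = concept()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return root


def concept_path(node: Concept) -> List[str]:
    """Concept names from the speech act down along the first child."""
    path = [node.name]
    while node.children:
        node = node.children[0]
        path.append(node.name)
    return path


EXAMPLE = """
([request] (COULD YOU
  ([request-info] (TELL ME
    ([price-info] (THE PRICES
      ([establishment] (AT THE
        ([establishment-name] (HOLIDAY INN))))))))))
"""

tree = parse_concept(EXAMPLE)
print(tree.name)           # the top-level speech act
print(concept_path(tree))  # speech act, refinements, then specifics
```

The root of the tree is the speech act the discourse module has to accept or reject; the deeper nodes carry the details that the translation needs.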
4 The Discourse Processor
The discourse module processes the global and local structure of the dialogue in two different layers. The first one is a general organization of the dialogue's subparts; the layer under that processes the possible sequence of speech acts in a subpart. The assumption is that negotiation dialogues develop in a predictable way (this assumption was also made for scheduling dialogues in the Verbmobil project (Maier, 1996)), with three clear phases: initialization, negotiation, and closing. We will call the middle phase in our dialogues the task performance phase, since it is not always a negotiation per se. Within the task performance phase very many subdialogues can take place, such as information-seeking, decision-making, payment, clarification, etc.
Discourse processing has frequently made use of sequences of speech acts as they occur in the dialogue, through bigram probabilities of occurrences, or through modelling in a finite state machine (Maier, 1996; Reithinger et al., 1996; Iida and Yamaoka, 1990; Qu et al., 1996). However, taking into account only the speech act of the previous segment might leave us with insufficient information to decide, as is the case in some elliptical utterances which do not follow a strict adjacency pair sequence:
(2) (talking about flight times)
    S1 I can give you the arrival time. Do you have that information already?
    S2 No, I don't.
    S1 It's twelve fifteen.
If we are parsing the segment "It's twelve fifteen", and our only source of information is the previous segment "No, I don't", we cannot possibly find the referent for "twelve fifteen", unless we know we are in a subdialogue discussing flight times, and arrival times have been previously mentioned.
[Figure 1: The Discourse Module. Diagram: the parse trees produced by the Phoenix parser pass through the discourse module (global structure, local structure), which outputs the final choice.]
Our approach aims at obtaining information both from the subdialogue structure and the speech act sequence by modelling the global structure of the dialogue with an FSM, with opening and closing as initial and final states, and other possible subdialogues in the intervening states. Each one of those states contains an FSM itself, which determines the allowed speech acts in a given subdialogue and their sequence. For a picture of the discourse component proposed here, see Figure 1.
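A minimal Python sketch of such a two-layered machine follows. It is an illustration under stated assumptions, not the JANUS implementation: the class `TwoLayerFSM`, the arcs in `GLOBAL_ARCS`, the speech act labels, and the transitions in `LOCAL_FSMS` are all hypothetical. Only the opening and closing states and the subdialogue names information-seeking and payment come from the text.

```python
from typing import Dict, List, Optional, Set

# Local layer: previous speech act (None at the start of a subdialogue)
# mapped to the speech acts allowed to follow it.
LocalFSM = Dict[Optional[str], Set[str]]


class TwoLayerFSM:
    """Global FSM over subdialogues; each state holds a local FSM over speech acts."""

    def __init__(self, global_arcs: Dict[str, Set[str]],
                 local_fsms: Dict[str, LocalFSM],
                 start: str = "opening") -> None:
        self.global_arcs = global_arcs
        self.local_fsms = local_fsms
        self.subdialogue = start
        self.prev_act: Optional[str] = None

    def allowed(self, act: str) -> bool:
        """Is this speech act a legal next step in the current subdialogue?"""
        local = self.local_fsms[self.subdialogue]
        return act in local.get(self.prev_act, set())

    def filter(self, candidates: List[str]) -> List[str]:
        """Keep only the candidate speech acts the local layer allows."""
        return [act for act in candidates if self.allowed(act)]

    def advance(self, act: str) -> None:
        """Commit to a speech act in the local layer."""
        if not self.allowed(act):
            raise ValueError(f"{act!r} not allowed in {self.subdialogue!r}")
        self.prev_act = act

    def switch(self, subdialogue: str) -> bool:
        """Jump to the upper layer and move to another subdialogue if an arc exists."""
        if subdialogue not in self.global_arcs.get(self.subdialogue, set()):
            return False
        self.subdialogue = subdialogue
        self.prev_act = None  # the local history restarts
        return True


# Hypothetical global arcs between subdialogues.
GLOBAL_ARCS: Dict[str, Set[str]] = {
    "opening": {"information-seeking"},
    "information-seeking": {"payment", "closing"},
    "payment": {"closing"},
    "closing": set(),
}

# Hypothetical local transitions inside each subdialogue.
LOCAL_FSMS: Dict[str, LocalFSM] = {
    "opening": {None: {"greeting"}, "greeting": {"greeting"}},
    "information-seeking": {
        None: {"request-info"},
        "request-info": {"give-info"},
        "give-info": {"acknowledge", "request-info"},
        "acknowledge": {"request-info"},
    },
    "payment": {None: {"request-payment"}, "request-payment": {"give-payment"}},
    "closing": {None: {"farewell"}, "farewell": {"farewell"}},
}

fsm = TwoLayerFSM(GLOBAL_ARCS, LOCAL_FSMS)
print(fsm.filter(["greeting", "request-info"]))  # candidates allowed in the opening
fsm.advance("greeting")
fsm.switch("information-seeking")
print(fsm.filter(["greeting", "request-info"]))  # candidates allowed after the switch
```

Keeping the local machines as plain dictionaries makes it easy to see which speech acts each subdialogue admits, and `switch` resets the local history so that each subdialogue starts from its own initial state.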
Let us look at another example where the use of information on the previous context and on speaker alternation will help choose the most appropriate parse and thus achieve a better translation. The expression "okay" can be a prompt for an answer (3), an acceptance of a previous offer (4) or a backchanneling element, i.e., an acknowledgment that the previous speaker's utterance has been understood (5).
(3) S1 So we'll switch you to a double room, okay?
(4) S1 So we'll switch you to a double room.
    S2 Okay.
(5) S1 The double room is $90 a night.
    S2 Okay, and how much is a single room?
In example (3), we will know that "okay" is a prompt, because it is uttered by the speaker after he or she has made a suggestion. In example (4), it will be an acceptance, because it is uttered after the previous speaker's suggestion. And in (5) it is an acknowledgment of the information provided. The correct assignment of speech acts will provide a more accurate translation into other languages.
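The three readings of "okay" can be written down as a small lookup on the previous speech act and on whether the speaker has changed. The sketch below is only an illustration of examples (3) to (5): the labels `suggest`, `give-info`, `prompt`, `accept` and `acknowledge`, and the function `disambiguate_okay`, are hypothetical names and not the speech act inventory of the Phoenix grammars.

```python
from typing import Dict, Optional, Tuple

# (previous speech act, same speaker as the previous segment?) -> reading of "okay"
OKAY_READINGS: Dict[Tuple[str, bool], str] = {
    ("suggest", True): "prompt",            # example (3): speaker prompts after own suggestion
    ("suggest", False): "accept",           # example (4): other speaker accepts the suggestion
    ("give-info", False): "acknowledge",    # example (5): other speaker acknowledges information
}


def disambiguate_okay(prev_act: str, prev_speaker: str, speaker: str) -> Optional[str]:
    """Return the reading of "okay" licensed by the context, or None if unknown."""
    same_speaker = prev_speaker == speaker
    return OKAY_READINGS.get((prev_act, same_speaker))


print(disambiguate_okay("suggest", "S1", "S1"))
print(disambiguate_okay("suggest", "S1", "S2"))
print(disambiguate_okay("give-info", "S1", "S2"))
```

A context that the table does not cover returns `None`, which leaves the decision to whatever fallback the system provides.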
To summarize, the two-layered FSM models a conversation through transitions of speech acts that are included in subdialogues. When the parser returns an ambiguity in the form of two or more possible speech acts, the FSM will help decide which one is the most appropriate given the context.
There are situations where the path followed in the two layers of the structure does not match the parse possibility we are trying to accept or reject. One such situation is the presence of clarification and correction subdialogues at any point in the conversation. In that case, the processor will try to jump to the upper layer, in order to switch the subdialogue under consideration. We also take into account the situation where there is no possible choice, either because the FSM does not restrict the choice (i.e., the FSM allows all the parses returned by the parser) or because the model does not allow any of them. In either of those cases, the transition is determined by unigram probabilities of the speech act in isolation, and bigrams of the combination of the speech act we are trying to disambiguate plus its predecessor.
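The fallback can be sketched in Python as follows. The text does not say how the unigram and bigram probabilities are combined, so the linear interpolation with a fixed `weight` used here is an assumption, as are the class `SpeechActNgrams`, the speech act labels, and the toy training sequences in `TOY_DIALOGUES`, which are invented for illustration and are not data from the JANUS dialogues.

```python
from collections import Counter
from typing import List, Optional, Sequence


class SpeechActNgrams:
    """Unigram and bigram estimates over sequences of speech act labels."""

    def __init__(self, dialogues: Sequence[Sequence[str]]) -> None:
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        self.contexts: Counter = Counter()  # how often a label has a successor
        for acts in dialogues:
            self.unigrams.update(acts)
            for prev, act in zip(acts, acts[1:]):
                self.bigrams[(prev, act)] += 1
                self.contexts[prev] += 1
        self.total = sum(self.unigrams.values())

    def p_unigram(self, act: str) -> float:
        """Relative frequency of the speech act in isolation."""
        return self.unigrams[act] / self.total if self.total else 0.0

    def p_bigram(self, prev: Optional[str], act: str) -> float:
        """Relative frequency of the speech act given its predecessor."""
        seen = self.contexts[prev]
        return self.bigrams[(prev, act)] / seen if seen else 0.0

    def score(self, prev: Optional[str], act: str, weight: float = 0.7) -> float:
        # Assumed combination: linear interpolation of bigram and unigram estimates.
        return weight * self.p_bigram(prev, act) + (1 - weight) * self.p_unigram(act)

    def choose(self, prev: Optional[str], candidates: List[str],
               weight: float = 0.7) -> str:
        """Pick the candidate speech act with the highest interpolated score."""
        return max(candidates, key=lambda act: self.score(prev, act, weight))


# Invented toy sequences of speech act labels, for illustration only.
TOY_DIALOGUES = [
    ["greeting", "request-info", "give-info", "acknowledge"],
    ["request-info", "give-info", "accept"],
    ["suggest", "accept", "request-info", "give-info"],
]

model = SpeechActNgrams(TOY_DIALOGUES)
print(model.choose("request-info", ["give-info", "accept"]))
print(model.choose(None, ["acknowledge", "give-info"]))
```

When the predecessor is unknown or unseen, the bigram term is zero for every candidate and the choice reduces to the unigram frequencies.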
5 Evaluation
The discourse module is being developed on a set of 29 dialogues, totalling 1,393 utterances. An evaluation will be performed on 10 dialogues, previously unseen by the discourse module. Since the module can be either incorporated into the system or turned off, the evaluation will be on the system's performance with and without the discourse module. Independent graders assign a grade to the quality of the translation.¹ A secondary evaluation will be based on the quality of the speech act disambiguation itself, regardless of its contribution to translation quality.
¹The final results of this evaluation will be available at the time of the ACL conference.
6 Conclusion and Future Work
In this paper I have presented a model of dialogue structure in two layers, which processes the sequence of subdialogues and speech acts in task-oriented dialogues in order to select the most appropriate from the ambiguous parses returned by the Phoenix parser. The model structures dialogue in two levels of finite state machines, with the final goal of improving translation quality.
A possible extension to the work described here would be to generalize the two-layer model to other, less homogeneous domains. The use of statistical information in different parts of the processing, such as the arcs of the FSM, could enhance performance.
References
Michael A. K. Halliday. 1994. An Introduction to Func- [...] tion).
Hitoshi Iida and Takayuki Yamaoka. 1990. Dialogue Structure Analysis Method and Its Application to Predicting the Next Utterance. Dialogue Structure Analysis. German-Japanese Workshop, Kyoto, Japan.
Alon Lavie, Donna Gates, Marsal Gavaldà, Laura Mayfield, Alex Waibel, Lori Levin. 1996. Multi-lingual Translation of Spontaneously Spoken Language in a Limited Domain. In Proceedings of COLING 96, Copenhagen.
Alon Lavie and Masaru Tomita. 1993. GLR*: An Efficient Noise Skipping Parsing Algorithm for Context Free Grammars. In Proceedings of the Third Interna- [...] Tilburg, The Netherlands.
Elisabeth Maier. 1996. Context Construction as Subtask of Dialogue Processing: The Verbmobil Case. In Proceedings of the Eleventh Twente Workshop on Lan- [...]
James Martin. 1992. English Text: System and Struc- [...]
Yan Qu, Barbara Di Eugenio, Alon Lavie, Lori Levin. 1996. Minimizing Cumulative Error in Discourse Context. In Proceedings of ECAI 96, Budapest, Hungary.
Norbert Reithinger, Ralf Engel, Michael Kipp, Martin Klesen. 1996. Predicting Dialogue Acts for a Speech-to-Speech Translation System. In Proceedings of IC- [...]
Emmanuel Schegloff and Harvey Sacks. 1973. Opening up Closings. Semiotica 7, pages 289-327.
Wayne Ward. 1991. Understanding Spontaneous Speech: the Phoenix System. In Proceedings of ICASSP 91.