
Improving Translation through Contextual Information

Maite Taboada*

Carnegie Mellon University

5000 Forbes Avenue

Pittsburgh, PA 15213

taboada+@cmu.edu

Abstract

This paper proposes a two-layered model of dialogue structure for task-oriented dialogues that processes contextual information and disambiguates speech acts. The final goal is to improve translation quality in a speech-to-speech translation system.

1 Ambiguity in Speech Translation

For any given utterance out of what we can loosely call context, there is usually more than one possible interpretation. A speaker's utterance of an elliptical expression, like the figure "twelve fifteen", might have a different meaning depending on the context of situation, the way the conversation has evolved until that point, and the previous speaker's utterance. "Twelve fifteen" could be the time "a quarter after twelve", the price "one thousand two hundred and fifteen", the room number "one two one five", and so on. Although English can conflate all those possible meanings into one expression, the translation into other languages usually requires more specificity.

If this is a problem for any human listener, the problem grows considerably when it is a parser doing the disambiguation. In this paper, I explain how we can use discourse knowledge in order to help a parser disambiguate among different possible parses for an input sentence, with the final goal of improving the translation in an end-to-end speech translation system.

The work described was conducted within the JANUS multi-lingual speech-to-speech translation system, designed to translate spontaneous dialogue in a limited domain (Lavie et al., 1996). The machine translation component of JANUS handles these problems using two different approaches: the Generalized Left-to-Right parser GLR* (Lavie and Tomita, 1993) and Phoenix, the latter being the focus of this paper.

*The author gratefully acknowledges support from the "la Caixa" Fellowship Program, ATR Interpreting Laboratories, and Project Enthusiast.

2 Disambiguation through Contextual Information

This project addresses the problem of choosing the most appropriate semantic parse for any given input. The approach is to combine discourse information with the set of possible parses provided by the Phoenix parser for an input string. The discourse module selects one of these possibilities. The decision is to be based on:

1. The domain of the dialogue. JANUS deals with dialogues restricted to a domain, such as scheduling an appointment or making travel arrangements. The general topic provides some information about what types of exchanges, and therefore speech acts, can be expected.

2. The macro-structure of the dialogue up to that point. We can divide a dialogue into smaller, self-contained units that provide information on what phases are over or yet to be covered: Are we past the greeting phase? If a flight was reserved, should we expect a payment phase at some point in the rest of the conversation?

3. The structure of adjacency pairs (Schegloff and Sacks, 1973), together with the responses to speech functions (Halliday, 1994; Martin, 1992). If one speaker has uttered a request for information, we expect some sort of response to that: an answer, a disclaimer or a clarification.

The domain of the dialogues, named the travel planning domain, consists of dialogues where a customer makes travel arrangements with a travel agent or a hotel clerk to book hotel rooms, flights or other forms of transportation. They are task-oriented dialogues, in which the speakers have specific goals of carrying out a task that involves the exchange of both information and services.

Discourse processing is structured in two different levels: the context module keeps a global history of the conversation, from which it will be able to estimate, for instance, the likelihood of a greeting once the opening phase of the conversation is over. A more local history predicts the expected response in any adjacency pair, such as a question-answer sequence. The model adopted here is that of a two-layered finite state machine (henceforth FSM), and the approach is that of late-stage disambiguation, where as much information as possible is collected before proceeding on to disambiguation, rather than restricting the parser's search earlier on.

3 Representation of Speech Acts in Phoenix

Writing the appropriate grammars and deciding on the set of speech acts for this domain is also an important part of this project. The selected speech acts are encoded in the grammar (in the Phoenix case, a semantic grammar), the tokens of which are concepts that the segment in question represents. Any utterance is divided into SDUs (Semantic Dialogue Units), which are fed to the parser one at a time. SDUs represent a full concept, expression, or thought, but not necessarily a complete grammatical sentence. Let us take an example input, and a possible parse for it:

(1) Could you tell me the prices at the Holiday Inn?

([request] (COULD YOU
  ([request-info] (TELL ME
    ([price-info] (THE PRICES
      ([establishment] (AT THE
        ([establishment-name] (HOLIDAY INN))))))))))

The top-level concepts of the grammar are speech acts themselves, the ones immediately after are further refinements of the speech act, and the lower level concepts capture the specifics of the utterance, such as the name of the hotel in the above example.
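The layering of concepts in parse (1) can be made concrete with a small sketch. The nested-tuple encoding and the helper names (`concept_path`, `speech_act`, `surface_words`) are illustrative assumptions, not the Phoenix parser's actual output format.

```python
# Hypothetical encoding of parse (1): each node is (concept, words, child).
# This layout is an assumption for illustration, not Phoenix's real format.
PARSE_1 = (
    "request", "COULD YOU",
    ("request-info", "TELL ME",
     ("price-info", "THE PRICES",
      ("establishment", "AT THE",
       ("establishment-name", "HOLIDAY INN", None)))),
)


def concept_path(node):
    """Return the concepts from the top-level speech act down to the leaf."""
    path = []
    while node is not None:
        concept, _words, child = node
        path.append(concept)
        node = child
    return path


def speech_act(node):
    """The top-level concept of a parse is the speech act itself."""
    return node[0]


def surface_words(node):
    """Rebuild the covered word string by walking down the tree."""
    words = []
    while node is not None:
        _concept, text, child = node
        words.append(text)
        node = child
    return " ".join(words)
```

Under this encoding, `speech_act(PARSE_1)` gives the label the discourse module would reason about, while the lower levels of `concept_path(PARSE_1)` carry the specifics needed for translation.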

4 The Discourse Processor

The discourse module processes the global and local structure of the dialogue in two different layers. The first one is a general organization of the dialogue's subparts; the layer under that processes the possible sequence of speech acts in a subpart. The assumption is that negotiation dialogues develop in a predictable way (this assumption was also made for scheduling dialogues in the Verbmobil project; Maier, 1996), with three clear phases: initialization, negotiation, and closing. We will call the middle phase in our dialogues the task performance phase, since it is not always a negotiation per se. Within the task performance phase very many subdialogues can take place, such as information-seeking, decision-making, payment, clarification, etc.

Figure 1: The Discourse Module [diagram not reproduced; recoverable labels: Phoenix Parser, Discourse Module, Global Structure, Local Structure, Final Choice]

Discourse processing has frequently made use of sequences of speech acts as they occur in the dialogue, through bigram probabilities of occurrences, or through modelling in a finite state machine (Maier, 1996; Reithinger et al., 1996; Iida and Yamaoka, 1990; Qu et al., 1996). However, taking into account only the speech act of the previous segment might leave us with insufficient information to decide, as is the case in some elliptical utterances which do not follow a strict adjacency pair sequence:

(2) (talking about flight times)
S1: I can give you the arrival time. Do you have that information already?
S2: No, I don't.
S1: It's twelve fifteen.

If we are parsing the segment "It's twelve fifteen", and our only source of information is the previous segment "No, I don't", we cannot possibly find the referent for "twelve fifteen", unless we know we are in a subdialogue discussing flight times, and arrival times have been previously mentioned.

Our approach aims at obtaining information both from the subdialogue structure and the speech act sequence by modelling the global structure of the dialogue with a FSM, with opening and closing as initial and final states, and other possible subdialogues in the intervening states. Each one of those states contains a FSM itself, which determines the allowed speech acts in a given subdialogue and their sequence. For a picture of the discourse component here proposed, see Figure 1.
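The nesting described here, a global FSM whose states each hold a local FSM, can be sketched as follows. Every subdialogue name, speech-act label and transition in the sketch is a hypothetical placeholder; the paper does not enumerate them.

```python
# Sketch of a two-layered FSM. All state names, speech-act labels and
# transitions are hypothetical; the paper does not list them.

# Upper layer: which subdialogue may follow which.
GLOBAL_FSM = {
    "opening": {"flight-times", "payment", "closing"},
    "flight-times": {"flight-times", "payment", "closing"},
    "payment": {"payment", "closing"},
    "closing": set(),
}

# Lower layer: inside each subdialogue, which speech act may follow which.
# The key None stands for "no speech act seen yet in this subdialogue".
LOCAL_FSMS = {
    "opening": {None: {"greeting"}, "greeting": {"greeting"}},
    "flight-times": {
        None: {"offer-info", "request-info"},
        "offer-info": {"request-confirm", "give-time"},
        "request-confirm": {"negate", "affirm"},
        "negate": {"give-time"},
        "affirm": {"acknowledge"},
        "request-info": {"give-time"},
        "give-time": {"acknowledge"},
    },
    "payment": {None: {"request-info"}, "request-info": {"give-price"}},
    "closing": {None: {"farewell"}, "farewell": {"farewell"}},
}


def can_switch(current, target):
    """Upper layer: may the dialogue move from one subdialogue to another?"""
    return target in GLOBAL_FSM[current]


def allowed_acts(subdialogue, previous_act):
    """Lower layer: speech acts the local FSM permits after previous_act."""
    return LOCAL_FSMS[subdialogue].get(previous_act, set())


def filter_parses(candidates, subdialogue, previous_act):
    """Keep only candidate speech acts licensed by the current local FSM."""
    allowed = allowed_acts(subdialogue, previous_act)
    return [act for act in candidates if act in allowed]
```

With these toy tables, the candidates for "It's twelve fifteen" in example (2) are narrowed by the subdialogue rather than by the preceding "No, I don't" alone: `filter_parses(["give-time", "give-price", "give-room-number"], "flight-times", "negate")` keeps only the time reading.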

Let us look at another example where the use of information on the previous context and on the speaker alternance will help choose the most appropriate parse and thus achieve a better translation. The expression "okay" can be a prompt for an answer (3), an acceptance of a previous offer (4) or a backchanneling element, i.e., an acknowledgement that the previous speaker's utterance has been understood (5).

(3) S1: So we'll switch you to a double room, okay?

(4) S1: So we'll switch you to a double room.
    S2: Okay.

(5) S1: The double room is $90 a night.
    S2: Okay, and how much is a single room?

In example (3), we will know that "okay" is a prompt, because it is uttered by the speaker after he or she has made a suggestion. In example (4), it will be an acceptance, because it is uttered after the previous speaker's suggestion. And in (5) it is an acknowledgement of the information provided. The correct assignment of speech acts will provide a more accurate translation into other languages.
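The reasoning applied to examples (3) to (5) amounts to a decision over who spoke last and what they did. A minimal sketch, assuming hypothetical speech-act labels ("suggest", "give-info") and a hypothetical function name, neither of which comes from the Phoenix grammars:

```python
# Hypothetical labels for the three readings of "okay" in examples (3)-(5).
def classify_okay(prev_speaker, prev_act, speaker):
    """Pick a reading of "okay" from speaker alternation and the previous act.

    prev_speaker and prev_act describe the segment just before "okay";
    speaker is the person uttering "okay".
    """
    if prev_act == "suggest" and speaker == prev_speaker:
        return "prompt"        # (3): same speaker asks for a reaction
    if prev_act == "suggest" and speaker != prev_speaker:
        return "accept"        # (4): other speaker takes up the suggestion
    if prev_act == "give-info" and speaker != prev_speaker:
        return "acknowledge"   # (5): backchannel before moving on
    return "unknown"           # no rule applies; leave the ambiguity open
```

In the model proposed here these distinctions would fall out of the transitions of the local FSM rather than from hand-written rules; the function only makes explicit which two pieces of context the decision depends on.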

To summarize, the two-layered FSM models a conversation through transitions of speech acts that are included in subdialogues. When the parser returns an ambiguity in the form of two or more possible speech acts, the FSM will help decide which one is the most appropriate given the context.

There are situations where the path followed in the two layers of the structure does not match the parse possibility we are trying to accept or reject. One such situation is the presence of clarification and correction subdialogues at any point in the conversation. In that case, the processor will try to jump to the upper layer, in order to switch the subdialogue under consideration. We also take into account the situation where there is no possible choice, either because the FSM does not restrict the choice (i.e., the FSM allows all the parses returned by the parser) or because the model does not allow any of them. In either of those cases, the transition is determined by unigram probabilities of the speech act in isolation, and bigrams of the combination of the speech act we are trying to disambiguate plus its predecessor.
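The statistical fallback can be sketched as below. The paper states only that unigram and bigram probabilities are consulted; the linear interpolation, its weight, the function names and the toy training data are all assumptions made for illustration.

```python
from collections import Counter

# Made-up speech-act sequences, used only to exercise the functions.
TOY_DIALOGUES = [
    ["suggest", "accept", "give-info", "acknowledge"],
    ["suggest", "accept"],
    ["give-info", "acknowledge"],
    ["suggest", "prompt"],
]


def train(sequences):
    """Count speech-act unigrams and (predecessor, act) bigrams."""
    unigrams, bigrams = Counter(), Counter()
    for seq in sequences:
        unigrams.update(seq)
        bigrams.update(zip(seq, seq[1:]))
    return unigrams, bigrams


def score(act, previous_act, unigrams, bigrams, weight=0.5):
    """Interpolate P(act) with an estimate of P(act | previous_act).

    The interpolation and its weight are assumptions, not the paper's method.
    The bigram term is approximated as count(prev, act) / count(prev).
    """
    total = sum(unigrams.values())
    p_uni = unigrams[act] / total if total else 0.0
    prev_count = unigrams[previous_act]
    p_bi = bigrams[(previous_act, act)] / prev_count if prev_count else 0.0
    return weight * p_bi + (1 - weight) * p_uni


def choose(candidates, previous_act, unigrams, bigrams):
    """Fallback choice when the FSM allows all candidates or none of them."""
    return max(candidates, key=lambda a: score(a, previous_act, unigrams, bigrams))
```

A call such as `choose(["accept", "acknowledge", "prompt"], "suggest", *train(TOY_DIALOGUES))` ranks the candidates by how often each act occurs overall and how often it follows the predecessor in the counted data.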

5 Evaluation

The discourse module is being developed on a set of 29 dialogues, totalling 1,393 utterances. An evaluation will be performed on 10 dialogues, previously unseen by the discourse module. Since the module can be either incorporated into the system, or turned off, the evaluation will be on the system's performance with and without the discourse module. Independent graders assign a grade to the quality of the translation.1 A secondary evaluation will be based on the quality of the speech act disambiguation itself, regardless of its contribution to translation quality.

1 The final results of this evaluation will be available at the time of the ACL conference.

6 Conclusion and Future Work

In this paper I have presented a model of dialogue structure in two layers, which processes the sequence of subdialogues and speech acts in task-oriented dialogues in order to select the most appropriate from the ambiguous parses returned by the Phoenix parser. The model structures dialogue in two levels of finite state machines, with the final goal of improving translation quality.

A possible extension to the work here described would be to generalize the two-layer model to other, less homogeneous domains. The use of statistical information in different parts of the processing, such as the arcs of the FSM, could enhance performance.

References

Michael A. K. Halliday. 1994. An Introduction to Func[...]tion)

Hitoshi Iida and Takyuki Yamaoka. 1990. Dialogue Structure Analysis Method and Its Application to Predicting the Next Utterance. Dialogue Structure Analysis German-Japanese Workshop, Kyoto, Japan.

Alon Lavie, Donna Gates, Marsal Gavaldà, Laura Mayfield, Alex Waibel, Lori Levin. 1996. Multi-lingual Translation of Spontaneously Spoken Language in a Limited Domain. In Proceedings of COLING 96, Copenhagen.

Alon Lavie and Masaru Tomita. 1993. GLR*: An Efficient Noise Skipping Parsing Algorithm for Context Free Grammars. In Proceedings of the Third Interna[...] Tilburg, The Netherlands.

Elisabeth Maier. 1996. Context Construction as Subtask of Dialogue Processing: The Verbmobil Case. In Proceedings of the Eleventh Twente Workshop on Lan[...]

James Martin. 1992. English Text: System and Struc[...]

Yan Qu, Barbara Di Eugenio, Alon Lavie, Lori Levin. 1996. Minimizing Cumulative Error in Discourse Context. In Proceedings of ECAI 96, Budapest, Hungary.

Norbert Reithinger, Ralf Engel, Michael Kipp, Martin Klesen. 1996. Predicting Dialogue Acts for a Speech-to-Speech Translation System. In Proceedings of IC[...]

Emmanuel Schegloff and Harvey Sacks. 1973. Opening up Closings. Semiotica 7, pages 289-327.

Wayne Ward. 1991. Understanding Spontaneous Speech: the Phoenix System. In Proceedings of ICASSP 91.
