
Splitting Long or Ill-formed Input for Robust Spoken-language Translation

Osamu FURUSE†, Setsuo YAMADA, Kazuhide YAMAMOTO

ATR Interpreting Telecommunications Research Laboratories
2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0288, Japan
furuse@cslab.kecl.ntt.co.jp, {syamada, yamamoto}@itl.atr.co.jp

Abstract

This paper proposes an input-splitting method for translating spoken language, which includes many long or ill-formed expressions. The proposed method splits input into well-balanced translation units based on a semantic distance calculation. The splitting is performed during left-to-right parsing and does not degrade translation efficiency. The complete translation result is formed by concatenating the partial translation results of each split unit. The proposed method can be incorporated into frameworks like TDMT, which utilize left-to-right parsing and a score for a substructure. Experimental results show that the proposed method gives TDMT the following advantages: (1) elimination of null outputs, (2) splitting of utterances into sentences, and (3) robust translation of erroneous speech recognition results.

1 Introduction

A spoken-language translation system requires the ability to treat long or ill-formed input. An utterance given as input to a spoken-language translation system is not always one well-formed sentence. Also, when treating an utterance in speech translation, the speech recognition result, which is the input of the translation component, might be corrupted even though the input utterance is well-formed. Such a misrecognized result can cause a parsing failure, and consequently no translation output would be produced. Furthermore, we cannot expect a speech recognition result to include punctuation marks such as a comma or a period between words, which are useful information for parsing.¹

As a solution for treating long input, long-sentence splitting techniques, such as that of Kim (1994), have been proposed. These techniques, however, use many manually written splitting rules and do not treat ill-formed input. Wakita (1997) proposed a robust translation method which locally extracts only reliable parts, i.e., those within the semantic distance threshold and over some word length. This technique, however, does not split input into units globally, and sometimes does not output any translation result.

† Current affiliation is NTT Communication Science Laboratories.
¹ Punctuation marks are not used in translation input in this paper.

This paper proposes an input-splitting method for robust spoken-language translation. The proposed method splits input into well-balanced translation units based on a semantic distance calculation. The complete translation result is formed by concatenating the partial translation results of each split unit. The proposed method can be incorporated into frameworks that utilize left-to-right parsing and a score for a substructure. In fact, it has been added to Transfer-Driven Machine Translation (TDMT), which was proposed for efficient and robust spoken-language translation (Furuse, 1994; Furuse, 1996). The splitting is performed during TDMT's left-to-right chart parsing, and does not degrade translation efficiency. The proposed method gives TDMT the following advantages: (1) elimination of null outputs, (2) splitting of utterances into sentences, and (3) robust translation of erroneous speech recognition results.

In the subsequent sections, we first outline the translation strategy of TDMT. Then we explain the framework of our splitting method in Japanese-to-English (JE) and English-to-Japanese (EJ) translation. Next, by comparing the TDMT system's performance with and without the proposed method, we demonstrate the usefulness of our method.


2 Translation strategy of TDMT

2.1 Transfer knowledge

TDMT produces a translation result by mimicking the example judged most semantically similar to the input string, based on the idea of Example-Based MT. Since it is difficult to store enough example sentences to translate every input, TDMT performs the translation by combining the examples of the partial expressions, which are represented by transfer knowledge patterns. Transfer knowledge in TDMT is compiled from translation examples. The following EJ transfer knowledge expression indicates that the English pattern "X at Y" corresponds to several possible Japanese expressions:

    X at Y => Y' de X' ((present, conference)),
              Y' ni X' ((stay, hotel)),
              Y' wo X' ((look, it))

The first possible translation pattern is "Y' de X'", with example set ((present, conference)). We will see that this pattern is likely to be selected to the extent that the input variable bindings are semantically similar to the sample bindings, where X = "present" and Y = "conference". X' is the transfer result of X.
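As a rough illustration of how a target pattern might be chosen for "X at Y", the following Python sketch compares the input bindings with each example set and keeps the closest one. The word distance table, its values, and all function names are assumptions made for this sketch; the actual system computes distances from a thesaurus, as described in Section 2.3.

```python
from typing import Dict, FrozenSet, List, Tuple

Binding = Tuple[str, str]  # (word bound to X, word bound to Y)

# Target patterns for the source pattern "X at Y", each with its example
# bindings, taken from the transfer knowledge expression in the text.
X_AT_Y: List[Tuple[str, List[Binding]]] = [
    ("Y' de X'", [("present", "conference")]),
    ("Y' ni X'", [("stay", "hotel")]),
    ("Y' wo X'", [("look", "it")]),
]

# ASSUMPTION: invented word-to-word distances in [0, 1], for illustration
# only. Pairs that are not listed are treated as maximally distant.
TOY_WORD_DISTANCE: Dict[FrozenSet[str], float] = {
    frozenset({"speak", "present"}): 0.25,
    frozenset({"meeting", "conference"}): 0.0,
}


def word_distance(a: str, b: str) -> float:
    """Distance between two words; identical words have distance 0."""
    if a == b:
        return 0.0
    return TOY_WORD_DISTANCE.get(frozenset({a, b}), 1.0)


def binding_distance(inp: Binding, example: Binding) -> float:
    """Sum of the per-variable distances between input and example."""
    return sum(word_distance(i, e) for i, e in zip(inp, example))


def select_target_pattern(x: str, y: str) -> Tuple[str, float]:
    """Return the target pattern whose closest example is nearest to (x, y)."""
    scored = [
        (min(binding_distance((x, y), ex) for ex in examples), target)
        for target, examples in X_AT_Y
    ]
    distance, target = min(scored)
    return target, distance
```

With these toy values, the input "speak at meeting" selects "Y' de X'", because its bindings are closest to (present, conference).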

The source expression of the transfer knowledge is expressed by a constituent boundary pattern, which is defined as a sequence that consists of variables and symbols representing constituent boundaries (Furuse, 1994). A variable corresponds to some linguistic constituent. A constituent boundary is expressed by either a functional word or a part-of-speech bigram marker. When there is no functional surface word that divides the expression into two constituents, a part-of-speech bigram is employed as a boundary marker, which is expressed by hyphenating the part-of-speech of the left constituent's last word and that of the right constituent's first word.

For instance, the expression "go to Kyoto" is divided into two constituents, "go" and "Kyoto". The preposition "to" can be identified as a constituent boundary. Therefore, in parsing "go to Kyoto", we use the pattern "X to Y".

The expression "I go" can be divided into two constituents, "I" and "go", which are a pronoun and a verb, respectively. Since there is no functional surface word between the two constituents, pronoun-verb can be inserted as a boundary marker into "I go", giving "I pronoun-verb go", which will now match the general transfer knowledge pattern "X pronoun-verb Y".
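The marker insertion described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the set of functional words is a tiny invented sample, and a marker is inserted between every pair of adjacent non-functional words, whereas a real system would insert markers only at constituent boundaries.

```python
from typing import List, Tuple

# ASSUMPTION: a tiny illustrative set of functional words.
FUNCTIONAL_WORDS = {"to", "at"}


def insert_boundary_markers(tagged: List[Tuple[str, str]]) -> List[str]:
    """Insert part-of-speech bigram markers into a tagged word sequence.

    `tagged` holds (word, part_of_speech) pairs. Where two adjacent words
    are both non-functional, a marker "<left POS>-<right POS>" is placed
    between them. Functional words already act as boundaries, so no
    marker is added next to them.
    """
    out: List[str] = []
    for i, (word, pos) in enumerate(tagged):
        if i > 0:
            prev_word, prev_pos = tagged[i - 1]
            if prev_word not in FUNCTIONAL_WORDS and word not in FUNCTIONAL_WORDS:
                out.append(f"{prev_pos}-{pos}")
        out.append(word)
    return out


sentence = [("I", "pronoun"), ("go", "verb"), ("to", "preposition"), ("Kyoto", "noun")]
marked = insert_boundary_markers(sentence)
```

Here `marked` is the sequence "I pronoun-verb go to Kyoto", in which both "pronoun-verb" and "to" can serve as constituent boundaries for pattern matching.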

2.2 Left-to-right parsing

In TDMT, possible source language structures are derived by applying the constituent boundary patterns of transfer knowledge source parts to an input string in a left-to-right fashion (Furuse, 1996), based on a chart parsing method. An input string is parsed by combining active and passive arcs while shifting the processed string from left to right. In order to limit the combinations of patterns during pattern application, each pattern is assigned its linguistic level, and for each linguistic level we specify the linguistic sublevels permitted to be used in the assigned variables.

Figure 1: Substructures for "I go to Kyoto"

Figure 1 shows the substructures for each passive arc and each active arc in "I go to Kyoto". A processed string is indicated by "~". A passive arc is created from a content word, as shown in (a), or from a combination of patterns for which all of the variables are instantiated, like (c), (e), and (f). An active arc, which corresponds to an incomplete substructure, is created from a combination of patterns some of which have uninstantiated variables as right-hand neighbors to the processed string, like (b) and (d).

If the processed string creates a passive arc for a substring and the passive arc satisfies the leftmost part of an uninstantiated variable in the pattern of active arcs for the left-neighboring substring, the variable is instantiated with the passive arc. Suppose that the processed string is "Kyoto" in "I go to Kyoto". The passive arc (e) is created, and it instantiates Y of the active arc (b). Thus, by combining (b) and (e), the structure of "I go to Kyoto" is composed like (f). If a passive arc is generated in such an operation, the creation of a new arc by variable instantiation is repeated. If a new arc can no longer be created, the processed string is shifted to the right-neighboring string. If the whole input string can be covered with a passive arc, the parsing succeeds.

2.3 Disambiguation

The left-to-right parsing determines the best structure and best transferred result locally by performing structural disambiguation using semantic distance calculations, in parallel with the derivation of possible structures (Furuse, 1996). The best structure is determined when a relative passive arc is created. Only the best substructure is retained and combined with other arcs. The best structure is selected by computing the total sum of all the possible combinations of the partial semantic distance values. The structure with the smallest total distance is chosen as the best structure. The semantic distance is calculated according to the relationship of the positions of the words' semantic attributes in the thesaurus (Sumita, 1992).
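To make the idea of a thesaurus-based distance concrete, here is a minimal Python sketch. It is not the formula of Sumita (1992): the thesaurus entries, the fixed depth, and the rule "distance shrinks as two words share a longer path from the root" are all assumptions chosen only to give a value between 0 and 1.

```python
from typing import Dict, Tuple

# ASSUMPTION: a hypothetical three-level thesaurus. Each word maps to its
# path of semantic classes from the root; all paths have the same depth.
TOY_THESAURUS: Dict[str, Tuple[str, ...]] = {
    "conference": ("abstract", "event", "meeting"),
    "meeting": ("abstract", "event", "meeting"),
    "party": ("abstract", "event", "social"),
    "hotel": ("concrete", "place", "lodging"),
}
DEPTH = 3


def semantic_distance(a: str, b: str) -> float:
    """Distance in [0, 1] based on how much of the class path is shared.

    0.0 means the words fall in the same most specific class; 1.0 means
    they share no class at all.
    """
    shared = 0
    for class_a, class_b in zip(TOY_THESAURUS[a], TOY_THESAURUS[b]):
        if class_a != class_b:
            break
        shared += 1
    return (DEPTH - shared) / DEPTH
```

In this toy setting, "conference" and "meeting" have distance 0, "conference" and "hotel" have distance 1, and "conference" and "party" fall in between.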

3 Splitting strategy

If the parsing of long or ill-formed input is undertaken only by the application of stored patterns, it often fails and generates no results. Our strategy for parsing such input is to split the input into units, each of which can be parsed and translated. It is explained as items (A)-(F) in this section.

3.1 Concatenation of neighboring substructures

The splitting is performed during left-to-right parsing as follows:

(A) Neighboring passive arcs can create a larger passive arc by being concatenated.

(B) A passive arc which concatenates neighboring passive arcs can be further concatenated with the right-neighboring passive arc.

These items enable two neighboring substructures to compose a structure even if there is no stored pattern which combines them. Figure 2 shows structure composition from neighboring substructures based on these items. α, β, and γ are structures of neighboring substrings. The triangles express substructures composed only from stored patterns. The boxes express substructures produced by concatenating neighboring substructures. δ is composed from its neighboring substructures, i.e., α and β. In addition, ε is composed from its neighboring substructures, i.e., δ and γ.

Figure 2: Structure from split substructures

Items (A) and (B) enable such a colloquial utterance as (1) to compose a structure by splitting, as shown in Figure 3.

(1) "Certainly sir for how many people please"

Figure 3: Structure for (1)
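Items (A) and (B) can be pictured with the following Python sketch, which represents a passive arc as a word span plus the list of split units it contains. The class, its fields, and the helper names are assumptions for illustration; the split position used for utterance (1) follows the translation example given in Section 3.4.

```python
from dataclasses import dataclass
from typing import List, Tuple

Span = Tuple[int, int]  # (index of first word, index one past the last word)


@dataclass(frozen=True)
class PassiveArc:
    start: int
    end: int
    units: Tuple[Span, ...]  # the split units that make up this arc

    @property
    def n_splits(self) -> int:
        """Number of times splitting was applied to build this arc."""
        return len(self.units) - 1


def arc_from_patterns(start: int, end: int) -> PassiveArc:
    """A passive arc built only from stored patterns (a single unit)."""
    return PassiveArc(start, end, ((start, end),))


def concatenate(left: PassiveArc, right: PassiveArc) -> PassiveArc:
    """Items (A)/(B): join two neighboring passive arcs into a larger one."""
    if left.end != right.start:
        raise ValueError("arcs must cover neighboring substrings")
    return PassiveArc(left.start, right.end, left.units + right.units)


def unit_strings(arc: PassiveArc, words: List[str]) -> List[str]:
    """Recover the surface string of each split unit."""
    return [" ".join(words[s:e]) for s, e in arc.units]


words = "certainly sir for how many people please".split()
whole = concatenate(arc_from_patterns(0, 2), arc_from_patterns(2, 7))
```

Because `concatenate` returns another `PassiveArc`, its result can be joined again with the next right-neighboring arc, which corresponds to item (B).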

3.2 Splitting input into well-formed parts and ill-formed parts

Item (C) splits input into well-formed parts and ill-formed parts, and enables parsing in cases where the input is ill-formed or the translation rules are insufficient. The well-formed parts are those to which patterns can be applied or which consist of one content word. The ill-formed parts, which consist of one functional word or one part-of-speech bigram marker, are split from the well-formed parts.

(C) In addition to content words, boundary markers, namely, any functional words and inserted part-of-speech bigram markers, also create a passive arc and compose a substructure.

(2) "They also have tennis courts too plus a disco"

(3) "Four please two children two adults"

Suppose that the substrings of utterance (2), "they also have tennis courts too" and "a disco", can each create a passive arc, and that the system has not yet learned a pattern to which the preposition "plus" is relevant, such as "X plus Y" or "plus X". Also, suppose that the substrings of utterance (3), "four please" and "two children two adults", can each create a passive arc, that the part-of-speech bigram marker "adverb-numeral" is inserted between these substrings, and that the system does not know the pattern "X adverb-numeral Y" to combine a sentence for X and a noun phrase for Y.

By item (C), utterances (2) and (3) can be parsed in these situations, as shown in Figure 4.

Figure 4: Structures for (2) and (3)

3.3 Structure preference

Although the splitting strategy improves the robustness of the parsing, heavy dependence on the splitting strategy should be avoided. Since a global structure has more syntactic and semantic relations than a set of fragmental expressions, in general, the translation of a global expression tends to be better than the translation of a set of fragmental expressions. Accordingly, the splitting strategy should be used as a backup function.

Figure 5 shows three possible structures for "go to Kyoto". (a) is a structure relevant to the pattern "X to Y" at the verb phrase level. In (b), the input string is split into two substrings, "go" and "to Kyoto". In (c), the input string is split into three substrings, "go", "to", and "Kyoto". The digit described at the vertex of a triangle is the sum of distance values for that structure. Among these three, (a), which does not use splitting, is the best structure. Item (D) is stipulated to give low priority to structures including split substructures.

(D) When a structure is composed by splitting, a large distance value is assigned.

Figure 5: Structures for "go to Kyoto"

In the TDMT system, the distance value in each variable varies from 0 to 1. We experimentally assigned the distance value of 5.00 to one application of splitting, and 0.00 to the structure including only one word or one part-of-speech bigram marker.² Suppose that the substructures in Figure 5 are assigned the following distance values: 0.33 for (a); 0.00 and 0.33 for the two parts of (b); and 0.00 for each of the three parts of (c). The total distance value of (a) is 0.33. The splitting is applied to (b) and (c) once and twice, respectively. Therefore, the total distance value of (b) is 0.00+0.33+5.00×1=5.33, and that of (c) is 0.00+0.00+0.00+5.00×2=10.00. (a) is selected as the best structure because it gives the smallest total distance value.
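The arithmetic for Figure 5 can be reproduced with a few lines of Python. The function and variable names are assumptions for this sketch; the distance values and the split penalty of 5.00 are the ones stated in the text.

```python
from typing import Dict, List, Tuple

SPLIT_PENALTY = 5.00  # distance assigned to one application of splitting


def total_distance(unit_distances: List[float], n_splits: int) -> float:
    """Sum of the substructure distances plus the penalty for each split."""
    return sum(unit_distances) + SPLIT_PENALTY * n_splits


# (distances of the substructures, number of splits) for each candidate
figure5: Dict[str, Tuple[List[float], int]] = {
    "a": ([0.33], 0),
    "b": ([0.00, 0.33], 1),
    "c": ([0.00, 0.00, 0.00], 2),
}

totals = {name: total_distance(d, s) for name, (d, s) in figure5.items()}
best = min(totals, key=totals.get)
```

Since each variable contributes at most 1 to the distance, a penalty of 5.00 per split keeps unsplit structures strongly preferred whenever one exists, which matches the role of splitting as a backup function.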

3.4 Translation output

The results gained from a structure corresponding to a passive arc can be transferred, and a partial translation result can then be generated. The translation result of a split structure is formed as follows:

(E) The complete translation result is formed by concatenating the partial translation results of each split unit.

A punctuation mark such as "," can be inserted between partial translation results to make the complete translation result clear, although we cannot expect punctuation in an input utterance. The EJ translation result of utterance (1) is as follows:

    certainly sir | for how many people please
    hai , nan-nin desuka

Strings such as functional words and part-of-speech bigram markers have no target expression, and are transferred as follows:

(F) A string which does not have a target expression is transferred to a string as " ", which means an incomprehensible part.

The EJ translation results of utterances (2) and (3) are as follows. "|" denotes a splitting position.

    they also have tennis courts too | plus | a disco
    douyouni tenisu-kooto ga mata ari-masu , , disuko

    four please | adverb-numeral | two children two adults
    yon o-negai-shi masu , , kodomo futari otona futari

² These values are tentatively assigned through comparing the splitting performance for some values, and are effective only for the present TDMT system.

Table 1: Effect of splitting on translation performance
(measures: output rate (%), parsing success rate (%), output understandability (%))
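Items (E) and (F) of Section 3.4 amount to joining the partial translations of the split units and substituting a marker for units that have no target expression. The Python sketch below illustrates this under stated assumptions: the function name is hypothetical, a unit without a target expression is represented by `None`, and the placeholder string "<?>" is an arbitrary stand-in chosen for this sketch rather than the marker used by the system.

```python
from typing import List, Optional


def assemble_translation(
    partials: List[Optional[str]],
    separator: str = " , ",
    placeholder: str = "<?>",  # ASSUMPTION: arbitrary stand-in marker
) -> str:
    """Concatenate partial translations of split units, item (E).

    A unit with no target expression (None) is rendered as the
    placeholder, item (F). A comma is inserted between units.
    """
    return separator.join(p if p is not None else placeholder for p in partials)


# Utterance (2): "they also have tennis courts too | plus | a disco"
result = assemble_translation(
    ["douyouni tenisu-kooto ga mata ari-masu", None, "disuko"]
)
```

The middle unit "plus" has no target expression in this scenario, so it appears only as the placeholder between the two translated parts.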

4 Effect of splitting

The splitting strategy based on items (A)-(F) in Section 3 can be introduced to frameworks such as TDMT, which utilize left-to-right parsing and a score for a substructure. We discuss the effect of splitting by showing experimental results of the TDMT system's JE, EJ, Japanese-to-Korean (JK), and Korean-to-Japanese (KJ) translations.³ The TDMT system, whose domain is travel conversations, can presently treat multi-lingual translation. The present vocabulary size is about 13,000 words in JE and JK, about 7,000 words in EJ, and about 4,000 words in KJ. The number of training sentences is about 2,900 in JE and EJ, about 1,400 in JK, and about 600 in KJ.

4.1 Null-output elimination

It is crucial for a machine translation system to output some result even though the input is ill-formed or the translation rules are insufficient. Items (C) and (D) in Section 3 split input into well-formed parts and ill-formed parts so that well-formed parts can cover the input as widely as possible. Since a content word and a pattern can be assigned some transferred results, some translation result can be produced if the input has at least one well-formed part.

³ In the experimental results referred to later in this section, the input does not consist of strings but of correct morpheme sequences. This enables us to focus on the evaluation of our splitting method by excluding cases where the morphological analysis fails.

Table 1 shows how the splitting improves the translation performance of TDMT. More than 1,000 sentences, i.e., new data for the system, were tested in each kind of translation. With splitting, there was no null output, giving a 100% output rate in every translation. So, by using the splitting method, TDMT can eliminate null output unless the morphological analysis gives no result or the input includes no content word. The splitting also improves the parsing success rate and the understandability of the output in every translation.

The output rates of the JK and KJ translations were small without splitting because the amount of sample sentences is less than that for the JE and EJ translations. However, the splitting compensated for the shortage of sample sentences and raised the output rate to 100%. Since Japanese and Korean are linguistically close, the splitting method increases the understandable results for JK and KJ translations more than for JE and EJ translations.

4.2 Utterance splitting into sentences

In order to gain a good translation result for an utterance including more than one sentence, the utterance should be split into proper sentences. The distance calculation mechanism aims to split an utterance into sentences correctly.

(4) "Yes that will be fine at five o'clock we will remove the bed"

For instance, splitting is necessary to translate utterance (4), which includes more than one sentence. The candidates for (4)'s structure are shown in Figure 6. The total distance value of (a) is 0.00+1.11+5.00×1=6.11, that of (b) is 0.00+0.00+1.11+5.00×2=11.11, and that of (c) is 0.83+0.00+0.42+5.00×2=11.25. As (a) has the smallest total distance, it is chosen as the best structure, and this agrees with our intuition.
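The choice among the candidates for utterance (4) can be checked with the short Python sketch below. It assumes that a candidate with n split units has had splitting applied n-1 times, which is consistent with the totals quoted in the text; the names are hypothetical, and only the distance values come from the paper.

```python
from typing import Dict, List

SPLIT_PENALTY = 5.00  # distance assigned to one application of splitting


def score(unit_distances: List[float]) -> float:
    """Total distance of a candidate given the distances of its units.

    ASSUMPTION: n units imply n - 1 applications of splitting.
    """
    return sum(unit_distances) + SPLIT_PENALTY * (len(unit_distances) - 1)


# Unit distances of the three candidate structures for utterance (4)
candidates: Dict[str, List[float]] = {
    "a": [0.00, 1.11],
    "b": [0.00, 0.00, 1.11],
    "c": [0.83, 0.00, 0.42],
}

ranked = sorted(candidates, key=lambda name: score(candidates[name]))
```

Note that candidate (c) has the smallest sum of unit distances among the three-unit candidates only marginally, so the ranking between (b) and (c) is decided by the unit distances, while the gap to (a) is decided by the split penalty.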

Figure 6: Structures for (4)

We have checked the accuracy of utterance splitting by using 277 Japanese utterances and 368 English utterances, all of which included more than one sentence. Table 2 shows the success rates for splitting the utterances into sentences. Although TDMT can also use the pattern "X boundary Y", in which X and Y are at the sentence level, to split the utterances, the proposed splitting method increases the success rates for splitting the utterances in both languages.

Table 2: Success rates for splitting utterances

w/o splitting w/ splitting

4.3 Translation after speech recognition

Speech recognition sometimes produces inaccurate results from an actual utterance, and erroneous parts often provide ill-formed translation inputs. However, our splitting method can also produce some translation results from such misrecognized inputs and improve the understandability of the resulting speech translation.

Table 3 shows an example of a JE translation of a recognition result including a substitution error. The underlined words are misrecognized parts. "youi (preparation)" in the utterance is replaced with "yori (postposition)".

Table 4 shows an example of a JE translation of a recognition result including an insertion error. "wo" has been inserted into the utterance after speech recognition. The translation of the speech recognition result is the same as that of the utterance except for the addition of " "; " " is the translation result for "wo", which is a postposition mainly marking an object.

Table 5 shows an example of the EJ translation of a recognition result including a deletion error. "'s" in the utterance is deleted after speech recognition. In the translation of this result, " " appears instead of "wa", which is a postposition marking a topic. " " is the translation for the marker "pronoun-adverb", which has been inserted between "that" and "all". The recognition result is split into three parts: "yes that", "pronoun-adverb", and "all correct". Although the translations in Tables 3, 4, and 5 might be slightly degraded by the splitting, the meaning of each utterance can be communicated with these translations.

We have examined the effect of splitting on JE speech translation using 47 erroneous recognition results of Japanese utterances. These utterances have been used as example utterances by the TDMT system. Therefore, for utterances correctly recognized, the translations of the recognition results should succeed. The erroneous recognition results were collected from an experimental base using the method of Shimizu (1996).

Table 6 shows the numbers of sentences at each level based on the extent to which the meaning of an utterance can be understood from the translation result. Without the splitting, only 19.1% of the erroneous recognition results are wholly or partially understandable. The splitting method increases this rate to 57.4%. Failures in spite of the splitting are mainly caused by the misrecognition of key parts such as predicates.

Table 6: Translation after erroneous recognition

                                            w/o splitting   w/ splitting
  wholly understandable                     6 (12.8%)       15 (31.9%)
  partially understandable                  3 (6.3%)        12 (25.5%)
  misunderstood, or never understandable    6 (12.8%)       20 (42.6%)
  null output                               32 (68.1%)      0 (0.0%)
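The rates of wholly or partially understandable results quoted for the 47 erroneous recognition results can be recomputed from the counts in Table 6. The following Python sketch does only that arithmetic; the dictionary layout and function name are assumptions for illustration.

```python
from typing import Dict

TOTAL_RESULTS = 47  # erroneous recognition results used in the experiment

# Counts of wholly and partially understandable translations from Table 6
counts: Dict[str, Dict[str, int]] = {
    "w/o splitting": {"wholly": 6, "partially": 3},
    "w/ splitting": {"wholly": 15, "partially": 12},
}


def understandable_rate(row: Dict[str, int]) -> float:
    """Percentage of results that are wholly or partially understandable."""
    return round(100 * (row["wholly"] + row["partially"]) / TOTAL_RESULTS, 1)


rates = {condition: understandable_rate(row) for condition, row in counts.items()}
```

The resulting rates are 19.1% without splitting and 57.4% with splitting, in agreement with the figures given in the text.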

Table 3: Substitution error in JE translation

                      translation input                        TDMT system's translation result
  utterance           Chousyoku no go youi wa deki masu ga     We can prepare breakfast
  recognition result  Chousyoku no go yori wa deki masu ga     Breakfast we can do

Table 4: Insertion error in JE translation

                      translation input                        TDMT system's translation result
  utterance           Soreto go yoyaku ga hitsuyou desu ka     And is a reservation necessary?
  recognition result  Soreto wo go yoyaku ga hitsuyou desu ka  And is a reservation necessary?

Table 5: Deletion error in EJ translation

                      translation input                        TDMT system's translation result
  utterance           Yes that's all correct                   Hai sore wa mattaku tadashii desu
  recognition result  Yes that all correct                     Hai sore mattaku tadashii desu

4.4 Translation time

Since our splitting method is performed under left-to-right parsing, translation efficiency is not a serious problem. We have compared EJ translation times in the TDMT system for two cases: one without the splitting method and the other with it. Table 7 shows the translation time of English sentences with an average input length of 7.1 words, and English utterances consisting of more than one sentence with an average input length of 11.4 words. The translation times of the TDMT system, written in LISP, were measured using a Sparc10 workstation.

Table 7: Translation time of EJ

  input       w/o splitting   w/ splitting
  sentence    0.35 sec        0.36 sec
  utterance   0.60 sec        0.61 sec

The time difference between the two situations is small. This shows that the translation efficiency of TDMT is maintained even if the splitting method is introduced to TDMT.

5 Concluding remarks

We have proposed an input-splitting method for translating spoken language, which includes many long or ill-formed expressions. Experimental results have shown that the proposed method improves TDMT's performance without degrading the translation efficiency. The proposed method is applicable not only to TDMT but also to other frameworks that utilize left-to-right parsing and a score for a substructure. One important future research goal is the achievement of a simultaneous interpretation mechanism for application to a practical spoken-language translation system. The left-to-right mechanism should be maintained for that purpose. Our splitting method meets this requirement, and can be applied to multi-lingual translation because of its universal framework.

References

O. Furuse and H. Iida. 1994. Constituent Boundary Parsing for Example-Based Machine Translation. In Proc. of Coling '94, pages 105-111.

O. Furuse and H. Iida. 1996. Incremental Translation Utilizing Constituent Boundary Patterns. In Proc. of Coling '96, pages 412-417.

Y.B. Kim and T. Ehara. 1994. An Automatic Sentence Breaking and Subject Supplement Method for J/E Machine Translation (in Japanese). In Transactions of Information Processing Society of Japan, Vol. 35, No. 6, pages 1018-1028.

T. Shimizu, H. Yamamoto, H. Masataki, S. Matsunaga, and Y. Sagisaka. 1996. Spontaneous Dialogue Speech Recognition using Cross-word Context Constrained Word Graphs. In Proc. of ICASSP '96, pages 145-148.

E. Sumita and H. Iida. 1992. Example-Based Transfer of Japanese Adnominal Particles into English. IEICE Transactions on Information and Systems, E75-D, No. 4, pages 585-594.

Y. Wakita, J. Kawai, and H. Iida. 1997. Correct parts extraction from speech recognition results using semantic distance calculation, and its application to speech translation. In Proc. of ACL/EACL Workshop on Spoken Language Translation, pages 24-31.
