Refining Lexical Translation Training Scheme for Improving The Quality of Statistical Phrase-Based Translation

Cuong Hoang1, Cuong Anh Le1, Son Bao Pham1,2
1 Faculty of Information Technology, University of Engineering and Technology, Vietnam National University, Hanoi
2 Information Technology Institute, Vietnam National University, Hanoi
{cuongh.mi10, cuongla, sonpb}@vnu.edu.vn

ABSTRACT
Under word-based alignment, frequent words with consistent translations can be aligned at a high rate of precision. However, words that are less frequent or exhibit diverse translations in the training corpora generally do not have statistically significant evidence for confident alignments [7]. In this work, we propose a bootstrapping algorithm to capture the alignments of such less frequent words and of words with diverse translations. Interestingly, we avoid making any explicit assumption about the pair of languages used. We report experimental evaluations on two phrase-based translation systems, English-Vietnamese and English-French. The experiments show a significant "boosting" capacity for the overall translation quality of both tasks.
1 INTRODUCTION
Statistical Machine Translation (SMT) is a machine translation approach in which sentence translations are generated based on statistical models whose parameters are derived from the analysis of parallel sentence pairs in a bilingual corpus. In SMT, the best performing systems are based in some way on phrases (groups of words). The basic idea of phrase-based translation is to break a given source sentence into phrases, translate each phrase, and finally compose the target sentence from these phrase translations [9, 12].
For a statistical phrase-based translation system, the accuracy of the statistical word-based alignment models is critically important. In fact, under the lexical alignment models (IBM Models 1-2), frequent words with a consistent translation can usually be aligned at a high rate of precision. However, for words that are less frequent or exhibit diverse translations, we generally do not have statistically significant evidence for a confident alignment.
This problem tends to reduce the translation quality in several important ways. First, the diverse translations of a source word can never be recognized by the original statistical alignment models. This is an important point: it indicates that a purely statistical IBM alignment model is not sufficient to reach state-of-the-art alignment quality. Second, the poor quality of the lexical translation estimates has a bad influence on the quality of the higher, fertility-based alignment models [6]. Finally, phrase extraction is then unable to generate more diverse translations for each word or phrase.
In our observation, language is inherently flexible and diverse. Therefore, we need to capture those diverse translations in order to obtain better quality. To overcome this problem, some papers report improvements when linguistic knowledge is used [7]. In general, the linguistic knowledge mainly serves to filter out incorrect alignments. As a result, such methods are not easily applicable to all language pairs without adaptation. Different from the previous methods, in this work we propose a bootstrapping word alignment algorithm for improving the modelling of lexical translation. Basically, we found that our alignment model is better at "capturing" the diverse translations of words and thereby reduces the bad alignments of rare words. Following the work of [6], we also show a very interesting point: although we mainly focus on IBM Models 1-2, we found that the improvement also significantly benefits the fertility-based alignment models.
Consequently, with the improved word-based alignment models, our phrase-based SMT system gains a statistically significant improvement in translation quality. The evaluation of our work is performed on different tasks for different languages. Since we do not rely on linguistic knowledge, we believe our approach will be applicable to other language pairs.
The rest of this paper is organized as follows: Section 2 presents IBM Models 1-2. Section 3 describes the problem of bad alignments for rare words and for words with diverse translations. Section 4 focuses on our bootstrapping word alignment algorithm. Section 5 presents our experimental evaluations. Finally, the conclusion is given in Section 6.
2 IBM MODELS 1-2
Model 1 is a probabilistic generative model within a framework that assumes a source sentence f_1^J of length J translates as a target sentence e_1^I of length I. It is defined as a particularly simple instance of this framework, by assuming that all possible lengths for f_1^J (less than some arbitrary upper bound) have a uniform probability. Let t(f_j|e_i) denote the translation probability of f_j given e_i. The alignment is determined by specifying the values of a_j for j from 1 to J. [1] derives the following:
$$\Pr(f|e) = \frac{\epsilon}{(I+1)^J} \prod_{j=1}^{J} \sum_{i=0}^{I} t(f_j|e_i) \qquad (1)$$
The parameters of Model 1 for a given pair of languages are normally estimated using EM [3]. We call the expected number of times that word e_i connects to f_j in the translation pair (f_1^J, e_1^I) the count of f_j given e_i, and denote it by c(f_j|e_i; f_1^J, e_1^I). Following the derivation in [1], c(f_j|e_i; f_1^J, e_1^I) can be calculated as follows:
$$c(f_j|e_i; f_1^J, e_1^I) = \frac{t(f_j|e_i)}{\sum_{i'=0}^{I} t(f_j|e_{i'})} \sum_{j'=1}^{J} \delta(f_j, f_{j'}) \sum_{i'=0}^{I} \delta(e_i, e_{i'}) \qquad (2)$$
where δ denotes the Kronecker delta function.
In addition, we set λ_{e_i} as a normalization factor and then repeatedly re-estimate the translation probability of a word f_j in f_1^J given a word e_i in e_1^I as:
$$t(f_j|e_i) = \lambda_{e_i}^{-1} \, c(f_j|e_i; f_1^J, e_1^I) \qquad (3)$$
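To make the estimation procedure concrete, the following is a minimal Python sketch of one EM iteration of Model 1, following equations (2) and (3). The corpus representation, function name, and handling of the NULL word are illustrative assumptions, not part of the LGIZA toolkit used later in this paper.

```python
from collections import defaultdict

def model1_em_iteration(corpus, t):
    """One EM iteration of IBM Model 1.

    corpus: list of (f_sentence, e_sentence) pairs, each a list of tokens;
            e_sentence is assumed to include the NULL word at position 0.
    t:      dict mapping (f, e) -> current translation probability t(f|e),
            e.g. initialized uniformly over co-occurring pairs.
    Returns the re-estimated translation table.
    """
    count = defaultdict(float)   # expected counts c(f|e), equation (2)
    lam = defaultdict(float)     # normalization factors lambda_e

    for f_sent, e_sent in corpus:
        for f in f_sent:
            denom = sum(t.get((f, e), 0.0) for e in e_sent)   # sum_i t(f|e_i)
            if denom == 0.0:
                continue
            for e in e_sent:
                c = t.get((f, e), 0.0) / denom
                count[(f, e)] += c
                lam[e] += c

    # equation (3): t(f|e) = lambda_e^{-1} * c(f|e)
    return {(f, e): c / lam[e] for (f, e), c in count.items()}
```

Iterating this function several times, starting from a uniform table, yields the Model 1 estimates.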
IBM Model 2 is another simple model that improves on Model 1 by addressing the issue of alignment with an explicit model based on the positions of the input and output words. We make the same assumptions as in Model 1, except that we assume Pr(a_j|a_1^{j-1}, f_1^{j-1}, J, e) depends on j, a_j, and J, as well as on I. The equation for estimating the probability of the sentence f given the sentence e becomes:
$$\Pr(f|e) = \epsilon \prod_{j=1}^{J} \sum_{i=0}^{I} t(f_j|e_i) \, a(i|j, J, I) \qquad (4)$$
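Since the rest of the paper repeatedly uses the "Viterbi" (best) word-by-word alignment, the following is a small sketch of how such an alignment can be read off a trained Model 2: for each source position j, pick the target position i maximizing t(f_j|e_i) a(i|j, J, I). The dictionary-based parameter representation is an assumption made for illustration.

```python
def viterbi_align_model2(f_sent, e_sent, t, a):
    """Best word-by-word alignment under Model 2.

    f_sent: source tokens f_1 .. f_J.
    e_sent: target tokens with the NULL word at position 0, i.e. e_0 .. e_I.
    t:      dict (f, e) -> t(f|e);  a: dict (i, j, J, I) -> a(i|j, J, I).
    Returns a list of (j, i) links, one per source position.
    """
    J, I = len(f_sent), len(e_sent) - 1
    links = []
    for j, f in enumerate(f_sent, start=1):
        best_i = max(range(I + 1),
                     key=lambda i: t.get((f, e_sent[i]), 0.0) * a.get((i, j, J, I), 0.0))
        links.append((j, best_i))
    return links
```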
3 A CAUSE FOR BAD WORD ALIGNMENT TRANSLATION
Without loss of generality, we assume that there exists a set F_{e_i} containing n possible word translations of a word e_i:
$$F_{e_i} = \{f_1, f_2, \ldots, f_n\}$$
This means that, in the parallel corpus, the correct alignments of the elements of the set F_{e_i} are the lexical pairs (f_j; e_i), j ∈ {1, 2, ..., n}. For each training iteration of the word-based alignment model, the normalization factor λ_{e_i} is defined as the sum of the counts accumulated for all the lexical pairs between f_j and e_i:
$$\lambda_{e_i} = \sum_{j=1}^{n} c(f_j|e_i; f_1^J, e_1^I) \qquad (5)$$
Consider a foreign word f_j. We examine the case where f_j frequently co-occurs with a word e_k in the bilingual sentence pairs of the parallel corpus, and we assume that this pair is not a correct lexical translation pair in the linguistic sense. We would expect the lexical translation probability t(f_j|e_k) to always be smaller than the lexical translation probability t(f_j|e_i). Analogous to the normalization factor λ_{e_i}, we have the normalization factor λ_{e_k} of the word e_k:
$$\lambda_{e_k} = \sum_{j=1}^{m} c(f_j|e_k; f_1^J, e_1^I) \qquad (6)$$
Unfortunately, if the word e_k appears much less often than the word e_i in the training corpus (c(e_k) ≪ c(e_i)), the normalization factor λ_{e_i} is usually much greater than the normalization factor λ_{e_k} of the word e_k (λ_{e_k} ≪ λ_{e_i}). Therefore, following equation (3), the lexical translation probability t(f_j|e_k) becomes much greater than the lexical translation probability t(f_j|e_i), and we cannot "capture" the correct alignment pair (f_j; e_i) as we expected.
Similarly, assume that the word f_j is one of the diverse translations of the word e_i. Because e_i has a large number n of possible translation options, λ_{e_i} also takes a greater value. Hence, following equation (3), the lexical translation probability t(f_j|e_i) takes a very small value. In this case, even when e_k is not a rare word, it is very common that t(f_j|e_i) ≪ t(f_j|e_k). Therefore, we cannot capture the diverse translation (f_j; e_i) as we expected.
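A deliberately tiny numerical illustration of this effect follows; the counts are invented purely for illustration and do not come from the corpora used in the experiments.

```python
# Invented expected counts c(f|e) after one EM iteration.
counts_ei = {"f1": 40.0, "f2": 25.0, "fj": 5.0, "f4": 30.0}  # frequent e_i with many translations
counts_ek = {"fj": 3.0}                                       # rare e_k, co-occurring only with f_j

lam_ei = sum(counts_ei.values())          # large normalization factor (100.0)
lam_ek = sum(counts_ek.values())          # small normalization factor (3.0)

t_fj_ei = counts_ei["fj"] / lam_ei        # 0.05
t_fj_ek = counts_ek["fj"] / lam_ek        # 1.00

# Although (f_j; e_i) is the linguistically correct pair, t(f_j|e_k) >> t(f_j|e_i),
# so the best alignment of f_j is drawn towards the rare word e_k.
```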
In our observations, these situations happen quite often, and they significantly affect the quality of word-by-word alignment modelling. In the following, we give an example that shows this problem clearly. From a parallel corpus containing 60,000 parallel sentences (English-Vietnamese), Tables 1 and 2 show the results derived from the "Viterbi" alignments pegged by training IBM Model 2.
In this section, we propose an index called the Average Number of Best Alignments (ANBA); it is the key of this work. [1] introduced the idea of an alignment between a pair of strings as an object indicating, for each word in the French string, the word in the English string from which it arose. For each specific translation model, given a pair of parallel sentences, we find the "best" corresponding target word for every word of the source sentence. Our goal is to find the correlation between the frequency of a word and the probability that it is chosen as part of the best alignment pair (that word together with its "marked" corresponding alignment word) by the statistical alignment model.
In other words, we try to find the relationship between the number of occurrences of a word and the possibility that it is "pegged" in a best alignment pair. These best alignment pairs are often not accurate, so the index tries to reflect the errors that may occur. For convenience, we group the target words by their frequency of occurrence in the training data (the Freq column): a class of words contains all the words with the same frequency. The ANBA index of a class is then defined as the ratio between the number of times the words of that class were chosen as the best word-by-word correspondence of some source word and the number of words in that class.
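The following is a minimal sketch of how the ANBA index per frequency class could be computed from the Viterbi alignments; the helper names (viterbi_best, word_freq) are illustrative assumptions rather than part of the LGIZA toolkit.

```python
from collections import Counter, defaultdict

def anba_by_frequency(corpus, viterbi_best, word_freq):
    """ANBA index per frequency class, as described in Section 3.

    corpus:       list of (f_sentence, e_sentence) pairs.
    viterbi_best: function(f_sent, e_sent) -> list of target words chosen as
                  the best alignment of each source word (e.g. Model 2 Viterbi).
    word_freq:    Counter of target-word frequencies in the training data.
    """
    chosen = Counter()
    for f_sent, e_sent in corpus:
        chosen.update(viterbi_best(f_sent, e_sent))

    class_hits = defaultdict(int)    # total "best alignment" hits per frequency class
    class_size = defaultdict(int)    # number of distinct words per frequency class
    for word, freq in word_freq.items():
        class_hits[freq] += chosen[word]
        class_size[freq] += 1

    return {freq: class_hits[freq] / class_size[freq] for freq in class_size}
```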
ANBA also reflects the average number of possible translations of a group of target words. Consider the ANBA tables for both English and Vietnamese words, where each side in turn is chosen as the target language of the translation.
Table 1: The ANBA Statistical Table for Vietnamese Words (σ = 2.28)
Table 2: The ANBA Statistical Table for English Words (σ = 5.46)
Tables 1 and 2 clearly show that we are only able to "capture" the translations of a word when that word does not appear many times (the diverse translation problem). Moreover, the more often a word appears, the more difficult it is to capture its diverse target translations. We also see that the average ANBA index deviates strongly as the frequency of a word decreases: when a word appears more rarely than others in the training data, it is more likely to be chosen as the target side of the best "Viterbi" alignment pair of a source word.
4 IMPROVING LEXICAL TRANSLATION MODEL
The problem of rare words, and of words which have many diverse translations, presents an interesting challenge. Fortunately, IBM Models 1-2 are simple models, in the sense that their training can be implemented very quickly compared to the complexity of training the higher IBM models. Therefore, our improvement focuses on refining the training scheme to obtain better results.
4.1 Improving Lexical Translation Model
Returning to the case of the set F_{e_i} containing n possible lexical translations of the word e_i, assume that the pair (f_n; e_i) appears many times in the training corpus. From equations (2) and (3), t(f_n|e_i) then obtains a very high value. Consequently, the remaining lexical translation probabilities t(f_j|e_i), j ∈ {1, ..., n-1}, obtain very small values. The situation becomes worse when the cardinality of the set F of a word is large, as is expected given the diversity of natural language.
Clearly, t(f_j|e_i) never reaches a "satisfying" translation probability in this setting. As a result, noisy alignment choices occur frequently, because pairs that are not real lexical pairs (in the linguistic sense) easily obtain higher lexical translation probabilities. To overcome this noise problem, in the following we present a bootstrapping alignment algorithm that refines the training scheme of IBM Models 1-2.
In more detail, we divide the training scheme of a translation model into N steps, where N is called the smoothing bootstrapping factor. In other words, N can also be defined as the number of times we re-train our IBM Models. Without loss of generality, we assume that the word f_j does not occur more often than the word f_{j+1}; for convenience, we denote this assumption as c(f_1) ≤ c(f_2) ≤ ... ≤ c(f_n). We then divide the training into N consecutive steps, each of which tries to separate f_n whenever the lexical translation probability t(f_n|e_i) is, at that point, the maximum translation probability compared to the other target words e_k.
Hence, if we mark and filter out every "marked" pair of f_n and e_i in the bilingual corpus whenever it is chosen as the best lexical alignment at that time, we obtain a new training corpus consisting of updated "parallel sentences". If we re-train on this new parallel corpus, then for each training iteration we have a new normalization factor ~λ_{e_i} of the word e_i:
$$\tilde{\lambda}_{e_i} = \sum_{j=1}^{n-1} c(f_j|e_i; f_1^J, e_1^I) \qquad (7)$$
Interestingly, each time we separate f_n out of the set F, the new lexical translation probabilities ~t(f_k|e_i) (k ≠ n) increase, as expected, because ~λ_{e_i} decreases compared to the original normalization factor λ_{e_i}: there is no longer any need to add the count c(f_n|e_i) to ~λ_{e_i}. Therefore, the probability that a source word f_j is automatically aligned to e_i increases as well. This is the main key that allows us to capture the other "diverse" translations of a word.
4.2 The Bootstrapping Word Alignment Algorithm
From the above analysis, we have a simple but very effective way to improve the quality of lexical translation modelling. In this section, we formalize our bootstrapping word alignment algorithm. Let the set
$$N = \{\Delta_1, \Delta_2, \ldots, \Delta_n\} \quad (\Delta_1 < \Delta_2 < \cdots < \Delta_n)$$
be the covering range of our bootstrapping word alignment algorithm. Each value ∆_i can be understood as an occurrence threshold (or frequency threshold) used to separate a group of words according to their level of occurrence counts.
The bootstrapping word alignment algorithm is formally described as follows:
The Bootstrapping Word Alignment Algorithm
Input: e_1^S, f_1^S; N = {∆_1, ∆_2, ..., ∆_n} (∆_1 < ∆_2 < ... < ∆_n)
Output: Alignment A
1. Start with A = ∅
2. For each ∆_n as the current threshold:
     Count the frequency c(.) of each word in the training data
     Train the IBM Model
     For each pair (f^(s), e^(s)), 1 ≤ s ≤ S:
       For each e_i in e^(s):
         If e_i is marked in A, continue
         Find the best f_j in f^(s)
         If f_j in f^(s) is marked in A, continue
         If c(f_j) ≥ ∆_n:
           Mark (f_j; e_i) as a word pair and add (f_j; e_i) to A
           Change e_i in e^(s)
     n = n - 1
     Go to 2
For a word f_j that is the best alignment of a word e_i and whose number of occurrences exceeds the threshold condition, we mark and add (f_j; e_i) to the alignment set A. After that, we need to mark e_i as "pegged". Similar to [1], we change the word e_i by adding the prefix "UNK" to it. Changing the word e_i, as noted previously, serves to "boost" the probability of obtaining other possible translations f_k of e_i in other sentences, since the contribution of the pair (f_j; e_i) no longer adds to the total λ_{e_i} value once e_i has been changed in e^(s) by the added prefix.
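The following is a compact Python sketch of the bootstrapping procedure described above, under simplifying assumptions: the training routine and the best-alignment search are passed in as callbacks, the frequency condition is applied to the source-side word found as the best alignment, and pegged target words are marked with the "UNK" prefix as in the text.

```python
from collections import Counter

def bootstrap_align(corpus, thresholds, train_ibm, best_source_index):
    """Sketch of the bootstrapping word alignment algorithm (Section 4.2).

    corpus:            list of (f_sentence, e_sentence) pairs (lists of tokens).
    thresholds:        the covering range N = [d_1, ..., d_n], d_1 < ... < d_n.
    train_ibm:         callback: corpus -> lexical model (e.g. IBM Model 1/2).
    best_source_index: callback: (model, f_sent, e_sent, i) -> position j of the
                       best word f_j for e_i (e.g. the Viterbi choice).
    Returns the set A of pegged word pairs (f_j, e_i).
    """
    A = set()
    freq = Counter(f for f_sent, _ in corpus for f in f_sent)   # word frequencies
    for delta in reversed(thresholds):          # start from the largest threshold
        model = train_ibm(corpus)               # re-train on the updated corpus
        for f_sent, e_sent in corpus:
            pegged_j = set()                    # source positions pegged in this pair
            for i, e in enumerate(e_sent):
                if e.startswith("UNK_"):        # e_i was pegged in an earlier pass
                    continue
                j = best_source_index(model, f_sent, e_sent, i)
                if j in pegged_j:
                    continue
                if freq[f_sent[j]] >= delta:    # frequency threshold condition
                    A.add((f_sent[j], e))
                    pegged_j.add(j)
                    e_sent[i] = "UNK_" + e      # change e_i so later passes re-distribute mass
    return A
```

The upgraded version described next would, in addition, drop the pegged pairs from the training data before re-training, rather than only re-labelling e_i.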
In fact, training the lexical translation models does not cost much computing power. However, re-training the system n times, where n is the cardinality of the set N, is computationally expensive. We therefore have an upgraded version of our bootstrapping word alignment algorithm: after pegging, for example, (f_j; e_i) as the best alignment satisfying the frequency condition, we remove those pairs from the training data and re-train the system with the new threshold ∆_i. This greatly reduces the computational cost and the processing time.
In addition, we improve the way each element ∆_i of the set N is chosen, which helps us not only to cover a larger range up to ∆_n but also to reduce the computational cost. These improved schemes are described in more detail in the experimental section.
5 EXPERIMENTS
Recent research points out that it is difficult to achieve large gains in translation performance purely by improving word-based alignment results: the improvement in word alignment quality [4] can be strong, yet it is hard to improve the overall quality of a statistical phrase-based translation system [2, 14]. Therefore, to confirm the influence of the improvements in lexical translation, we directly test the impact of the word alignment extraction component used for "learning" phrases in the phrase-based SMT system.
The experiments are carried out on the English-Vietnamese language pair, and also on the English-French pair for larger training data. The English-Vietnamese training data was provided by [5], and the English-French training corpus was the Hansards corpus [10]. We use the MOSES framework [8] as the phrase-based SMT framework. In all evaluations, we translate sentences from Vietnamese to English. We measure performance using the BLEU metric [13], which estimates the accuracy of the translation output with respect to a reference translation. We use 1,000 pairs of parallel sentences for testing the translation quality of the statistical phrase-based translation system.
5.1 Baseline Results
In this work, we use LGIZA1 as a lightweight statistical machine translation toolkit to train IBM Models 1-3. More information about LGIZA can be found in [6]. Unlike GIZA++, LGIZA is implemented directly from the original IBM Models description [1], without the later improvements integrated into GIZA++, such as determining word classes to obtain a low translation lexicon perplexity (Och, 1999), various smoothing techniques for the fertility, distortion, or alignment parameters, and symmetrization [10][11]. Applying those improved techniques could therefore make our results slightly noisier in comparison.
Table 3 presents the BLEU scores obtained with each specific IBM Model, trained on a bilingual corpus of 60,000 parallel sentences (0.65 million tokens). One small point to note: for English and Vietnamese, an example of a linguistically quite different pair, IBM Model 3 usually performs somewhat worse than IBM Model 2.
IBM Model 1   19.07
IBM Model 2   19.54
IBM Model 3   18.70
Table 3: Using IBM Models as the baseline (BLEU scores)
5.2 Evaluation On The Bootstrapping Word Alignment Algorithm
The following experimental evaluations focus on the impact of applying our bootstrapping word alignment algorithm. For each evaluation, with the same set N serving as the covering range, we take different smoothing factors, defined as the difference between two consecutive thresholds ∆_i and ∆_{i+1} (assumed constant). Tables 4-7 clearly show the results of applying our bootstrapping word alignment algorithm for each specific smoothing factor and each set N.
1 LGIZA is available at: http://code.google.com/p/lgiza/
Table 4: Improving results when setting the smoothing factor value to 1
Table 5: Improving results when setting the smoothing factor value to 2
Table 6: Improving results when setting the smoothing factor value to 4
Clearly, the lesson from our evaluation is that better statistical machine translation quality can be obtained using our smoothed training scheme. Generally, a more refined smoothing factor (a smaller value), together with a larger range over which the smoothing is applied (N), helps us obtain better translation quality. In fact, when the size of the set N is chosen too large, a better result is not guaranteed (for example, 160 vs. 120). This comes from the fact that, for words which appear more often, the probability of being "pegged" in a wrong alignment pair is smaller than for words which appear less often. Therefore, we should not use a constant smoothing factor that treats all occurrence levels of words identically.
5.3 Evaluation On The Upgraded Version
The experimental evaluations above show that our bootstrapping word alignment algorithm clearly improves the translation quality of IBM Model 1. In most cases, we are able to boost the quality of the translation models by around 0.5% BLEU. However, re-training the model too many times is costly. Similarly, as described above, using a constant smoothing factor that treats all occurrence levels of words identically is not ideal.
Table 7: Improving results when setting the smoothing factor value to 8
In this section, we improve our bootstrapping word alignment algorithm by refining the way the smoothing factors are chosen. That is, we do not choose the same smoothing factor for every bootstrapping iteration. Instead, we take the smoothing factors between each ∆_i and ∆_{i+1} as a consecutive sequence:
S : {0, 1, 2, 3, 4, 5, ..., n − 1}
and we obtain the corresponding set N:
N : {0, 1, 3, 6, 10, 15, ..., ∆_n}
This choice follows from what the English and Vietnamese word statistics tables show clearly: for words with smaller frequencies, the rarer the word, the more over-fitting we observe when we inspect its ANBA index, whereas the situation is slightly better for words which occur more often.
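As a small illustration (the construction is implied by the sets S and N above), the thresholds in N are simply the running sums of the consecutive smoothing factors in S:

```python
from itertools import accumulate

def covering_range(n):
    """Covering range N built from S = {0, 1, 2, ..., n-1}: the running sums of S."""
    return list(accumulate(range(n)))

print(covering_range(7))   # [0, 1, 3, 6, 10, 15, 21]
```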
We integrate this improved choice of the set N, corresponding to the set S, into the upgraded version of the bootstrapping word alignment algorithm to obtain a better algorithm. We found that the improved algorithm achieves similar gains. In more detail, Tables 8 and 9 present how Model 1 and Model 2 are improved (shown by the increase in BLEU score) using our refined scheme.
In fact, with the upgraded version, the improved choice of the set N requires much less computational power while covering a larger range of word occurrence classes. In practice, we found that we only need around 5 runs of the bootstrapping word alignment algorithm for IBM Models 1-2, compared to our original implementation.
Table 8: Evaluation on the refining scheme, improving IBM Model 1 (columns: size of S, BLEU score, ∆)
Table 9: Evaluation on the refining scheme, improving IBM Model 2 (columns: size of S, BLEU score, ∆)
5.4 Evaluation On Improving Fertility Model
In this section, we come to one of the most interesting points that we want to emphasize: the boost in translation quality of the fertility models that results from improving lexical translation modelling. The improvement of 0.5% BLEU by itself is not the most important outcome. The most important point is that, by improving lexical translation modelling, the higher fertility models boost their translation quality by around 1% BLEU compared to the original implementation.
To explain this boosting capacity for the higher translation models, we first consider the new word statistics tables obtained after applying our bootstrapping word alignment algorithm. These ANBA index tables are derived from the experiment with the set N of cardinality 14 in the evaluation of Table 9. We can see clearly that the ANBA indices deviate much less strongly, since the pegged pairs were removed earlier, compared to the results obtained by training with our original method.
Table 10: The Upgraded ANBA Statistical Table for Vietnamese Words (σ = 0.73)
Table 11: The Upgraded ANBA Statistical Table for English Words (σ = 3.66)
In effect, we obtain a better balance in the ANBA indices. This gives us better initial translation parameters to use as the starting parameters for training the fertility models. In addition to the better initial fertility transfer parameters, and more importantly, there is no "trick" that allows us to train IBM Model 3 very quickly while considering all possible alignments of each pair of parallel sentences when searching for the best "Viterbi" alignments. Our strategy is to carry out the sums of translations only over some of the more probable alignments, ignoring the vast sea of much less probable ones. Since we begin with the most probable alignment that we can find and then include all alignments that can be obtained from it by small changes, the improved results in lexical modelling are very important.
Based on the original equations of [9], and following our better "Viterbi" alignments from IBM Model 2, we derive new translation probabilities and position alignment probabilities from the "Viterbi" alignment sequences obtained by applying the bootstrapping word alignment algorithm when training IBM Models 1-2:
$$t(f|e) = \frac{\mathrm{count}(f, e)}{\sum_{f} \mathrm{count}(f, e)} \qquad (8)$$
$$a(i|j, l_e, l_f) = \frac{\mathrm{count}(i, j, l_e, l_f)}{\sum_{i} \mathrm{count}(i, j, l_e, l_f)} \qquad (9)$$
Hence, we have new initial translation probabilities and new initial alignment probabilities. By using them as the initial parameters transferred to the training of IBM Model 3, we show that we obtain better translation quality, as described in Table 12:
Table 12: Improving IBM Model 3 (columns: size of S, BLEU score, ∆)
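As an illustration of equations (8) and (9), the following is a minimal sketch of how the initial Model 3 parameters could be re-estimated from the Viterbi alignment links produced by the bootstrapping algorithm; the data layout is an assumption made for the example.

```python
from collections import defaultdict

def estimate_from_viterbi(alignments):
    """Re-estimate t(f|e) and a(i|j, l_e, l_f) from Viterbi alignments,
    following equations (8) and (9); used to initialize Model 3 training.

    alignments: list of (f_sent, e_sent, links), where links is a list of
                (j, i) pairs (j is 1-based over f_sent, e_sent[0] is NULL).
    """
    count_t = defaultdict(float)
    count_a = defaultdict(float)
    for f_sent, e_sent, links in alignments:
        lf, le = len(f_sent), len(e_sent)
        for j, i in links:
            count_t[(f_sent[j - 1], e_sent[i])] += 1.0
            count_a[(i, j, le, lf)] += 1.0

    # equation (8): normalize t over f for each e
    norm_t = defaultdict(float)
    for (f, e), c in count_t.items():
        norm_t[e] += c
    t = {(f, e): c / norm_t[e] for (f, e), c in count_t.items()}

    # equation (9): normalize a over i for each (j, l_e, l_f)
    norm_a = defaultdict(float)
    for (i, j, le, lf), c in count_a.items():
        norm_a[(j, le, lf)] += c
    a = {k: c / norm_a[(k[1], k[2], k[3])] for k, c in count_a.items()}
    return t, a
```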
It is clear from the evaluation results above that by improving the IBM Model 1 and 2 training scheme, we obtain a better SMT system, with a BLEU score increase of around 1%. Another interesting point is that, in almost all cases, we consistently obtain better results, usually around a 1% BLEU improvement.
Moreover, it is appealing that applying our improvement does not deeply change the monotonic shape of the ANBA indices: we change them only slightly, yet the improvement is quite impressive. This gives us a strong belief that, in the future, by applying other better translation training schemes that change the monotonic shape of the ANBA indices more deeply, we will be able to obtain an even stronger improvement in SMT system quality.
5.5 Evaluation On Larger Training Data
To see whether our improvement is applicable to other language pairs, in this section we deploy our improved training scheme on the English-French language pair. This also allows us to test its influence on larger training data.
We apply the improved bootstrapping algorithm on training data containing 100,000 English-French parallel sentences (4 million tokens). The original ANBA index tables obtained by running the original IBM Models 1-2 (BLEU score 24.01) are shown in Tables 13 and 14.
Table 13: The ANBA Statistical Table for French Words (Freq, Num of Words, ANBA, Deviation; σ = 4.30)
Table 14: The ANBA Statistical Table for English Words (Freq, Num of Words, ANBA, Deviation; σ = 5.55)
We choose the size of the set S to be 14. After applying our bootstrapping word alignment algorithm, we obtain a better SMT system with a BLEU score of 25.17 and better ANBA indices, as shown in Tables 15 and 16. Finally, the boost for the fertility models is shown in Table 17.
Table 15: The Upgraded ANBA Statistical Table for French Words (σ = 1.01)
The original BLEU score of IBM Model 3 for this training data is 24.33. Our improvement obtains a BLEU score of 26.05, a significantly better result than the original. It appears that our bootstrapping algorithm improves the quality of the statistical phrase-based translation system even more on the larger training data.
Table 16: The Upgraded ANBA Statistical Table for English Words (σ = 1.57)
Table 17: Improving IBM Model 3

6 CONCLUSION AND FUTURE WORK
Under a word-based approach, frequent words with a consistent translation can be aligned at a high rate of precision. However, words that are less frequent or exhibit diverse translations do not have statistically significant evidence for confident alignment, thereby leading to incomplete or incorrect alignments. We have presented this aspect based on the proposed ANBA index. We have also pointed out that a purely statistical IBM translation model is not enough to reach the state of the art in word alignment modelling.
To overcome this problem, we have presented an effective bootstrapping word alignment algorithm. In addition, we found that there are other effective methods which are quite simple and easy to apply for boosting the quality of IBM Models 1-2, and hence for improving the overall machine translation quality. Following this scheme and other methods, we believe that these improvements will help the fertility models reach state-of-the-art statistical alignment quality.
7 ACKNOWLEDGEMENT
This work is partially supported by the Vietnam National Foundation for Science and Technology Development (NAFOSTED), project code 102.99.35.09. This work is also partially supported by the project KC.01.TN04/11-15.
8 REFERENCES
[1] P. F. Brown, V. J. D. Pietra, S. A. D. Pietra, and R. L. Mercer. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19:263-311, June 1993.
[2] C. Callison-Burch, D. Talbot, and M. Osborne. Statistical machine translation with word- and sentence-aligned parallel corpora. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, ACL '04, Stroudsburg, PA, USA, 2004. Association for Computational Linguistics.
[3] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1-38, 1977.
[4] A. Fraser and D. Marcu. Measuring word alignment quality for statistical machine translation. Computational Linguistics, 33:293-303, Sept. 2007.
[5] C. Hoang, A. Le, P. Nguyen, and T. Ho. Exploiting non-parallel corpora for statistical machine translation. In Proceedings of The 9th IEEE-RIVF International Conference on Computing and Communication Technologies, pages 97-102. IEEE Computer Society, 2012.
[6] C. Hoang, A. Le, and B. Pham. A systematic comparison of various statistical alignment models for statistical English-Vietnamese phrase-based translation (to appear). In Proceedings of The 4th International Conference on Knowledge and Systems Engineering. IEEE Computer Society, 2012.
[7] S. J. Ker and J. S. Chang. A class-based approach to word alignment. Computational Linguistics, 23:313-343, June 1997.
[8] P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst. Moses: open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, ACL '07, pages 177-180, Stroudsburg, PA, USA, 2007. Association for Computational Linguistics.
[9] P. Koehn, F. J. Och, and D. Marcu. Statistical phrase-based translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, NAACL '03, pages 48-54, Stroudsburg, PA, USA, 2003. Association for Computational Linguistics.
[10] F. J. Och and H. Ney. Improved statistical alignment models. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics, ACL '00, pages 440-447, Stroudsburg, PA, USA, 2000. Association for Computational Linguistics.
[11] F. J. Och and H. Ney. A systematic comparison of various statistical alignment models. Computational Linguistics, 29:19-51, March 2003.
[12] F. J. Och and H. Ney. The alignment template approach to statistical machine translation. Computational Linguistics, 30:19-51, June 2004.
[13] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pages 311-318, Stroudsburg, PA, USA, 2002. Association for Computational Linguistics.
[14] D. Vilar, M. Popović, and H. Ney. AER: Do we need to "improve" our alignments? In International Workshop on Spoken Language Translation, pages 205-212, Kyoto, Japan, Nov. 2006.