Refining Lexical Translation Training Scheme for Improving The Quality of Statistical Phrase-Based Translation

Cuong Hoang1, Cuong Anh Le1, Son Bao Pham1,2
1 Faculty of Information Technology, University of Engineering and Technology, Vietnam National University, Hanoi
2 Information Technology Institute, Vietnam National University, Hanoi
{cuongh.mi10, cuongla, sonpb}@vnu.edu.vn

ABSTRACT
Under word-based alignment, frequent words with consistent translations can be aligned at a high rate of precision. However, words that are less frequent or exhibit diverse translations in the training corpora generally do not have statistically significant evidence for confident alignments [7]. In this work, we propose a bootstrapping algorithm to capture the alignments of such less frequent words and of words with diverse translations. Interestingly, we avoid making any explicit assumption about the pair of languages used. We report experimental evaluations on two phrase-based translation systems, English-Vietnamese and English-French. The experiments show a significant "boosting" capacity for the overall translation quality of both tasks.
1 INTRODUCTION
Statistical Machine Translation (SMT) is a machine translation approach in which sentence translations are generated based on statistical models whose parameters are derived from the analysis of parallel sentence pairs in a bilingual corpus. In SMT, the best performing systems are based in some way on phrases (groups of words). The basic idea of phrase-based translation is to break a given source sentence into phrases, translate each phrase, and finally compose the target sentence from these phrase translations [9, 12].
For a statistical phrase-based translation system, the accuracy of the statistical word-based alignment models is critically important. In fact, under the lexical alignment models (IBM Models 1-2), frequent words with a consistent translation can usually be aligned at a high rate of precision. However, for words that are less frequent or exhibit diverse translations, we generally do not have statistically significant evidence for a confident alignment.
This problem tends to reduce the translation quality in several important ways. First, the diverse translations of a source word can never be recognized by the original statistical alignment models. This is an important point: it indicates that a purely statistical IBM alignment model is not sufficient to reach state-of-the-art alignment quality. Second, the poor quality of the lexical translation estimates has a bad influence on the quality of the higher, fertility-based alignment models [6]. Finally, phrase extraction is then unable to generate more diverse translations for each word or phrase.
In our observation, language is inherently flexible and diverse. Therefore, we need to capture those diverse translations in order to obtain better quality. To overcome this problem, some papers report improvements when linguistic knowledge is used [7]. In general, the linguistic knowledge mainly serves to filter out incorrect alignments. As a result, such methods are not easily applicable to all language pairs without adaptation. Different from the previous methods, in this work we propose a bootstrapping word alignment algorithm for improving the modelling of lexical translation. Basically, we found that our alignment model is better at "capturing" the diverse translations of words and thereby reduces the bad alignments of rare words. Following the work of [6], we also show a very interesting point: although we mainly focus on IBM Models 1-2, we found that the improvement also significantly benefits the fertility-based alignment models.
Consequently, with the improved word-based alignment models, our phrase-based SMT system gains a statistically significant improvement in translation quality. The evaluation of our work is performed on different tasks for different languages. Since we do not rely on linguistic knowledge, we believe our approach will be applicable to other language pairs.
The rest of this paper is organized as follows: Section 2 presents IBM Models 1-2. Section 3 describes the problem of bad alignments for rare words and for words with diverse translations. Section 4 focuses on our bootstrapping word alignment algorithm. Section 5 presents our experimental evaluations. Finally, the conclusion is given in Section 6.
2 IBM MODELS 1-2
Model 1 is a probabilistic generative model within a framework that assumes a source sentence f_1^J of length J translates as a target sentence e_1^I of length I. It is defined as a particularly simple instance of this framework, by assuming that all possible lengths for f_1^J (less than some arbitrary upper bound) have a uniform probability. Let t(f_j|e_i) denote the translation probability of f_j given e_i. The alignment is determined by specifying the values of a_j for j from 1 to J. [1] derives the following:
$$\Pr(f|e) = \frac{\epsilon}{(I+1)^J} \prod_{j=1}^{J} \sum_{i=0}^{I} t(f_j|e_i) \qquad (1)$$
The parameters of Model 1 for a given pair of languages are normally estimated using EM [3]. We call the expected number of times that word e_i connects to f_j in the translation pair (f_1^J, e_1^I) the count of f_j given e_i, and denote it by c(f_j|e_i; f_1^J, e_1^I). Following the derivation in [1], c(f_j|e_i; f_1^J, e_1^I) can be calculated as follows:
$$c(f_j|e_i; f_1^J, e_1^I) = \frac{t(f_j|e_i)}{\sum_{i'=0}^{I} t(f_j|e_{i'})} \sum_{j'=1}^{J} \delta(f_j, f_{j'}) \sum_{i'=0}^{I} \delta(e_i, e_{i'}) \qquad (2)$$
where δ denotes the Kronecker delta function.
In addition, we set λ_{e_i} as a normalization factor and then repeatedly re-estimate the translation probability of a word f_j in f_1^J given a word e_i in e_1^I as:
$$t(f_j|e_i) = \lambda_{e_i}^{-1} \, c(f_j|e_i; f_1^J, e_1^I) \qquad (3)$$
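To make the estimation procedure concrete, the following is a minimal Python sketch of one EM iteration of Model 1, following equations (2) and (3). The corpus representation, function name, and handling of the NULL word are illustrative assumptions, not part of the LGIZA toolkit used later in this paper.

```python
from collections import defaultdict

def model1_em_iteration(corpus, t):
    """One EM iteration of IBM Model 1.

    corpus: list of (f_sentence, e_sentence) pairs, each a list of tokens;
            e_sentence is assumed to include the NULL word at position 0.
    t:      dict mapping (f, e) -> current translation probability t(f|e),
            e.g. initialized uniformly over co-occurring pairs.
    Returns the re-estimated translation table.
    """
    count = defaultdict(float)   # expected counts c(f|e), equation (2)
    lam = defaultdict(float)     # normalization factors lambda_e

    for f_sent, e_sent in corpus:
        for f in f_sent:
            denom = sum(t.get((f, e), 0.0) for e in e_sent)   # sum_i t(f|e_i)
            if denom == 0.0:
                continue
            for e in e_sent:
                c = t.get((f, e), 0.0) / denom
                count[(f, e)] += c
                lam[e] += c

    # equation (3): t(f|e) = lambda_e^{-1} * c(f|e)
    return {(f, e): c / lam[e] for (f, e), c in count.items()}
```

Iterating this function several times, starting from a uniform table, yields the Model 1 estimates.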
IBM Model 2 is another simple model that improves on Model 1 by addressing the issue of alignment with an explicit model based on the positions of the input and output words. We make the same assumptions as in Model 1, except that we assume Pr(a_j|a_1^{j-1}, f_1^{j-1}, J, e) depends on j, a_j, and J, as well as on I. The equation for estimating the probability of the sentence f given the sentence e becomes:
$$\Pr(f|e) = \epsilon \prod_{j=1}^{J} \sum_{i=0}^{I} t(f_j|e_i) \, a(i|j, J, I) \qquad (4)$$
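Since the rest of the paper repeatedly uses the "Viterbi" (best) word-by-word alignment, the following is a small sketch of how such an alignment can be read off a trained Model 2: for each source position j, pick the target position i maximizing t(f_j|e_i) a(i|j, J, I). The dictionary-based parameter representation is an assumption made for illustration.

```python
def viterbi_align_model2(f_sent, e_sent, t, a):
    """Best word-by-word alignment under Model 2.

    f_sent: source tokens f_1 .. f_J.
    e_sent: target tokens with the NULL word at position 0, i.e. e_0 .. e_I.
    t:      dict (f, e) -> t(f|e);  a: dict (i, j, J, I) -> a(i|j, J, I).
    Returns a list of (j, i) links, one per source position.
    """
    J, I = len(f_sent), len(e_sent) - 1
    links = []
    for j, f in enumerate(f_sent, start=1):
        best_i = max(range(I + 1),
                     key=lambda i: t.get((f, e_sent[i]), 0.0) * a.get((i, j, J, I), 0.0))
        links.append((j, best_i))
    return links
```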
3 A CAUSE FOR BAD WORD ALIGNMENT TRANSLATION
Without loss of generality, we assume that there exists a set F_{e_i} containing n possible word translations of a word e_i:
$$F_{e_i} = \{f_1, f_2, \ldots, f_n\}$$
This means that, in the parallel corpus, the correct alignments of the elements of the set F_{e_i} are the lexical pairs (f_j; e_i), j ∈ {1, 2, ..., n}. For each training iteration of the word-based alignment model, the normalization factor λ_{e_i} is defined as the sum of the counts accumulated for all the lexical pairs between f_j and e_i:
$$\lambda_{e_i} = \sum_{j=1}^{n} c(f_j|e_i; f_1^J, e_1^I) \qquad (5)$$
Consider a foreign word f_j. We examine the case where f_j frequently co-occurs with a word e_k in the bilingual sentence pairs of the parallel corpus, and we assume that this pair is not a correct lexical translation pair in the linguistic sense. We would expect the lexical translation probability t(f_j|e_k) to always be smaller than the lexical translation probability t(f_j|e_i). Analogous to the normalization factor λ_{e_i}, we have the normalization factor λ_{e_k} of the word e_k:
$$\lambda_{e_k} = \sum_{j=1}^{m} c(f_j|e_k; f_1^J, e_1^I) \qquad (6)$$
Unfortunately, if the word e_k appears much less often than the word e_i in the training corpus (c(e_k) ≪ c(e_i)), the normalization factor λ_{e_i} is usually much greater than the normalization factor λ_{e_k} of the word e_k (λ_{e_k} ≪ λ_{e_i}). Therefore, following equation (3), the lexical translation probability t(f_j|e_k) becomes much greater than the lexical translation probability t(f_j|e_i), and we cannot "capture" the correct alignment pair (f_j; e_i) as we expected.
Similarly, assume that the word f_j is one of the diverse translations of the word e_i. Because e_i has a large number n of possible translation options, λ_{e_i} also takes a greater value. Hence, following equation (3), the lexical translation probability t(f_j|e_i) takes a very small value. In this case, even when e_k is not a rare word, it is very common that t(f_j|e_i) ≪ t(f_j|e_k). Therefore, we cannot capture the diverse translation (f_j; e_i) as we expected.
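A deliberately tiny numerical illustration of this effect follows; the counts are invented purely for illustration and do not come from the corpora used in the experiments.

```python
# Invented expected counts c(f|e) after one EM iteration.
counts_ei = {"f1": 40.0, "f2": 25.0, "fj": 5.0, "f4": 30.0}  # frequent e_i with many translations
counts_ek = {"fj": 3.0}                                       # rare e_k, co-occurring only with f_j

lam_ei = sum(counts_ei.values())          # large normalization factor (100.0)
lam_ek = sum(counts_ek.values())          # small normalization factor (3.0)

t_fj_ei = counts_ei["fj"] / lam_ei        # 0.05
t_fj_ek = counts_ek["fj"] / lam_ek        # 1.00

# Although (f_j; e_i) is the linguistically correct pair, t(f_j|e_k) >> t(f_j|e_i),
# so the best alignment of f_j is drawn towards the rare word e_k.
```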
In our observations, these situations happen quite often, and they significantly affect the quality of word-by-word alignment modelling. In the following, we give an example that shows this problem clearly. From a parallel corpus containing 60,000 parallel sentences (English-Vietnamese), Tables 1 and 2 show the results derived from the "Viterbi" alignments pegged by training IBM Model 2.
In this section, we propose an index called the Average Number of Best Alignments (ANBA); it is the key of this work. [1] introduced the idea of an alignment between a pair of strings as an object indicating, for each word in the French string, the word in the English string from which it arose. For each specific translation model, given a pair of parallel sentences, we find the "best" corresponding target word for every word of the source sentence. Our goal is to find the correlation between the frequency of a word and the probability that it is chosen as part of the best alignment pair (that word together with its "marked" corresponding alignment word) by the statistical alignment model.
In other words, we try to find the relationship between the number of occurrences of a word and the possibility that it is "pegged" in a best alignment pair. These best alignment pairs are often not accurate, so the index tries to reflect the errors that may occur. For convenience, we group the target words by their frequency of occurrence in the training data (the Freq column): a class of words contains all the words with the same frequency. The ANBA index of a class is then defined as the ratio between the number of times the words of that class were chosen as the best word-by-word correspondence of some source word and the number of words in that class.
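The following is a minimal sketch of how the ANBA index per frequency class could be computed from the Viterbi alignments; the helper names (viterbi_best, word_freq) are illustrative assumptions rather than part of the LGIZA toolkit.

```python
from collections import Counter, defaultdict

def anba_by_frequency(corpus, viterbi_best, word_freq):
    """ANBA index per frequency class, as described in Section 3.

    corpus:       list of (f_sentence, e_sentence) pairs.
    viterbi_best: function(f_sent, e_sent) -> list of target words chosen as
                  the best alignment of each source word (e.g. Model 2 Viterbi).
    word_freq:    Counter of target-word frequencies in the training data.
    """
    chosen = Counter()
    for f_sent, e_sent in corpus:
        chosen.update(viterbi_best(f_sent, e_sent))

    class_hits = defaultdict(int)    # total "best alignment" hits per frequency class
    class_size = defaultdict(int)    # number of distinct words per frequency class
    for word, freq in word_freq.items():
        class_hits[freq] += chosen[word]
        class_size[freq] += 1

    return {freq: class_hits[freq] / class_size[freq] for freq in class_size}
```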
ANBA also reflects the average number of possible translations of a group of target words. Consider the ANBA tables for both English and Vietnamese words, where each side in turn is chosen as the target language of the translation.
Table 1: The ANBA Statistical Table for Vietnamese Words (σ = 2.28)
Table 2: The ANBA Statistical Table for English Words (σ = 5.46)
Tables 1 and 2 clearly show that we are only able to "capture" the translations of a word when that word does not appear many times (the diverse translation problem). Moreover, the more often a word appears, the more difficult it is to capture its diverse target translations. We also see that the average ANBA index deviates strongly as the frequency of a word decreases: when a word appears more rarely than others in the training data, it is more likely to be chosen as the target side of the best "Viterbi" alignment pair of a source word.
4 IMPROVING LEXICAL TRANSLATION MODEL
The problem of rare words, and of words which have many diverse translations, presents an interesting challenge. Fortunately, IBM Models 1-2 are simple models, in the sense that their training can be implemented very quickly compared to the complexity of training the higher IBM models. Therefore, our improvement focuses on refining the training scheme to obtain better results.
4.1 Improving Lexical Translation Model
Returning to the case of the set F_{e_i} containing n possible lexical translations of the word e_i, assume that the pair (f_n; e_i) appears many times in the training corpus. From equations (2) and (3), t(f_n|e_i) then obtains a very high value. Consequently, the remaining lexical translation probabilities t(f_j|e_i), j ∈ {1, ..., n-1}, obtain very small values. The situation becomes worse when the cardinality of the set F of a word is large, as is expected given the diversity of natural language.
Clearly, t(f_j|e_i) never reaches a "satisfying" translation probability in this setting. As a result, noisy alignment choices occur frequently, because pairs that are not real lexical pairs (in the linguistic sense) easily obtain higher lexical translation probabilities. To overcome this noise problem, in the following we present a bootstrapping alignment algorithm that refines the training scheme of IBM Models 1-2.
In more detail, we divide the training scheme of a translation model into N steps, where N is called the smoothing bootstrapping factor. In other words, N can also be defined as the number of times we re-train our IBM Models. Without loss of generality, we assume that the word f_j does not occur more often than the word f_{j+1}; for convenience, we denote this assumption as c(f_1) ≤ c(f_2) ≤ ... ≤ c(f_n). We then divide the training into N consecutive steps, each of which tries to separate f_n whenever the lexical translation probability t(f_n|e_i) is, at that point, the maximum translation probability compared to the other target words e_k.
Hence, if we mark and filter out every "marked" pair of f_n and e_i in the bilingual corpus whenever it is chosen as the best lexical alignment at that time, we obtain a new training corpus consisting of updated "parallel sentences". If we re-train on this new parallel corpus, then for each training iteration we have a new normalization factor ~λ_{e_i} of the word e_i:
$$\tilde{\lambda}_{e_i} = \sum_{j=1}^{n-1} c(f_j|e_i; f_1^J, e_1^I) \qquad (7)$$
Interestingly, each time we separate f_n out of the set F, the new lexical translation probabilities ~t(f_k|e_i) (k ≠ n) increase, as expected, because ~λ_{e_i} decreases compared to the original normalization factor λ_{e_i}: there is no longer any need to add the count c(f_n|e_i) to ~λ_{e_i}. Therefore, the probability that a source word f_j is automatically aligned to e_i increases as well. This is the main key that allows us to capture the other "diverse" translations of a word.
4.2 The Bootstrapping Word Alignment Algorithm
From the above analysis, we have a simple but very effective way to improve the quality of lexical translation modelling. In this section, we formalize our bootstrapping word alignment algorithm. Let the set
$$N = \{\Delta_1, \Delta_2, \ldots, \Delta_n\} \quad (\Delta_1 < \Delta_2 < \cdots < \Delta_n)$$
be the covering range of our bootstrapping word alignment algorithm. Each value ∆_i can be understood as an occurrence threshold (or frequency threshold) used to separate a group of words according to their level of occurrence counts.
The bootstrapping word alignment algorithm is formally described as follows:
The Bootstrapping Word Alignment Algorithm
Input: e_1^S, f_1^S; N = {∆_1, ∆_2, ..., ∆_n} (∆_1 < ∆_2 < ... < ∆_n)
Output: Alignment A
1. Start with A = ∅
2. For each ∆_n as the current threshold:
     Count the frequency c(.) of each word in the training data
     Train the IBM Model
     For each pair (f^(s), e^(s)), 1 ≤ s ≤ S:
       For each e_i in e^(s):
         If e_i is marked in A, continue
         Find the best f_j in f^(s)
         If f_j in f^(s) is marked in A, continue
         If c(f_j) ≥ ∆_n:
           Mark (f_j; e_i) as a word pair and add (f_j; e_i) to A
           Change e_i in e^(s)
     n = n - 1
     Go to 2
For a word f_j that is the best alignment of a word e_i and whose number of occurrences exceeds the threshold condition, we mark and add (f_j; e_i) to the alignment set A. After that, we need to mark e_i as "pegged". Similar to [1], we change the word e_i by adding the prefix "UNK" to it. Changing the word e_i, as noted previously, serves to "boost" the probability of obtaining other possible translations f_k of e_i in other sentences, since the contribution of the pair (f_j; e_i) no longer adds to the total λ_{e_i} value once e_i has been changed in e^(s) by the added prefix.
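The following is a compact Python sketch of the bootstrapping procedure described above, under simplifying assumptions: the training routine and the best-alignment search are passed in as callbacks, the frequency condition is applied to the source-side word found as the best alignment, and pegged target words are marked with the "UNK" prefix as in the text.

```python
from collections import Counter

def bootstrap_align(corpus, thresholds, train_ibm, best_source_index):
    """Sketch of the bootstrapping word alignment algorithm (Section 4.2).

    corpus:            list of (f_sentence, e_sentence) pairs (lists of tokens).
    thresholds:        the covering range N = [d_1, ..., d_n], d_1 < ... < d_n.
    train_ibm:         callback: corpus -> lexical model (e.g. IBM Model 1/2).
    best_source_index: callback: (model, f_sent, e_sent, i) -> position j of the
                       best word f_j for e_i (e.g. the Viterbi choice).
    Returns the set A of pegged word pairs (f_j, e_i).
    """
    A = set()
    freq = Counter(f for f_sent, _ in corpus for f in f_sent)   # word frequencies
    for delta in reversed(thresholds):          # start from the largest threshold
        model = train_ibm(corpus)               # re-train on the updated corpus
        for f_sent, e_sent in corpus:
            pegged_j = set()                    # source positions pegged in this pair
            for i, e in enumerate(e_sent):
                if e.startswith("UNK_"):        # e_i was pegged in an earlier pass
                    continue
                j = best_source_index(model, f_sent, e_sent, i)
                if j in pegged_j:
                    continue
                if freq[f_sent[j]] >= delta:    # frequency threshold condition
                    A.add((f_sent[j], e))
                    pegged_j.add(j)
                    e_sent[i] = "UNK_" + e      # change e_i so later passes re-distribute mass
    return A
```

The upgraded version described next would, in addition, drop the pegged pairs from the training data before re-training, rather than only re-labelling e_i.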
In fact, training the lexical translation models does not cost much computing power. However, re-training the system n times, where n is the cardinality of the set N, is computationally expensive. We therefore have an upgraded version of our bootstrapping word alignment algorithm: after pegging, for example, (f_j; e_i) as the best alignment satisfying the frequency condition, we remove those pairs from the training data and re-train the system with the new threshold ∆_i. This greatly reduces the computational cost and the processing time.
In addition, we improve the way each element ∆_i of the set N is chosen, which helps us not only to cover a larger range up to ∆_n but also to reduce the computational cost. These improved schemes are described in more detail in the experimental section.
5 EXPERIMENTS
Recent research points out that it is difficult to achieve large gains in translation performance purely by improving word-based alignment results: the improvement in word alignment quality [4] can be strong, yet it is hard to improve the overall quality of a statistical phrase-based translation system [2, 14]. Therefore, to confirm the influence of the improvements in lexical translation, we directly test the impact of the word alignment extraction component used for "learning" phrases in the phrase-based SMT system.
The experiments are carried out on the English-Vietnamese language pair, and also on the English-French pair for larger training data. The English-Vietnamese training data was provided by [5], and the English-French training corpus was the Hansards corpus [10]. We use the MOSES framework [8] as the phrase-based SMT framework. In all evaluations, we translate sentences from Vietnamese to English. We measure performance using the BLEU metric [13], which estimates the accuracy of the translation output with respect to a reference translation. We use 1,000 pairs of parallel sentences for testing the translation quality of the statistical phrase-based translation system.
5.1 Baseline Results
In this work, we use LGIZA1 as a lightweight statistical machine translation toolkit to train IBM Models 1-3. More information about LGIZA can be found in [6]. Unlike GIZA++, LGIZA is implemented directly from the original IBM Models description [1], without the later improvements integrated into GIZA++, such as determining word classes to obtain a low translation lexicon perplexity (Och, 1999), various smoothing techniques for the fertility, distortion, or alignment parameters, and symmetrization [10][11]. Applying those improved techniques could therefore make our results slightly noisier in comparison.
Table 3 presents the BLEU scores obtained with each specific IBM Model, trained on a bilingual corpus of 60,000 parallel sentences (0.65 million tokens). One small point to note: for English and Vietnamese, an example of a linguistically quite different pair, IBM Model 3 usually performs somewhat worse than IBM Model 2.
IBM Model 1   19.07
IBM Model 2   19.54
IBM Model 3   18.70
Table 3: Using IBM Models as the baseline (BLEU scores)
5.2 Evaluation On The Bootstrapping Word Alignment Algorithm
The following experimental evaluations focus on the impact of applying our bootstrapping word alignment algorithm. For each evaluation, with the same set N serving as the covering range, we take different smoothing factors, defined as the difference between two consecutive thresholds ∆_i and ∆_{i+1} (assumed constant). Tables 4-7 clearly show the results of applying our bootstrapping word alignment algorithm for each specific smoothing factor and each set N.
1 LGIZA is available at: http://code.google.com/p/lgiza/
Table 4: Improving results when setting the smoothing factor value to 1
Table 5: Improving results when setting the smoothing factor value to 2
Table 6: Improving results when setting the smoothing factor value to 4
Clearly, the lesson from our evaluation is that better statistical machine translation quality can be obtained using our smoothed training scheme. Generally, a more refined smoothing factor (a smaller value), together with a larger range over which the smoothing is applied (N), helps us obtain better translation quality. In fact, when the size of the set N is chosen too large, a better result is not guaranteed (for example, 160 vs. 120). This comes from the fact that, for words which appear more often, the probability of being "pegged" in a wrong alignment pair is smaller than for words which appear less often. Therefore, we should not use a constant smoothing factor that treats all occurrence levels of words identically.
5.3 Evaluation On The Upgraded Version
The experimental evaluations above show that our bootstrapping word alignment algorithm clearly improves the translation quality of IBM Model 1. In most cases, we are able to boost the quality of the translation models by around 0.5% BLEU. However, re-training the model too many times is costly. Similarly, as described above, using a constant smoothing factor that treats all occurrence levels of words identically is not ideal.
Table 7: Improving results when setting the smoothing factor value to 8
In this section, we improve our bootstrapping word alignment algorithm by refining the way the smoothing factors are chosen. That is, we do not choose the same smoothing factor for every bootstrapping iteration. Instead, we take the smoothing factors between each ∆_i and ∆_{i+1} as a consecutive sequence:
S : {0, 1, 2, 3, 4, 5, ..., n − 1}
and we obtain the corresponding set N:
N : {0, 1, 3, 6, 10, 15, ..., ∆_n}
This choice follows from what the English and Vietnamese word statistics tables show clearly: for words with smaller frequencies, the rarer the word, the more over-fitting we observe when we inspect its ANBA index, whereas the situation is slightly better for words which occur more often.
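As a small illustration (the construction is implied by the sets S and N above), the thresholds in N are simply the running sums of the consecutive smoothing factors in S:

```python
from itertools import accumulate

def covering_range(n):
    """Covering range N built from S = {0, 1, 2, ..., n-1}: the running sums of S."""
    return list(accumulate(range(n)))

print(covering_range(7))   # [0, 1, 3, 6, 10, 15, 21]
```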
We integrate this improved choice of the set N, corresponding to the set S, into the upgraded version of the bootstrapping word alignment algorithm to obtain a better algorithm. We found that the improved algorithm achieves similar gains. In more detail, Tables 8 and 9 present how Model 1 and Model 2 are improved (shown by the increase in BLEU score) using our refined scheme.
In fact, with the upgraded version, the improved choice of the set N requires much less computational power while covering a larger range of word occurrence classes. In practice, we found that we only need around 5 runs of the bootstrapping word alignment algorithm for IBM Models 1-2, compared to our original implementation.
Table 8: Evaluation on the refining scheme, improving IBM Model 1 (columns: size of S, BLEU score, ∆)
Table 9: Evaluation on the refining scheme, improving IBM Model 2 (columns: size of S, BLEU score, ∆)
5.4 Evaluation On Improving Fertility Model
In this section, we come to one of the most interesting points that we want to emphasize: the boost in translation quality of the fertility models that results from improving lexical translation modelling. The improvement of 0.5% BLEU by itself is not the most important outcome. The most important point is that, by improving lexical translation modelling, the higher fertility models boost their translation quality by around 1% BLEU compared to the original implementation.
To explain this boosting capacity for the higher translation models, we first consider the new word statistics tables obtained after applying our bootstrapping word alignment algorithm. These ANBA index tables are derived from the experiment with the set N of cardinality 14 in the evaluation of Table 9. We can see clearly that the ANBA indices deviate much less strongly, since the pegged pairs were removed earlier, compared to the results obtained by training with our original method.
Table 10: The Upgraded ANBA Statistical Table for Vietnamese Words (σ = 0.73)
Table 11: The Upgraded ANBA Statistical Table for English Words (σ = 3.66)
In effect, we obtain a better balance in the ANBA indices. This gives us better initial translation parameters to use as the starting parameters for training the fertility models. In addition to the better initial fertility transfer parameters, and more importantly, there is no "trick" that allows us to train IBM Model 3 very quickly while considering all possible alignments of each pair of parallel sentences when searching for the best "Viterbi" alignments. Our strategy is to carry out the sums of translations only over some of the more probable alignments, ignoring the vast sea of much less probable ones. Since we begin with the most probable alignment that we can find and then include all alignments that can be obtained from it by small changes, the improved results in lexical modelling are very important.
Based on the original equations of [9], and following our better "Viterbi" alignments from IBM Model 2, we derive new translation probabilities and position alignment probabilities from the "Viterbi" alignment sequences obtained by applying the bootstrapping word alignment algorithm when training IBM Models 1-2:
$$t(f|e) = \frac{\mathrm{count}(f, e)}{\sum_{f} \mathrm{count}(f, e)} \qquad (8)$$
$$a(i|j, l_e, l_f) = \frac{\mathrm{count}(i, j, l_e, l_f)}{\sum_{i} \mathrm{count}(i, j, l_e, l_f)} \qquad (9)$$
Hence, we have new initial translation probabilities and new initial alignment probabilities. By using them as the initial parameters transferred to the training of IBM Model 3, we show that we obtain better translation quality, as described in Table 12:
Table 12: Improving IBM Model 3 (columns: size of S, BLEU score, ∆)
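As an illustration of equations (8) and (9), the following is a minimal sketch of how the initial Model 3 parameters could be re-estimated from the Viterbi alignment links produced by the bootstrapping algorithm; the data layout is an assumption made for the example.

```python
from collections import defaultdict

def estimate_from_viterbi(alignments):
    """Re-estimate t(f|e) and a(i|j, l_e, l_f) from Viterbi alignments,
    following equations (8) and (9); used to initialize Model 3 training.

    alignments: list of (f_sent, e_sent, links), where links is a list of
                (j, i) pairs (j is 1-based over f_sent, e_sent[0] is NULL).
    """
    count_t = defaultdict(float)
    count_a = defaultdict(float)
    for f_sent, e_sent, links in alignments:
        lf, le = len(f_sent), len(e_sent)
        for j, i in links:
            count_t[(f_sent[j - 1], e_sent[i])] += 1.0
            count_a[(i, j, le, lf)] += 1.0

    # equation (8): normalize t over f for each e
    norm_t = defaultdict(float)
    for (f, e), c in count_t.items():
        norm_t[e] += c
    t = {(f, e): c / norm_t[e] for (f, e), c in count_t.items()}

    # equation (9): normalize a over i for each (j, l_e, l_f)
    norm_a = defaultdict(float)
    for (i, j, le, lf), c in count_a.items():
        norm_a[(j, le, lf)] += c
    a = {k: c / norm_a[(k[1], k[2], k[3])] for k, c in count_a.items()}
    return t, a
```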
It is clear from the evaluation results above that by improving the IBM Model 1 and 2 training scheme, we obtain a better SMT system, with a BLEU score increase of around 1%. Another interesting point is that, in almost all cases, we consistently obtain better results, usually around a 1% BLEU improvement.
Moreover, it is appealing that applying our improvement does not deeply change the monotonic shape of the ANBA indices: we change them only slightly, yet the improvement is quite impressive. This gives us a strong belief that, in the future, by applying other better translation training schemes that change the monotonic shape of the ANBA indices more deeply, we will be able to obtain an even stronger improvement in SMT system quality.
5.5 Evaluation On Larger Training Data
To see whether our improvement is applicable to other language pairs, in this section we deploy our improved training scheme on the English-French language pair. This also allows us to test its influence on larger training data.
We apply the improved bootstrapping algorithm on training data containing 100,000 English-French parallel sentences (4 million tokens). The original ANBA index tables obtained by running the original IBM Models 1-2 (BLEU score 24.01) are shown in Tables 13 and 14.
Table 13: The ANBA Statistical Table for French Words (Freq, Num of Words, ANBA, Deviation; σ = 4.30)
Table 14: The ANBA Statistical Table for English Words (Freq, Num of Words, ANBA, Deviation; σ = 5.55)
We choose the size of the set S to be 14. After applying our bootstrapping word alignment algorithm, we obtain a better SMT system with a BLEU score of 25.17 and better ANBA indices, as shown in Tables 15 and 16. Finally, the boost for the fertility models is shown in Table 17.
Table 15: The Upgraded ANBA Statistical Table for French Words (σ = 1.01)
The original BLEU score of IBM Model 3 for this training data is 24.33. Our improvement obtains a BLEU score of 26.05, a significantly better result than the original. It appears that our bootstrapping algorithm improves the quality of the statistical phrase-based translation system even more on the larger training data.
Table 16: The Upgraded ANBA Statistical Table for English Words (σ = 1.57)
Table 17: Improving IBM Model 3

6 CONCLUSION AND FUTURE WORK
Under a word-based approach, frequent words with a consistent translation can be aligned at a high rate of precision. However, words that are less frequent or exhibit diverse translations do not have statistically significant evidence for confident alignment, thereby leading to incomplete or incorrect alignments. We have presented this aspect based on the proposed ANBA index. We have also pointed out that a purely statistical IBM translation model is not enough to reach the state of the art in word alignment modelling.
To overcome this problem, we have presented an effective bootstrapping word alignment algorithm. In addition, we found that there are other effective methods which are quite simple and easy to apply for boosting the quality of IBM Models 1-2, and hence for improving the overall machine translation quality. Following this scheme and other methods, we believe that these improvements will help the fertility models reach state-of-the-art statistical alignment quality.
7 ACKNOWLEDGEMENT
This work is partially supported by the Vietnam National Foundation for Science and Technology Development (NAFOSTED), project code 102.99.35.09. This work is also partially supported by the project KC.01.TN04/11-15.
8 REFERENCES
[1] P. F. Brown, V. J. D. Pietra, S. A. D. Pietra, and R. L. Mercer. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19:263-311, June 1993.
[2] C. Callison-Burch, D. Talbot, and M. Osborne. Statistical machine translation with word- and sentence-aligned parallel corpora. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, ACL '04, Stroudsburg, PA, USA, 2004. Association for Computational Linguistics.
[3] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1-38, 1977.
[4] A. Fraser and D. Marcu. Measuring word alignment quality for statistical machine translation. Computational Linguistics, 33:293-303, Sept. 2007.
[5] C. Hoang, A. Le, P. Nguyen, and T. Ho. Exploiting non-parallel corpora for statistical machine translation. In Proceedings of The 9th IEEE-RIVF International Conference on Computing and Communication Technologies, pages 97-102. IEEE Computer Society, 2012.
[6] C. Hoang, A. Le, and B. Pham. A systematic comparison of various statistical alignment models for statistical English-Vietnamese phrase-based translation (to appear). In Proceedings of The 4th International Conference on Knowledge and Systems Engineering. IEEE Computer Society, 2012.
[7] S. J. Ker and J. S. Chang. A class-based approach to word alignment. Computational Linguistics, 23:313-343, June 1997.
[8] P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst. Moses: open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, ACL '07, pages 177-180, Stroudsburg, PA, USA, 2007. Association for Computational Linguistics.
[9] P. Koehn, F. J. Och, and D. Marcu. Statistical phrase-based translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, NAACL '03, pages 48-54, Stroudsburg, PA, USA, 2003. Association for Computational Linguistics.
[10] F. J. Och and H. Ney. Improved statistical alignment models. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics, ACL '00, pages 440-447, Stroudsburg, PA, USA, 2000. Association for Computational Linguistics.
[11] F. J. Och and H. Ney. A systematic comparison of various statistical alignment models. Computational Linguistics, 29:19-51, March 2003.
[12] F. J. Och and H. Ney. The alignment template approach to statistical machine translation. Computational Linguistics, 30:19-51, June 2004.
[13] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pages 311-318, Stroudsburg, PA, USA, 2002. Association for Computational Linguistics.
[14] D. Vilar, M. Popović, and H. Ney. AER: Do we need to "improve" our alignments? In International Workshop on Spoken Language Translation, pages 205-212, Kyoto, Japan, Nov. 2006.