A Systematic Comparison Between Various Statistical Alignment Models for
Statistical English-Vietnamese Phrase-Based Translation
Cuong Hoang1, Cuong Anh Le1, Son Bao Pham1,2
1 Faculty of Information Technology, University of Engineering and Technology, Vietnam National University, Hanoi
2 Information Technology Institute, Vietnam National University, Hanoi
{cuongh, cuongla, sonpb}@vnu.edu.vn
Abstract
In statistical phrase-based machine translation, the step of phrase learning heavily relies on word alignments. This paper provides a systematic comparison of applying various statistical alignment models for statistical English-Vietnamese phrase-based machine translation. We also investigate a heuristic method for elevating the translation quality of using higher word-alignment models by improving the quality of lexical modelling. In detail, we experimentally show that taking up the lexical translation seems to be an appropriate approach to enable "higher" word-based translation models to efficiently "boost" their merits. We hope this work will be a reliable comparison benchmark for other studies on using and improving the statistical alignment models for English-Vietnamese machine translation systems.
1 Introduction
Statistical Machine Translation (SMT) is a machine translation approach which depends on creating a probabilistic parameter model by analyzing parallel sentence pairs in a bilingual corpus. In SMT, the best performing systems are based in some way on phrases. The basic idea of the phrase-based translation paradigm is to learn to break a given source sentence into phrases, then translate each of them separately. These translation phrases are finally combined to generate the target sentence [9].
For many language pairs, applying the more complex alignment models (IBM Models 3-5) yields better results than applying a simple word-based translation model [9][10].
However, surprisingly, for the case of English-Vietnamese phrase-based SMT, we found that this conclusion is not always true. That is, the quality of those SMT systems which were trained with these alignment models is usually considerably worse than when using the simple word-based alignment models (IBM Models 1-2). Moreover, no previous work has provided a systematic analysis of the effects of using the alignment models for an English-Vietnamese statistical phrase-based SMT system.
Hence, this paper focuses on a systematic comparison between the alignment models. Following the analysis results, we also point out some important aspects of deploying the word-alignment component for the language pair English-Vietnamese which can significantly affect the overall translation quality. These are the best training scheme [16], the number of iterations for training each model, and the probability of tossing in a spurious word [1].
In addition, we also propose a scheme for improving the translation quality of using higher word-based alignment models. In detail, we found that improving the lexical translation seems to be the right approach to allow the higher alignment models to better "boost" their quality. To provide evidence for this paradigm, we focus on initializing Model 1 with a better heuristic parameter estimation. After that, we present the overall "boosting" capacity.
Beyond the experimental evaluation with GIZA++ [16], we also implement LGIZA, a lightweight SMT toolkit that is used to train Models 1-3. LGIZA is implemented [...] specific case. We hope this work will be a reliable comparison benchmark for further research on building an English-Vietnamese SMT system.
2 Word-based Machine Translation Models
2.1 IBM Models 1-2 and the HMM Model
Model 1 assumes a source sentence $f_1^J$ of length $J$ is translated into a target sentence $e_1^I$ of length $I$. It is defined as a particularly simple instance of the translation framework, by assuming that all possible lengths for $f_1^J$ (less than some arbitrary upper bound) have a uniform probability. The word order does not affect the alignment probability, i.e., $Pr(J|e_1^I)$ is independent of $e_1^I$ and $J$.

Therefore, all possible choices of generating the target words by source words are equal. Let $t(f_j|e_i)$ be the translation probability of $f_j$ given $e_i$. The alignment is determined by specifying the values of $a_j$ for $j$ from 1 to $J$. [1] yields the following summarizing equation:

$$Pr(f|e) = \frac{\epsilon}{(I+1)^J} \prod_{j=1}^{J} \sum_{i=0}^{I} t(f_j|e_i) \quad (1)$$
The parameter $t$ is normally estimated by the EM algorithm [2]. In Model 1, we take no cognizance of where words appear in either string: the first word in the $f_1^J$ string is just as likely to be connected to a word at the end of the $e_1^I$ string as to one at the beginning. For Model 2, we make the same assumptions as in Model 1, except that we assume the alignment probability $Pr(a_j|a_1^{j-1}, f_1^{j-1}, J, e)$ depends on $j$, $a_j$, and $J$, as well as on $I$:
$$Pr(f|e) = \epsilon \prod_{j=1}^{J} \sum_{i=0}^{I} t(f_j|e_i)\, a(i|j, J, I) \quad (2)$$
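For concreteness, both likelihoods can be evaluated directly once the parameter tables are known. The following is a minimal Python sketch (ours, not code from the original systems); the dictionary-based tables t and a are illustrative data structures, and unseen entries are smoothed with a tiny constant:

```python
import math

def model1_logprob(f, e, t, epsilon=1.0):
    """Log of Eq. (1): Pr(f|e) = eps / (I+1)^J * prod_j sum_i t(f_j|e_i).
    e is assumed to include the NULL word at position 0; t is a
    (hypothetical) dict mapping (f_word, e_word) -> probability."""
    I, J = len(e) - 1, len(f)
    logp = math.log(epsilon) - J * math.log(I + 1)
    for fj in f:
        logp += math.log(sum(t.get((fj, ei), 1e-12) for ei in e))
    return logp

def model2_logprob(f, e, t, a, epsilon=1.0):
    """Log of Eq. (2): each alignment choice is additionally weighted
    by the position-alignment probability a(i|j, J, I)."""
    I, J = len(e) - 1, len(f)
    logp = math.log(epsilon)
    for j, fj in enumerate(f, start=1):
        logp += math.log(sum(t.get((fj, ei), 1e-12) * a.get((i, j, J, I), 1e-12)
                             for i, ei in enumerate(e)))
    return logp
```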
Model 2 attempts to model the absolute distortion of words in sentence pairs. [18] suggests that alignments have a strong tendency to maintain the local neighbourhood after translation. HMM word-based alignment uses a first-order Hidden Markov Model to restructure the alignment model used in Model 2 so that it includes first-order alignment dependencies:

$$Pr(f_1^J, a_1^J|e_1^I) = \frac{\epsilon}{(I+1)^J} \prod_{j=1}^{J} t(f_j|e_{a_j})\, Pr_a(a(j)|a(j-1), I) \quad (3)$$

where the alignment probability $Pr_a(a(j)|a(j-1), I)$ is calculated as:

$$Pr_a(i|i', I) = \frac{c(i - i')}{\sum_{k=1}^{I} c(k - i')} \quad (4)$$

From the above formulation, the distortion probability does not depend on the absolute word positions but only on the jump width $(i - i')$.
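Equation (4) is just a count normalization over jump widths. A small sketch (ours) with a made-up count table:

```python
from collections import Counter

def jump_prob(i, i_prev, I, c):
    """Eq. (4): Pr_a(i | i', I) = c(i - i') / sum_{k=1..I} c(k - i'),
    where c holds jump-width counts collected during training."""
    denom = sum(c[k - i_prev] for k in range(1, I + 1))
    return c[i - i_prev] / denom if denom else 0.0

# e.g. with counts favouring small forward jumps:
c = Counter({0: 10, 1: 6, -1: 3, 2: 1})
p = jump_prob(3, 2, 5, c)   # jump from position 2 to 3 -> 6/20 = 0.3
```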
2.2 Fertility-based Alignment Models
Models 3-4, which are considerably more complex models², yield more accurate results than Models 1-2, mainly thanks to the fertility-based alignment scheme. The original equation for Models 3-4, describing the "joint likelihood" for a tableau, τ, and a permutation, π, is³:
$$\begin{aligned}
Pr(\tau, \pi|e) = {} & \prod_{i=1}^{I} Pr(\phi_i|\phi_1^{i-1}, e)\; Pr(\phi_0|\phi_1^{I}, e) \\
& \times \prod_{i=0}^{I} \prod_{k=1}^{\phi_i} Pr(\tau_{ik}|\tau_{i1}^{k-1}, \tau_0^{i-1}, \phi_0^{I}, e) \\
& \times \prod_{i=1}^{I} \prod_{k=1}^{\phi_i} Pr(\pi_{ik}|\pi_{i1}^{k-1}, \pi_1^{i-1}, \tau_0^{I}, \phi_0^{I}, e) \\
& \times \prod_{k=1}^{\phi_0} Pr(\pi_{0k}|\pi_{01}^{k-1}, \pi_1^{I}, \tau_0^{I}, \phi_0^{I}, e) \quad (5)
\end{aligned}$$
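To make the generative story behind Eq. (5) concrete, the toy Python sketch below samples a translation by choosing a fertility for each source word, generating that many translations, optionally tossing in spurious NULL-generated words, and finally permuting the result. The distributions (fert, ttable) and the uniform shuffle are our own simplifications; Model 3 proper uses learnt fertility, translation, and distortion tables:

```python
import random

def model3_generate(e, fert, ttable, p1=0.11, rng=random):
    """Toy walk through the generative story of Eq. (5): fertilities,
    translations, spurious words, then a permutation. (The real model
    places words with a learnt distortion model; we shuffle uniformly.)"""
    def draw(pairs):                      # sample from a list of (item, prob)
        r, acc = rng.random(), 0.0
        for item, p in pairs:
            acc += p
            if r <= acc:
                return item
        return pairs[-1][0]

    out = []
    for ei in e:
        for _ in range(draw(fert[ei])):   # phi_i translations tau_ik of e_i
            out.append(draw(ttable[ei]))
            if rng.random() < p1:         # optionally toss in a spurious word
                out.append(draw(ttable["NULL"]))
    rng.shuffle(out)                      # stand-in for the permutation pi
    return out

# made-up toy distributions, for illustration only:
e = ["the", "house"]
fert = {"the": [(0, 0.4), (1, 0.6)], "house": [(1, 0.9), (2, 0.1)]}
ttable = {"the": [("ngôi", 1.0)], "house": [("nhà", 1.0)], "NULL": [("thì", 1.0)]}
print(model3_generate(e, fert, ttable))
```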
The comparison between applying these word-based alignment models in the English-Vietnamese statistical phrase-based SMT system will be described in depth in the Experiments section.
3 How Lexical Models Impact the Quality of Fertility-based Models
Following the scheme proposed by [1], to train Models 1-2 we first set the lexical translation values $t$ uniformly for every pair of words. For Model 1, in each iteration of the training process we collect the fractional counts over every possible alignment (pair of words) and then use them to revise the values of the parameter $t$. Similarly, after training Model 1, we set the position alignment values $a$ uniformly; however, we revise both the lexical translation $t$ and the position alignment $a$ in each iteration of Model 2.
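A compact sketch of this count-and-normalize loop for Model 1 is shown below (our own illustration with a two-pair toy corpus; real toolkits such as GIZA++ add pruning and smoothing on top of it):

```python
from collections import defaultdict

def train_model1(corpus, iterations=5):
    """Minimal EM for Model 1 over (f_sentence, e_sentence) pairs,
    following the count-and-normalize procedure described above.
    e sentences are assumed to already contain the NULL token."""
    f_vocab = {fj for f, _ in corpus for fj in f}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # uniform t(f|e)
    for _ in range(iterations):
        count = defaultdict(float)                # fractional counts c(f, e)
        total = defaultdict(float)                # normalizers c(e)
        for f, e in corpus:
            for fj in f:
                z = sum(t[(fj, ei)] for ei in e)  # normalize over links of f_j
                for ei in e:
                    c = t[(fj, ei)] / z           # expected count for this link
                    count[(fj, ei)] += c
                    total[ei] += c
        for (fj, ei), c in count.items():         # M-step: revise t
            t[(fj, ei)] = c / total[ei]
    return t

corpus = [("ngôi nhà".split(), ["NULL"] + "the house".split()),
          ("nhà".split(),      ["NULL"] + "house".split())]
t = train_model1(corpus)   # e.g. t[("nhà", "house")] grows across iterations
```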
For the training process of Model 3, we would like to use everything we learnt from the Model 2 training estimation to set the initial values. Then, to collect a subset of "reasonable" alignments, we start with the best "Viterbi" alignment from the "perspective" of Model 2 and use it to greedily search for the "Viterbi" alignment of Model 3. That is, we collect only the "reasonable" neighbours of the best "Viterbi" alignment.
² For convenience, from now on we use the term "higher models" to refer to Models 3-4.
³ For more detail on how Models 3-5 parameterize the fertility scheme, please refer to [1].
Hence, we can see that the quality of applying the fertility-based models depends heavily on the accuracy of the lexical translation and position alignment probabilities derived from Model 2; these parameters directly impact the fertility-based alignment parameters. More importantly, there is no "trick" to help us train Model 3 very quickly in a way that lets us "infer" all the possible alignments for each pair of parallel sentences. Our strategy is to carry out the sum of the translations only over the highly probable alignments, ignoring the vast sea of much less probable ones. Specifically, we begin with the most probable alignment that we can find and then include all alignments that can be obtained from it by small changes.
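The "small changes" can be read as the usual move and swap operations over an alignment vector. The sketch below is our reading of that search, not code from the paper; the Model 3 scorer is supplied by the caller:

```python
def neighbors(a, I):
    """All alignments reachable from a (a[j] = i, 0 <= i <= I) by one
    "move" (re-link one target word) or one "swap" (exchange two links)."""
    J, out = len(a), []
    for j in range(J):
        for i in range(I + 1):
            if i != a[j]:
                b = list(a); b[j] = i                       # move
                out.append(b)
    for j1 in range(J):
        for j2 in range(j1 + 1, J):
            if a[j1] != a[j2]:
                b = list(a); b[j1], b[j2] = a[j2], a[j1]    # swap
                out.append(b)
    return out

def hillclimb(a, I, score):
    """Greedy search: start from the Model 2 Viterbi alignment and move
    to the best neighbour under the Model 3 score until no gain."""
    while True:
        best = max(neighbors(a, I), key=score)
        if score(best) <= score(a):
            return a
        a = best
```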
4 Improving Lexical Modelling
From the above analysis, we can see that the accuracy of the lexical translation parameter is a very important aspect of improving the quality of using higher word-based alignment models. This section focuses on another approach to improving the overall quality of using higher alignment models.
There is a classic problem with these models, which was well described by [4]: their parameter estimation might lack robustness with respect to the global maximum problem. In detail, these word-based alignment models lead to a local maximum of the probability of the observed pairs as a function of the parameters of the model. There may be many such local maxima, and the particular one at which we arrive will, in general, depend on the initial choice of the parameters. It is not clear that these maximum likelihood methods are robust enough to produce estimates that can be reliably replicated in other laboratories.
To improve the final result, we improve the initial choice of the lexical translation parameter in an effective way. That is, we start Model 1 with a heuristic method for finding the correspondence between lexical translations, based on the statistics of Pearson's chi-square test.
Previous research pointed out that Pearson's chi-square test can also assist in identifying word correspondences in bilingual training corpora [3]. In fact, the essence of Pearson's chi-square test is to compare the observed frequencies in a table with the frequencies expected under independence. If the difference between observed and expected frequencies is large, then we can reject the null hypothesis of independence. In the simplest case, the $X^2$ test is applied to 2-by-2 tables⁴. The $X^2$ statistic sums the differences between observed and expected values in all squares of the table, scaled by the magnitude of the expected values [11], as follows:

$$X^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where $i$ ranges over rows of the table, $j$ ranges over columns, $O_{ij}$ is the observed value for cell $(i, j)$, and $E_{ij}$ is the expected value. [3] realized that this seems to be a particularly good choice for using the "independence" information. Actually, they used a measure which they call $\phi^2$, which is an $X^2$-like statistic. The value of $\phi^2$ is bounded between 0 and 1.
For more detail on how to calculate $\phi^2$, please refer to [3]. Of course, the performance of identifying word correspondences using the $\phi^2$ method is not as good as using Model 1 together with the EM training scheme [16]. However, we believe that this information is quite valuable.
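For a 2-by-2 co-occurrence table, $\phi^2$ has a well-known closed form ($X^2$ divided by the table total). A minimal sketch, with our own naming of the four cells:

```python
def phi_squared(a, b, c, d):
    """phi^2 for a 2-by-2 contingency table: a = sentence pairs containing
    both words, b and c = pairs containing only one of them, d = neither.
    Equals X^2 / N for the 2-by-2 case and lies between 0 and 1 [3]."""
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return ((a * d - b * c) ** 2) / denom if denom else 0.0

print(phi_squared(20, 5, 4, 71))   # ~0.57: a fairly strong association
```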
Normally, the lexical translation parameter of Model 1 is initialized to a uniform distribution over the target language vocabulary. From the above analysis, we have strong reasons to believe that these values do not produce the most accurate alignments. Hence, we use a heuristic model based on the log likelihood-ratio (LLR) statistic recommended by [4, 13]. There is no guarantee, of course, that this is the optimal way. However, we found that by applying our heuristic, the lexical translation model is significantly improved. In addition, the more impressive point we want to emphasize is that, by improving the lexical translation model, the fertility-based translation models also gain a better final result.
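A sketch of an LLR association score in the style of [13] is shown below; the count names (c_fe, c_f, c_e over n sentence pairs) are ours. The scores can then be normalized per target word to give a non-uniform starting t-table for Model 1:

```python
import math

def llr(c_fe, c_f, c_e, n):
    """Log-likelihood-ratio score for a word pair (f, e) from co-occurrence
    count c_fe, marginal counts c_f and c_e, over n sentence pairs
    (assumes 0 < c_f < n). Higher scores suggest stronger association."""
    def ll(k, m, p):
        # log-likelihood of k successes in m trials under probability p
        p = min(max(p, 1e-12), 1 - 1e-12)
        return k * math.log(p) + (m - k) * math.log(1 - p)
    p  = c_e / n                      # P(e), independence hypothesis
    p1 = c_fe / c_f                   # P(e | f)
    p2 = (c_e - c_fe) / (n - c_f)     # P(e | not f)
    return 2 * (ll(c_fe, c_f, p1) + ll(c_e - c_fe, n - c_f, p2)
                - ll(c_fe, c_f, p) - ll(c_e - c_fe, n - c_f, p))

# a (hypothetical) initializer: t0(f|e) proportional to llr(f, e),
# normalized over all f that co-occur with e.
```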
5 Experiments

The experiments are deployed on various kinds of training corpora in order to obtain accurate and reliable results. The English-Vietnamese training data is credited to [5]; the English-French training corpus is the Hansards corpus [16]. We use the MOSES framework [8] as the phrase-based SMT framework. In addition to using GIZA++, we also implement the LGIZA toolkit. Unlike GIZA++, LGIZA is implemented based on the original description [1], without applying the later improved techniques that are integrated into GIZA++ [16]: determining word classes to obtain a low translation lexicon perplexity (Och, 1999), various smoothing techniques for the fertility, distortion, and alignment parameters, symmetrization [16], etc. Applying these other improved techniques could make our comparative results a little noisy.

⁴ These tables are sometimes called "contingency tables".
In this evaluation, we iteratively use the various word-based alignment models and evaluate their overall ability to "boost" the quality of the phrase-based SMT system. Table 1 describes the comparison of BLEU scores [17] when applying the various word-based alignment training schemes to the language pair English-Vietnamese. Similarly, Table 2 presents the comparison results for the pair English-French. More details are given in the next sections.
5.1.1 The best training schemes
The training schemes refer to the sequence of models used and the number of training iterations used for training each model. Our standard training scheme on the training data is 1⁵2³3³4³. This notation denotes that five iterations of Model 1, three iterations of Model 2, three iterations of Model 3, and three iterations of Model 4 are performed. In practice, we found that this training scheme typically gives very good results for the language pair English-Vietnamese when compared to other training schemes, and it does not lead to the over-fitting problem.
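The scheme notation can be handled mechanically. A small helper (ours, using an ASCII form of the superscript notation) turns a scheme string into the ordered (model, iterations) plan that a training driver would execute:

```python
def parse_scheme(scheme):
    """Parse a training-scheme string such as "1^5 2^3 3^3 4^3" (the ASCII
    form of our standard scheme) into ordered (model, iterations) pairs."""
    return [(m, int(k)) for m, k in (tok.split("^") for tok in scheme.split())]

print(parse_scheme("1^5 2^3 3^3 4^3"))
# [('1', 5), ('2', 3), ('3', 3), ('4', 3)]
```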
Choosing the best training scheme is an important task. We found that if we apply the default GIZA++ training scheme to the language pair English-Vietnamese, then on all of the various training corpora the overall quality of the system can be very bad. Table 3 points out clearly the adverse effect of choosing the default GIZA++ training scheme, which makes our SMT system obtain significantly worse results compared to our defined training scheme.
Table 3. Comparison with the default training scheme. Columns: Corpus, Default, 1⁵2³3³4³, Δ(%).
5.1.2 Model 2 vs HMM

Very different from the comparisons for other well-known languages, we found that the HMM gives a worse result than Model 2 for the language pair English-Vietnamese. This comes from the fact that the HMM model extends Models 1 and 2, which model the lexical translation and the distortion, by also modelling the relative distortion. In detail, the relative distortion is estimated by applying a first-order Hidden Markov Model, where each alignment probability depends on the distortion of the previous alignment.

However, for the language pair English-Vietnamese, the assumption that each alignment probability depends on the distortion of the previous alignment does not hold. We can see that the transformation of position alignments for the pair English-Vietnamese is considerably more complicated than for other well-known languages; it reflects the considerable difference in word order between English and Vietnamese [14]. This is another important aspect: it leads to bad quality when we apply fertility-based models trained with initial parameters transferred from the HMM model instead of Model 2. It also points to one of the most difficult problems in enhancing the quality of an English-Vietnamese machine translation system: the reordering problem [6].
5.1.3 Model 2 vs Model 3

From the above analysis, we see that Model 3 gives a worse result than Model 2 for the language pair English-Vietnamese. Table 4 and Table 5 show the difference between Model 2 and Model 3 for the language pairs English-Vietnamese and English-French.
Table 4. Comparing IBM Model 2 (1⁵2³) with IBM Model 3 (1⁵2³3³) for the language pair English-Vietnamese. Columns: Corpus, Model 2, Model 3, Δ(%).
Also, we found that GIZA++ applies many improving techniques that are mainly used to "boost" the quality of the fertility-based models. In fact, when applying LGIZA, which follows the original IBM Models description by [1] and does not apply any of the other improved techniques, we found that the comparative result shows an even stronger contrast.
Table 1. Comparison of BLEU scores between various training schemes (English-Vietnamese); columns are increasing training-corpus sizes. 1⁵H⁵3³4³ (default GIZA++): 15.77 / 16.91 / 17.22 / 18.22 / 18.8.

Table 2. Comparison of BLEU scores between various training schemes (English-French); columns are increasing training-corpus sizes. Model 4, 1⁵H⁵3³4³ (default GIZA++): 22.73 / 24.69 / 25.56 / 24.43 / 26.59.

Table 5. Comparing IBM Model 2 (1⁵2⁵) with IBM Model 3 (1⁵2⁵3³) for the language pair English-French. Columns: Corpus, Model 2, Model 3, Δ(%).

Table 6 and Table 7 present the comparison results when applying each model to the statistical phrase-based SMT systems for the language pairs English-Vietnamese and English-French. Notably, without applying the other techniques, the quality of applying Model 3 is bad for the language pair English-Vietnamese, while for the language pair English-French Model 3 is only slightly worse than Model 2. That is why GIZA++ (with the help of the improving techniques discussed above) usually obtains a better result compared to Model 2.
5.1.4 IBM Models 1-2 vs IBM Models 3-4

It has been consistently confirmed that Model 4 is significantly better than Model 3 [16]. This comes from the fact that the source language string constitutes phrases that are translated as units into the target language, and the distortion probabilities of Model 3 do not account well for this tendency of phrases to move around as units.

Table 6. Comparing IBM Model 2 (1⁵2³) with IBM Model 3 (1⁵2³3³) for the language pair English-Vietnamese, using LGIZA. Columns: Corpus, Model 2, Model 3, Δ(%).
More importantly, Model 4 also provides us with a very efficient way to integrate linguistic knowledge about the language pair into the statistical alignment model.

However, the training of Model 4 depends on Model 3: Model 4 uses the fertility-based modelling probability and the other probabilities as the initial transfer parameters for its training. As we can see from the above results, Model 3 cannot "boost" its full merits as it does for some other well-known pairs of languages. Hence, for the language pair English-Vietnamese, the improvement of Model 4 is not very good either.
Table 7. Comparing IBM Model 2 (1⁵2⁵) with IBM Model 3 (1⁵2⁵3³) for the language pair English-French, using LGIZA. Columns: Corpus, Model 2, Model 3, Δ(%).
5.1.5 $p_0$ vs $p_1$ values
For the fertility-based models, there is an important concept: the probability of generating an empty concept or a word. In the formal explanation for the language pair English-French by [7], after we assign fertilities to all the "real" English words (excluding NULL), we are ready to generate (say) $z$ French words. As we generate each of these $z$ words, we optionally toss in a spurious French word as well, with probability $p_1$. We refer to the probability of not tossing in (at each point) a spurious word as $p_0 = 1 - p_1$. The pair $(p_0, p_1)$ takes a unique value for a given pair of languages.
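This coin-tossing story can be simulated directly. The toy sketch below (ours) recovers $p_1$ as the ratio of spurious to real words, which gives a feel for the estimated values reported in the tables that follow:

```python
import random

def simulate_spurious(n_real=100_000, p1=0.11, seed=0):
    """Simulate the coin from [7]: after each of n_real generated words,
    toss in a spurious NULL-generated word with probability p1. The
    spurious/real ratio recovers p1, and p0 = 1 - p1."""
    rng = random.Random(seed)
    spurious = sum(rng.random() < p1 for _ in range(n_real))
    return spurious / n_real

print(round(simulate_spurious(), 3))   # close to 0.11, the English p1 below
```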
The following tables show the probabilities $(p_0, p_1)$ for the language pair English-Vietnamese. Table 8 presents the $(p_0, p_1)$ values for English, and Table 9 presents the $(p_0, p_1)$ values for Vietnamese. For a larger training corpus, we can see that the $p_0$ value of English converges to approximately 0.89, and $p_1$ converges to approximately 0.11. In the other direction, the $p_0$ value of Vietnamese converges to approximately 0.18.
Table 8. $p_0$ vs $p_1$ of English for the pair English-Vietnamese.
Table 9. $p_0$ vs $p_1$ of Vietnamese for the pair English-Vietnamese.

The default training scheme of GIZA++ sets $p_0$ for English to 0.999 and keeps this value of $p_0$ fixed in Model 3 and Model 4. Since 0.999 is far away from 0.89, it is better to change the value of $p_0$ during the training process to obtain a better result. Also, because modelling the NULL translation is difficult and the probability $p_1$ of Vietnamese is greater than that of English, it is harder to model the translation direction that has to generate Vietnamese than the one that has to generate English. Therefore, we will obtain a better system from English to Vietnamese than from Vietnamese to English (Bayesian reasoning [1]).
Table 10. $p_0$ vs $p_1$ of English for the pair English-French.

Table 11. $p_0$ vs $p_1$ of French for the pair English-French.
In other words, an English-to-Vietnamese translation system will usually obtain a higher BLEU score than a Vietnamese-to-English translation system. However, as Table 10 and Table 11 show, this does not happen for the pair English-French. Hence, our suggestion is that if we want to build an English-Vietnamese parallel extraction system, it is better to translate from English to Vietnamese and then process the translated sentences with the processing framework. Otherwise, if we apply an improving technique, it is better to test its effect on a Vietnamese-English translation system.
5.2 Improving Lexical Modelling
From the above comparison, we can see that we need some way to improve the quality of using the higher models. The problem can be stated as: the higher models cannot "boost" all of their hidden power. As we mentioned, this is another important aspect. In our opinion, improving the quality of word alignment is one of the most important tasks in building a state-of-the-art English-Vietnamese SMT system.
Addressing this problem, and differently from previous methods that focus on improving the quality of statistical machine translation by combining the final result with other features in a log-linear combination model [15][12], we focus on improving the lexical modelling so as to better boost the quality of the fertility-based models. This experimental section provides the evidence for our method.
5.2.1 Baseline Results
We test our improving method on various training corpora for both language pairs, English-Vietnamese and English-French, to see the effects of applying our improved heuristic for initializing the Model 1 parameters. The original results of applying the original model implementations are described in Table 12 for the pair English-Vietnamese and in Table 13 for the pair English-French. Each column represents the BLEU score measured for each IBM translation model.
Table 12. Baseline results for the pair English-Vietnamese.

Table 13. Baseline results for the pair English-French.
5.2.2 Improved IBM Models by Heuristic Initialization
Each translation model has its own specific view of modelling the translation process; consequently, each of them differs in the equation used to compute the translation probability. However, there is a strong relationship between them: the more complex translation models use the estimates derived from a simpler translation model as their initial values. Our experimental results point out this perspective clearly. Table 14 describes the improved results for the language pair English-Vietnamese; similarly, Table 15 shows the effects for the language pair English-French.
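As an illustration of this hand-off (our sketch, with table shapes chosen arbitrarily), Model 2 can inherit Model 1's learnt lexical table while starting its own position-alignment table uniform:

```python
def init_model2_from_model1(t1, J_max=40, I_max=40):
    """Initialization hand-off described above: Model 2 inherits the lexical
    table t learnt by Model 1 and starts its position-alignment table
    a(i|j, J, I) uniform at 1/(I+1). Sentence-length bounds are ours."""
    t = dict(t1)                                  # carry over t(f|e)
    a = {(i, j, J, I): 1.0 / (I + 1)
         for J in range(1, J_max + 1)
         for I in range(1, I_max + 1)
         for j in range(1, J + 1)
         for i in range(I + 1)}
    return t, a
```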
Table 14. Improved results of the IBM Models for the pair English-Vietnamese.

Table 15. Improved results of the IBM Models for the pair English-French.
Recent research points out that it is difficult to achieve large gains in translation performance by improving word-based alignment results: the lexical translation models may become quite strong, but it is very hard to "boost" the overall quality of a translation system [10]. However, with a very basic improvement in initializing the Model 1 parameters, we can see that the BLEU score of using Model 3 increases even more than the improvement of Models 1-2. Moreover, the improvement is larger for a larger training corpus.
6 Conclusion

The step of phrase learning in statistical phrase-based translation, which is the current state-of-the-art in SMT, is critically important. In brief, the word-based alignment component directly affects the phrase pairs that are extracted from the training corpora. This research has carried out a systematic comparison between various word-based alignment models for phrase-based SMT systems. We have found that using the HMM and fertility-based alignment models usually gives better results for the language pair English-French; for English-Vietnamese, however, the comparison result is usually the opposite.
Previous research on improving the overall quality of statistical phrase-based translation systems points out that it is very hard to improve the BLEU score by over 1%. However, from the comparison results, we can see that appropriately configuring the best training scheme and other features, such as the probability of tossing in spurious words for each pair of languages, can significantly improve the quality of statistical phrase-based machine translation.
The other contribution of our work is that we have clearly shown the importance of the lexical alignment model to the higher translation models in the training process. In detail, we have pointed out that we heavily need to improve the quality of the lexical models in order to enhance the quality of using the higher word-alignment models for statistical phrase-based machine translation. This is especially important for the language pair English-Vietnamese, for which the quality of using Model 3 as the word-based alignment component is bad compared to the pair English-French.
Acknowledgements

This work is partially supported by the CN.10.01 project at the University of Engineering and Technology, Vietnam National University, Hanoi. This work is also partially supported by Vietnam's National Foundation for Science and Technology Development (NAFOSTED), project code 102.99.35.09, and the project KC.01.TN04/11-15. We are thankful to the anonymous reviewers for their comments, especially to the one who suggested that we use the Berkeley aligner and also recommended that we revise some of our own affirmations.
References
[1] P. F. Brown, V. J. D. Pietra, S. A. D. Pietra, and R. L. Mercer. The mathematics of statistical machine translation: parameter estimation. Comput. Linguist., 19:263-311, June 1993.
[2] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1-38, 1977.
[3] W. A. Gale and K. W. Church. Identifying word correspondence in parallel texts. In Proceedings of the Workshop on Speech and Natural Language, HLT '91, pages 152-157, Stroudsburg, PA, USA, 1991. Association for Computational Linguistics.
[4] W. A. Gale and K. W. Church. A program for aligning sentences in bilingual corpora. Comput. Linguist., 19:75-102, March 1993.
[5] C. Hoang, A. Le, P. Nguyen, and T. Ho. Exploiting non-parallel corpora for statistical machine translation. In Proceedings of the 9th IEEE-RIVF International Conference on Computing and Communication Technologies, pages 97-102. IEEE Computer Society, 2012.
[6] V. Hoang, M. Ngo, and D. Dinh. A dependency-based word reordering approach for statistical machine translation. In RIVF, pages 120-127, 2008.
[7] K. Knight. A Statistical MT Tutorial Workbook. Aug. 1999.
[8] P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst. Moses: open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, ACL '07, pages 177-180, Stroudsburg, PA, USA, 2007. Association for Computational Linguistics.
[9] P. Koehn, F. J. Och, and D. Marcu. Statistical phrase-based translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, NAACL '03, pages 48-54, Stroudsburg, PA, USA, 2003. Association for Computational Linguistics.
[10] A. Lopez. Word-based alignment, phrase-based translation: What's the link? In Proceedings of AMTA, pages 90-99, 2006.
[11] C. D. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, USA, 1999.
[12] J. B. Mariño, R. E. Banchs, J. M. Crego, A. de Gispert, P. Lambert, J. A. R. Fonollosa, and M. R. Costa-jussà. N-gram-based machine translation. Comput. Linguist., 32(4):527-549, Dec. 2006.
[13] R. C. Moore. Improving IBM word-alignment Model 1, 2005.
[14] T. P. Nguyen, A. Shimazu, T.-B. Ho, M. Le Nguyen, and V. Van Nguyen. A tree-to-string phrase-based model for statistical machine translation. In Proceedings of the Twelfth Conference on Computational Natural Language Learning, CoNLL '08, pages 143-150, Stroudsburg, PA, USA, 2008. Association for Computational Linguistics.
[15] F. J. Och and H. Ney. Discriminative training and maximum entropy models for statistical machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pages 295-302, Stroudsburg, PA, USA, 2002. Association for Computational Linguistics.
[16] F. J. Och and H. Ney. A systematic comparison of various statistical alignment models. Comput. Linguist., 29:19-51, March 2003.
[17] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pages 311-318, Stroudsburg, PA, USA, 2002. Association for Computational Linguistics.
[18] S. Vogel, H. Ney, and C. Tillmann. HMM-based word alignment in statistical translation. In Proceedings of the 16th Conference on Computational Linguistics - Volume 2, COLING '96, pages 836-841, Stroudsburg, PA, USA, 1996. Association for Computational Linguistics.