
DOCUMENT INFORMATION

Title: Enhancing the Quality of Machine Translation System Using Cross-Lingual Word Embedding Models
Author: Nguyen Minh Thuan
Supervisor: Assoc. Prof. Nguyen Phuong Thai
University: Vietnam National University, Hanoi
Major: Computer Science
Document type: Thesis
Year: 2018
City: Hanoi
Pages: 55
File size: 834.25 KB


VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY

NGUYEN MINH THUAN

Enhancing the quality of Machine Translation System

Using Cross-Lingual Word Embedding Models

(Nâng cao chất lượng của hệ thống dịch máy dựa trên các

mô hình vector nhúng biểu diễn từ giữa hai ngôn ngữ)

Program: Computer Science
Major: Computer Science

Code: 8480101.01

MASTER THESIS: COMPUTER SCIENCE

SUPERVISOR: Assoc. Prof. NGUYEN PHUONG THAI

Hanoi, 11/2018


Enhancing the quality of Machine Translation System Using Cross-Lingual Word Embedding Models

Nguyen Minh Thuan

Faculty of Information Technology

University of Engineering and Technology

Vietnam National University, Hanoi

Supervised by Associate Professor Nguyen Phuong Thai

A thesis submitted in fulfillment of the requirements

for the degree of Master of Science in Computer Science

November 2018


ORIGINALITY STATEMENT

I hereby declare that this submission is my own work and, to the best of my knowledge, it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at University of Engineering and Technology (UET/Coltech) or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others with whom I have worked at UET/Coltech or elsewhere is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance …

ABSTRACT

… amounts of bilingual corpora, which require much effort and financial support. The lack of bilingual data leads to a poor phrase-table, which is one of the main components of PBSMT, and to the unknown word problem in NMT. In contrast, monolingual data are available for most languages. Thanks to this advantage, many models of word embedding and cross-lingual word embedding have appeared to improve the quality of various tasks in natural language processing. The purpose of this thesis is to propose two models that use cross-lingual word embedding models to address the above impediments. The first model enhances the quality of the phrase-table in SMT, and the remaining model tackles the unknown word problem in NMT.


ACKNOWLEDGEMENTS

… Vinh and MSc. Vu Huy Hien. They are my inspiration, guiding me to overcome many obstacles in the completion of this thesis.

I am grateful to my family. They always encourage, motivate and create the best conditions for me to accomplish this thesis.

I would like to also thank my brother, Nguyen Minh Thong, and my friends, Tran Minh Luyen, Hoang Cong Tuan Anh, for giving me much useful advice and for supporting my thesis, my studying and my living.

Finally, I sincerely acknowledge Vietnam National University, Hanoi and especially the TC.02-2018.03 project, named "Building a machine translation system to support translation of documents between Vietnamese and Japanese to help managers and businesses in Hanoi approach the Japanese market", for financially supporting my master study.


To my family


2.1.4 Open-Source Machine Translation
2.1.4.1 Moses - an Open Statistical Machine Translation System
2.2 Word Embedding
2.2.1 Monolingual Word Embedding Models
2.2.2 Cross-Lingual Word Embedding Models

3 Using Cross-Lingual Word Embedding Models for Machine Translation Systems
3.1 Enhancing the Quality of Phrase-table in SMT Using Cross-Lingual Word Embedding
3.1.1 Recomputing Phrase-table Weights
3.1.2 Generating New Phrase Pairs
3.2 Addressing the Unknown Word Problem in NMT Using Cross-Lingual Word Embedding Models

4 Experiments and Results
4.1 Settings
4.2 Results


5 Conclusion


LIST OF FIGURES

Toy illustration of the cross-lingual embedding model
Flow of the training phase
Flow of the testing phase
Example in the testing phase


LIST OF TABLES

The sample of new phrase pairs generated by using projections of word vector representations
Bilingual corpora
The precision of word translation retrieval with top-k nearest neighbors in Vietnamese-English and Japanese-Vietnamese language pairs
Results on the UET and TED datasets in the PBSMT system for Vietnamese-English and Japanese-Vietnamese respectively
Translation examples of the PBSMT in Vietnamese-English
Results of removing unknown words on the UET and TED datasets in the NMT system for Vietnamese-English and Japanese-Vietnamese


Chapter 1

Introduction

Machine Translation (MT) is a sub-field of computational linguistics. It is automated translation, which translates text or speech from one natural language to another by using computer software. Nowadays, machine translation systems attain much success in practice, and the two approaches that have been widely used for MT are Phrase-based Statistical Machine Translation (PBSMT) and Neural Machine Translation (NMT). In the PBSMT system, the core is the phrase-table, which contains the words and phrases that the SMT system uses to translate. In the translation process, sentences are split into distinguished parts as shown in (Koehn et al., 2007) (Koehn, 2010). At each step, for a given source phrase, the system tries to find the best candidate amongst many target phrases as its translation, based mainly on the phrase-table. Hence, having a good phrase-table possibly makes translation systems improve the quality of translation. However, attaining a rich phrase-table is a challenge since the phrase-table is extracted and trained from large amounts of bilingual corpora, which require much effort and financial support, especially for less common languages such as Vietnamese, Lao, etc. In the NMT system, the two main components are the encoder and the decoder: the encoder uses a neural network, such as a recurrent neural network (RNN), to encode the source sentence, and the decoder also uses a neural network to predict words in the target language. Some NMT models incorporate attention mechanisms to improve the translation quality.

Due to computational constraints, conventional NMT systems often limit their vocabulary to the most frequent words; words outside this vocabulary become unknown words, and it is hard to find the proper translation for these unknown words during testing, as shown in (Luong et al., 2015b) (Li et al., 2016).

Latterly, there have been several approaches to address the above impediments. For the problem in the PBSMT system, (Passban et al., 2016) proposed a method of using new scores, generated by a Convolutional Neural Network, which indicate the semantic relatedness of phrase pairs. They attained an improvement of approximately 0.55 BLEU score. However, their method is suitable for medium-size corpora and creates more scores for the phrase-table, which can increase the computational complexity of all translation systems.

(Cui et al., 2018) utilized techniques of pivot languages to enrich their phrase-table. Their phrase-table is made of source-pivot and pivot-target phrase-tables; as a result of this combination, they attained a significant improvement of translation. Similarly, (Zhu et al., 2014) used a method based on pivot languages to calculate the probabilities of dictionary entries by using co-occurrence frequencies collected from bilingual data. However, their method needs a lot of bilingual corpora to estimate accurately the probabilities for dictionary entries, which are not available for low-resource languages.

In order to address the unknown word problem in the NMT system, (Luong et al., 2015b) annotated the training bilingual corpus with explicit alignment information that allows the NMT system to emit, for each unknown word in the target sentence, the position of its corresponding word in the source sentence. This information is then used in a post-processing step to translate every unknown word by using a bilingual dictionary. The method showed a substantial improvement of up to 2.8 BLEU points over various NMT systems on the WMT'14 English-French translation task. However, obtaining the good dictionary, which is utilized in the post-processing step, is also costly and time-consuming.

(Sennrich et al., 2016) introduced a simple approach to handle the translation of unknown words in NMT by encoding unknown words as a sequence of subword units. This method is based on the intuition that a variety of word classes are translated via smaller units than words; for example, names are translated by character copying or transliteration, and compounds are translated via compositional translation. The approach indicated an improvement of up to 1.3 BLEU over a back-off dictionary baseline model on the WMT'15 English-Russian translation task.
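To make the subword intuition concrete, below is a minimal sketch of the byte-pair-encoding merge-learning loop in the spirit of (Sennrich et al., 2016); the toy vocabulary is invented for illustration, and the released subword-nmt implementation differs in details.

```python
from collections import Counter

def pair_counts(vocab):
    """Count adjacent symbol pairs over the vocabulary, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge(pair, vocab):
    """Merge every occurrence of the given symbol pair into one symbol."""
    old, new = ' '.join(pair), ''.join(pair)
    return {word.replace(old, new): freq for word, freq in vocab.items()}

# Toy corpus: words split into characters, '</w>' marks the end of a word.
vocab = {'l o w </w>': 5, 'l o w e r </w>': 2,
         'n e w e s t </w>': 6, 'w i d e s t </w>': 3}
for _ in range(10):                       # learn 10 merge operations
    pairs = pair_counts(vocab)
    if not pairs:
        break
    best = max(pairs, key=pairs.get)      # the most frequent adjacent pair
    vocab = merge(best, vocab)
    print(best)                           # e.g. ('e', 's'), ('es', 't'), ...
```

Frequent character sequences are merged into whole words, while rare words remain decomposed into smaller units that the NMT vocabulary can still cover.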

(Li et al., 2016) proposed a novel substitution-translation-restoration method to tackle the unknown word problem in NMT. In this method, the substitution step replaces the unknown words in a testing sentence with similar in-vocabulary words, based on a similarity model learned from monolingual data. The translation step then translates the testing sentence with a model trained on bilingual data in which unknown words have been replaced. Finally, the restoration step substitutes the translations of the replaced words with those of the original ones. This method demonstrated a significant improvement of up to 4 BLEU points over the attention-based NMT on Chinese-to-English translation.

Recently, techniques using word embedding have received much interest from natural language processing communities. Word embedding is a vector representation of words which conserves semantic information and their context words. Additionally, we can exploit the advantage of embedding to represent words in diverse distinction spaces, as shown in (Mikolov et al., 2013b). Besides, cross-lingual word embedding models are also receiving a lot of interest; they learn cross-lingual representations of words in a joint embedding space to represent meaning and transfer knowledge in cross-lingual scenarios. Inspired by the advantages of the cross-lingual embedding models and the work of (Mikolov et al., 2013b) and (Li et al., 2016), we propose a model to enhance the quality of a phrase-table by recomputing the phrase weights and generating new phrase pairs for the phrase-table, and a model to address the unknown word problem in the NMT system by replacing the unknown words with the most appropriate in-vocabulary words.

The rest of this thesis is organized as follows: Chapter 2 gives an overview of related backgrounds. In Chapter 3, we describe our two proposed models: one model enhances the quality of the phrase-table in SMT, and the remaining model tackles the unknown word problem in NMT. The settings and results of our experiments are shown in Chapter 4. We present our conclusion and future works in Chapter 5.


Chapter 2

Literature review

In this chapter, we give an overview of Machine Translation (MT) research and Word Embedding models in Sections 2.1 and 2.2 respectively. Section 2.1 presents the history, approaches, evaluation and open-source toolkits of MT. In Section 2.2, we introduce an overview of Word Embedding, including Monolingual and Cross-Lingual Word Embedding models.

2.1 Machine Translation

2.1.1 History

Machine Translation is a sub-field of computational linguistics. It is automated translation, which translates text or speech from one natural language to another by using computer software. The first ideas of machine translation may have appeared in the seventeenth century, when Descartes and Leibniz proposed theories of how to create dictionaries by using universal numerical codes.

In the mid-1930s, Georges Artsrouni attempted to build "translation machines" by using paper tape to create an automatic dictionary. After that, Peter Troyanskii proposed a model including a bilingual dictionary and a method for handling grammatical issues between languages based on Esperanto's grammatical system.

On January 7th, 1954, at the head office of IBM in New York, the first machine translation system was demonstrated in the Georgetown-IBM experiment. It automatically translated 60 sentences from Russian to English for the first time and opened a race for machine translation in many countries, such as Canada, Germany, and Japan. However, in 1966 the Automatic Language Processing Advisory Committee (ALPAC) reported that the ten-year-long research had failed to fulfill expectations, as noted in (Vogel et al., 1996). During the 1980s, a lot of activities in MT were carried out, especially in Japan. At this time, research in MT typically depended on translation through a variety of intermediary linguistic representations including syntactic, morphological, and semantic analysis. At the end of the 1980s, since computational power increased and became less expensive, more research was attempted in the statistical approach to MT.

During the 2000s, research in MT saw major changes. A lot of research focused on example-based machine translation and statistical machine translation (SMT). Besides, researchers also took more interest in hybridization, combining morphological and syntactic knowledge into statistical systems, as well as combining statistics with existing rule-based systems. Recently, the hot trend of MT is using large artificial neural networks for MT, called Neural Machine Translation (NMT). In 2014, (Cho et al., 2014) published the first paper on using neural networks in MT, followed by a lot of research in the following few years. Apart from the research on bilingual machine translation systems, in 2018 researchers paid much attention to unsupervised neural machine translation (UNMT), which only uses monolingual data to train the MT system.


Statistical Machine Translation

A simple way to model the probability distribution p(e|f) is to apply Bayes' Theorem:

$$p(e \mid f) = \frac{p(f \mid e)\,p(e)}{p(f)}$$

where p(f|e) is the translation model and p(e) is the language model, i.e. the probability of seeing sentence e in the target language. Therefore, the best translation ê is found by maximizing the product p(f|e)p(e):

$$\hat{e} = \operatorname*{argmax}_{e} p(e \mid f) = \operatorname*{argmax}_{e} p(f \mid e)\,p(e)$$
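As a toy illustration of this noisy-channel decision rule, the sketch below scores a few candidate translations; all probabilities are invented for the example.

```python
import math

# Hypothetical model scores for one source sentence f:
# p_tm[e] stands for p(f|e) (translation model), p_lm[e] for p(e) (language model).
p_tm = {'the house is small': 0.20,
        'small is the house': 0.30,
        'the home is little': 0.25}
p_lm = {'the house is small': 0.30,
        'small is the house': 0.02,
        'the home is little': 0.10}

# argmax_e p(f|e) p(e), computed in log space for numerical stability
best = max(p_tm, key=lambda e: math.log(p_tm[e]) + math.log(p_lm[e]))
print(best)  # 'the house is small': the fluent candidate wins despite a lower TM score
```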

In order to perform the search efficiently in the huge search space, machine translation decoders trade off quality against time by using heuristics and other methods based on the foreign string to limit the search space. Some efficient search algorithms, which are currently used in decoders, are Viterbi beam, A* stack, Graph Model, etc. SMT has been used as the core of systems such as Google Translate and Bing Translator.

Example-based

In an Example-based Machine Translation (EBMT) system, a sentence is translated by using the idea of analogy. In this approach, the corpus that is used is a large …

Neural Machine Translation

Neural Machine Translation (NMT) is the newest approach to MT and is based on machine learning. This approach uses a large artificial neural network and, in contrast to the discrete representations of SMT models, uses vector representations ("embeddings", "continuous space representations") for words and internal states. The NMT system contains a single sequence model to predict one word at a time; there is no separate translation model, language model, or reordering model. The first NMT models use a recurrent neural network (RNN): a bidirectional RNN, known as the encoder, encodes the source sentence, and a second RNN, known as the decoder, predicts words in the target language. NMT systems can continuously learn and be adjusted to generate the best output, and they require a lot of computing power. This is why these models have only been developed strongly in recent years.

Evaluation

In our work, we use BLEU for automatically evaluating our MT system configurations. The BLEU n-gram precision p_n is computed by summing the n-gram matches for all the candidate sentences in the test corpus C:

$$p_n = \frac{\sum_{C \in \{Candidates\}} \sum_{ngram \in C} Count_{matched}(ngram)}{\sum_{C \in \{Candidates\}} \sum_{ngram \in C} Count(ngram)}$$
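A minimal sketch of this modified n-gram precision, assuming a single reference per candidate and clipping candidate counts by the reference counts:

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidates, references, n):
    """Corpus-level BLEU n-gram precision p_n for one reference per candidate."""
    matched = total = 0
    for cand, ref in zip(candidates, references):
        cand_counts = Counter(ngrams(cand.split(), n))
        ref_counts = Counter(ngrams(ref.split(), n))
        # Clip each candidate n-gram count by its count in the reference.
        matched += sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total += sum(cand_counts.values())
    return matched / total if total else 0.0

print(modified_precision(['the house is small'], ['the house is little'], 1))  # 0.75
print(modified_precision(['the house is small'], ['the house is little'], 2))  # ~0.67
```

The full BLEU score combines p_1 through p_4 with a brevity penalty for candidates shorter than their references.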


In order to stimulate the development of the MT research community, a variety of free and complete toolkits for MT are provided. With the statistical (or data-driven) approach to MT, we can consider some systems as follows:

- Moses¹: a complete SMT system
- UCAM-SMT²: the Cambridge SMT system
- Phrasal³: a toolkit for phrase-based SMT
- Joshua⁴: a decoder for syntax-based SMT
- Pharaoh⁵: a decoder for IBM Model 4

Besides, because of the superiority of NMT over SMT, NMT has received much attention from researchers and companies. The following state-of-the-art NMT systems are totally free and easy to set up:

- OpenNMT⁶: a system designed to be simple to use and easy to extend, developed by Harvard University and SYSTRAN

- Facebook-fairseq⁷: a system implemented with Convolutional Neural Networks (CNN), which can achieve performance similar to the RNN-based NMT while running nine times faster, developed by Facebook AI Research
- Amazon-Sockeye⁸: a sequence-to-sequence framework based on Apache MXNet, developed by Amazon

In this part, we introduce the two MT systems which are used in our work. The first system is Moses, an open system for SMT, and the remaining system is OpenNMT, an open system for NMT.

2.1.4.1 Moses - an Open Statistical Machine Translation System

Moses, which was introduced by (Koehn et al., 2007), is a complete open-source toolkit for statistical machine translation. It can automatically train translation models for any language pair from a collection of translated sentences (parallel data). First, the parallel sentences are word-aligned, using tools such as GIZA++ in (Och and Ney, 2003) or MGIZA++. These word alignments are then used to extract phrase translation pairs or hierarchical rules. These phrase pairs or rules are then scored by using corpus-wide statistics. Finally, the weights of the different statistical models are tuned to generate the best possible translations; MERT in (Och, 2003) is used to tune weights in Moses. In the decoding process, Moses uses the trained translation model to translate the source sentence into the target sentence. To overcome the huge search problem in decoding, Moses implements several different algorithms for this search such as stack-based, cube-pruning, chart parsing, etc. Besides, an important part of the decoder is the language model, which is trained from monolingual data in the target language to ensure the fluency of the output. Moses supports many kinds of language model tools such as KenLM in (Heafield, 2011), SRILM in (Stolcke, 2002), IRSTLM in (Federico et al., 2008), etc.

⁷ https://github.com/facebookresearch/fairseq
⁸ https://github.com/awslabs/sockeye


Currently, Moses supports several effective translation models such as phrase-based, hierarchical phrase-based, factored, syntax-based and tree-based models.

2.1.4.2 OpenNMT - an Open Neural Machine Translation System

OpenNMT is a full-featured deep learning system specialized in sequence-to-sequence models, supporting a lot of tasks such as machine translation, summarization, image-to-text, etc. It is designed for completely training and deploying NMT models. The system has been rewritten from seq2seq-attn, developed at Harvard, for ease of readability, efficiency, and generalizability. It contains a variety of easy-to-reuse modules for state-of-the-art performance such as encoders, decoders, embedding layers, attention layers, input feeding, regularization, beam search, etc.

Currently, OpenNMT has three main implementations:

- OpenNMT-lua: the original project, developed with LuaTorch, ready for quick experiments and production
- OpenNMT-py: a clone of OpenNMT-lua, which uses the more modern PyTorch; it is easy to extend and especially suited for research
- OpenNMT-tf: a general-purpose sequence modeling tool in TensorFlow, focusing on large-scale experiments and high-performance models

The structure of the Neural Machine Translation system in OpenNMT is typically implemented as an encoder-decoder architecture (Bahdanau et al., 2014). The encoder is a recurrent neural network (RNN) or a bidirectional recurrent neural network that encodes a source sentence x = {x_1, ..., x_T} into a sequence of hidden states h = {h_1, ..., h_T}:

$$h_t = f_{enc}(e(x_t), h_{t-1}) \tag{2.4}$$

where h_t is the hidden state at time step t, e(x_t) is the embedding of x_t, T is the number of symbols in the source sentence, and the function f_enc is the recurrent unit such as the gated recurrent unit (GRU) or the long short-term memory (LSTM) unit. The decoder is also a recurrent neural network which is trained to predict the conditional probability of each symbol y_t given its preceding symbols y_{<t} and the context vector c_t:

$$p(y_t \mid y_{<t}, c_t) = g(e(y_{t-1}), r_t, c_t) \tag{2.5}$$


$$r_t = f_{dec}(e(y_{t-1}), r_{t-1}, c_t) \tag{2.6}$$

where r_t is the hidden state of the decoder at time step t, updated by f_dec; e(y_{t-1}) is the embedding of the previously emitted target symbol; and g is a nonlinear function that computes the probability of y_t. In each decoding step, the context vector c_t is computed as a weighted sum of the source hidden states:

$$c_t = \sum_{s=1}^{T} \alpha_{ts}\, h_s \tag{2.7}$$

$$\alpha_{ts} = \frac{\exp(score(r_{t-1}, h_s))}{\sum_{s'=1}^{T} \exp(score(r_{t-1}, h_{s'}))} \tag{2.8}$$

$$score(r_{t-1}, h_s) = v_a^{T} \tanh(W_r r_{t-1} + W_h h_s) \tag{2.9}$$

where v_a, W_r and W_h are trainable parameters.
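A numpy sketch of one decoding step of Equations 2.7-2.9 (dimensions and parameter values are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8                       # source length and hidden size
h = rng.normal(size=(T, d))       # encoder states h_1..h_T
r_prev = rng.normal(size=d)       # previous decoder state r_{t-1}
W_r, W_h = rng.normal(size=(d, d)), rng.normal(size=(d, d))
v_a = rng.normal(size=d)          # trainable parameters of Eq. 2.9

scores = np.tanh(r_prev @ W_r.T + h @ W_h.T) @ v_a   # Eq. 2.9, shape (T,)
alpha = np.exp(scores - scores.max())                # Eq. 2.8: softmax over source positions
alpha /= alpha.sum()
c_t = alpha @ h                                      # Eq. 2.7: context vector, shape (d,)
print(alpha.round(3), c_t.shape)
```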

2.2 Word Embedding

In recent years, techniques using word embedding have received much interest from natural language processing communities. Word embedding is a vector representation of words which conserves semantic information and their context words, as in (Huang et al., 2012), (Mikolov et al., 2013a), (Mikolov et al., 2013b). Additionally, we can exploit the advantage of embedding to represent words in diverse distinction spaces, in both monolingual and cross-lingual word embedding settings.


2.2.1 Monolingual Word Embedding Models

During the 1990s, vector space models were applied for distributional semantics. A variety of models were then developed for estimating continuous representations of words, such as Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA), etc. The term word embeddings was first used by (Bengio et al., 2003), who learned word representations by using a feed-forward neural network. Recently, (Mikolov et al., 2013a) proposed new models for learning effectively distributed representations of words by using a feed-forward neural network, known as word2vec. They provided two neural networks for learning word vectors: Continuous Skip-gram and Continuous Bag-of-Words (CBOW). In CBOW, a feed-forward neural network with an input layer, a projection layer, and an output layer is used to predict the current word based on its context words, as shown in Figure 2.1. In this architecture, the projection layer is common among all words; the input is a window of n future words and n history words of the current word. All the input words are projected to a common space, and the current word is then predicted by averaging these input vectors. In contrast to CBOW, the Skip-gram model uses the current word to predict the surrounding words, as shown in Figure 2.1. The input of this model is a center word, which is fed into the projection layer, and the output is 2n vectors for the n history and n future words. In practice, in case of limited monolingual data, Skip-gram yields a better word representation than CBOW; however, CBOW is faster and more suitable for larger datasets.

Another model, GloVe, learns word vectors by minimizing the difference between the dot product of the embedding of a word w_i and its context word w_j and the logarithm of their number of co-occurrences:

$$J_{glove} = \sum_{i,j=1}^{V} f(C_{ij})\,\big(w_i^{T} \tilde{w}_j + b_i + \tilde{b}_j - \log C_{ij}\big)^2 \tag{2.10}$$

where w_i and b_i are the word vector and bias of word i, w̃_j and b̃_j are the context word vector and bias, C_ij captures the number of times word i occurs in the context of word j, and f is a weighting function that assigns relatively lower weight to rare and frequent co-occurrences.
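As an illustration, both word2vec architectures are available in the gensim library (a sketch assuming gensim 4.x parameter names; the toy corpus is invented):

```python
from gensim.models import Word2Vec

sentences = [['the', 'house', 'is', 'small'],
             ['the', 'home', 'is', 'little'],
             ['a', 'small', 'house']]

# sg=1 selects Skip-gram; sg=0 would select CBOW. window is the context size n.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)
print(model.wv['house'].shape)         # (50,): the learned embedding of "house"
print(model.wv.most_similar('house'))  # nearest neighbors by cosine similarity
```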


Figure 2.1: The CBOW model predicts the current word based on the context, and the Skip-gram model predicts surrounding words based on the current word.

2.2.2 Cross-Lingual Word Embedding Models

Cross-lingual word embedding models learn cross-lingual representations of words in a joint embedding space to represent meaning and transfer knowledge in cross-lingual applications. Recently, many models for learning cross-lingual embeddings have been proposed, as shown in (Ruder et al., 2017), a survey of cross-lingual word embedding models. In this section, we introduce three models, from (Mikolov et al., 2013b), (Xing et al., 2015) and (Conneau et al., 2017), which are used in our experiments to enhance the quality of the MT system. These models always assume that two sets of embeddings have been trained independently on monolingual data, and their work focuses on learning a mapping between the two sets such that translations are close in the shared space.

Cross-lingual embedding model in (Mikolov et al., 2013b)

(Mikolov et al., 2013b) show that they can exploit the similarities of monolingual embedding spaces by learning a linear projection between the vector spaces representing each language. They first build vector representation models of the languages using large amounts of monolingual data. Next, they use a small bilingual dictionary to learn a linear projection between the languages. For this purpose, they use a dictionary of n = 5000 word pairs {x_i, z_i}, i ∈ [1, n], to find a transformation matrix W such that W x_i approximates z_i. In practice, learning the transformation matrix W can


be considered as an optimization problem, which can be solved by minimizing the following error function using a gradient descent method:

$$\min_{W} \sum_{i=1}^{n} \|W x_i - z_i\|^{2} \tag{2.11}$$

At test time, given a new word and its continuous vector representation x, they map it to the other language space by computing z = Wx. The word whose representation is closest to z in the target language space is then retrieved by using cosine similarity as the distance metric.
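A numpy sketch of this pipeline; for brevity the mapping is fitted with a closed-form least-squares solve instead of the paper's gradient descent, and the embeddings are random stand-ins for vectors trained on real monolingual data:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 300
X = rng.normal(size=(n, d))   # source vectors x_i of the dictionary word pairs
Z = rng.normal(size=(n, d))   # vectors z_i of their translations

# Minimize sum_i ||x_i W - z_i||^2 (the row-vector form of Eq. 2.11).
W, *_ = np.linalg.lstsq(X, Z, rcond=None)

def translate(x, target_words, target_vecs, k=1):
    """Map x into the target space and retrieve the k nearest words by cosine."""
    z = x @ W
    sims = (target_vecs @ z) / (np.linalg.norm(target_vecs, axis=1) * np.linalg.norm(z))
    return [target_words[i] for i in np.argsort(-sims)[:k]]
```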

Cross-lingual embedding model in (Xing et al., 2015)

Following (Mikolov et al., 2013b), (Xing et al., 2015) pointed out that the Euclidean distance in the objective function shown in Equation 2.11 is inconsistent with the cosine similarity used to retrieve translations, and they solved this problem by enforcing an orthogonality constraint on W. Equation 2.11 is then fundamentally considered as the Procrustes problem in (Schönemann, 1966), which provides a solution obtained from the singular value decomposition (SVD) of ZX^T, where X and Z are two matrices of size d × n containing the embeddings of the words in the bilingual dictionary. The formula is shown as follows:

$$W^{*} = \operatorname*{argmin}_{W} \|WX - Z\|^{2} = UV^{T}, \quad \text{with } U\Sigma V^{T} = \mathrm{SVD}(ZX^{T}) \tag{2.12}$$
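The closed-form solution of Equation 2.12 takes only a few lines of numpy (a sketch, with X and Z being the d × n dictionary matrices defined above):

```python
import numpy as np

def procrustes(X, Z):
    """Orthogonal W minimizing ||WX - Z||^2, via the SVD of ZX^T (Eq. 2.12)."""
    U, _, Vt = np.linalg.svd(Z @ X.T)
    return U @ Vt   # W* = U V^T is orthogonal, so it preserves dot products
```

Because the resulting W is orthogonal, the training objective stays consistent with the cosine similarity used at retrieval time.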

Cross-lingual embedding model in (Conneau et al., 2017)

The two above models reported good performance on the word translation task by using a small bilingual dictionary to learn the linear mapping. In this model, the authors show how to learn the mapping W without using any bilingual data; their approach is summarized in a toy illustration. In the illustration, (A) shows two sets of pre-trained word embeddings, English words denoted by X and Italian words denoted by Y. Each dot indicates a word in that space, and the size of the dot is proportional to the frequency of the word in the training corpus of that language. (B) introduces a method to learn an initial proxy of W by using an adversarial criterion; the stars are randomly selected words that are fed to the discriminator. (C) presents using the best-matched words as anchor points to refine the mapping W via Procrustes. (D) changes the metric of the space to improve performance over less frequent words. The details of this model are described as follows.

For learning W without using bilingual data, the authors use a domain-adversarial approach. Let X = {x_1, ..., x_n} and Y = {y_1, ..., y_m} be two sets of n and m word embeddings of the source and target language respectively. A model called the discriminator is trained to discriminate between elements randomly sampled from WX and Y, while W is trained to prevent the discriminator from accurately predicting the origin of an embedding. The discriminator loss is shown below:

$$\mathcal{L}_D(\theta_D \mid W) = -\frac{1}{n}\sum_{i=1}^{n} \log P_{\theta_D}(\mathrm{source} = 1 \mid Wx_i) - \frac{1}{m}\sum_{i=1}^{m} \log P_{\theta_D}(\mathrm{source} = 0 \mid y_i)$$

and the mapping W is trained with the following loss:

$$\mathcal{L}_W(W \mid \theta_D) = -\frac{1}{n}\sum_{i=1}^{n} \log P_{\theta_D}(\mathrm{source} = 0 \mid Wx_i) - \frac{1}{m}\sum_{i=1}^{m} \log P_{\theta_D}(\mathrm{source} = 1 \mid y_i)$$
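A PyTorch sketch of the two losses (an illustrative reduction; the actual model of (Conneau et al., 2017) uses a deeper discriminator, label smoothing and other details omitted here):

```python
import torch
import torch.nn as nn

d = 300
W = nn.Linear(d, d, bias=False)                      # the mapping W
D = nn.Sequential(nn.Linear(d, 512), nn.LeakyReLU(),
                  nn.Linear(512, 1), nn.Sigmoid())   # outputs P(source = 1 | v)
bce = nn.BCELoss()

x = torch.randn(32, d)   # a batch of source embeddings (random stand-ins)
y = torch.randn(32, d)   # a batch of target embeddings

pred = torch.cat([D(W(x)), D(y)]).squeeze(1)
is_from_source = torch.cat([torch.ones(32), torch.zeros(32)])

loss_D = bce(pred, is_from_source)      # L_D: predict the true origin of each vector
loss_W = bce(pred, 1 - is_from_source)  # L_W: train W to fool the discriminator
# In training, loss_D updates only D (with W detached) and loss_W updates only W.
```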

After adversarial training, the most frequent words and their mutual nearest neighbors under the learned mapping are used to build a more accurate dictionary, on which the Procrustes solution of Equation 2.12 refines W.

To increase the similarity associated with isolated word vectors and to decrease that of vectors lying in dense areas, a similarity measure named Cross-domain similarity local scaling (CSLS) is proposed. This measure computes the similarity between …
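The scanned text is truncated here; for reference, CSLS as defined in (Conneau et al., 2017) compares a mapped source vector W x_s with a target vector y_t while discounting similarities in dense neighborhoods, where r_T and r_S denote the mean cosine similarity to the k nearest neighbors in the target and source embedding spaces:

$$\mathrm{CSLS}(Wx_s, y_t) = 2\cos(Wx_s, y_t) - r_T(Wx_s) - r_S(y_t)$$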
