unsupervised language model adaptation using latent semantic marginals

Scientific report document: "A Large Scale Distributed Syntactic, Semantic and Lexical Language Model for Machine Translation" doc

... 5-gram/2-SLM+2-gram/4-SLM+5-gram/PLSA language model improves both significantly. Bear in mind that Charniak et al. (2003) integrated Charniak’s language model with the syntax-based translation model Yamada and ... semantic content $g_{k+1}$. The parameter for WORD-PREDICTOR in the composite n-gram/m-SLM/PLSA language model becomes $p(w_{k+1} \mid w_{k-n+2}^{k}\, h_{-m}^{-1}\, g_{k+1})$. The resulting composite language model ... Large language models in machine translation. The 2007 Conference on Empirical Methods in Natural Language Processing (EMNLP), 858-867. E. Charniak. 2001. Immediate-head parsing for language models....

Uploaded: 20/02/2014, 04:20

Scientific report document: "Japanese OCR Error Correction using Character Shape Similarity and Statistical Language Model" pptx

... probability and the normalizing constant ~(~) are used to compute P(~I~) using equation (7). 3 Language Model 3.1 Word Segmentation Model Let the input Japanese character sequence be $C = c_1 c_2 \ldots c_m$, ... 8×3×4×16 = 1536. By using 2-stage feature selection, it can be reduced to 256, while still preserving the original recognition ability. Using the language model (9), the OCR error ... likely correction candidate is selected by the word segmentation algorithm using the OCR model and the language model. For simplicity, we will present the method as if it were for an isolated...

Uploaded: 20/02/2014, 18:20

Scientific report document: "Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information" doc

... translation model adaptation in detail. 3.1 Hidden Topic Markov Model During the last couple of years, topic models such as Probabilistic Latent Semantic Analysis (Hofmann, 1999) and Latent Dirichlet ... ignored, these methods model the text corpus by using a co-occurrence matrix of words and documents, and build generative models to infer the latent aspects or topics. Using these models, the words ... weblog. According to adaptation emphases, domain adaptation in SMT can be classified into translation model adaptation and language model adaptation. Here we focus on how to adapt a translation model, which is...

Uploaded: 19/02/2014, 19:20

Scientific report document: "Topic Models for Dynamic Translation Model Adaptation" pptx

... number of latent topics in each model to be 5, 10, and 20. On FBIS, we can see that both models achieve moderate but consistent gains over the baseline on both BLEU and TER. The best model, LTM-10, ... topic-specific contexts, where topics are induced in an unsupervised way using topic models; this can be thought of as inducing subcorpora for adaptation without any human annotation. We use these ... underlying latent topics of the documents (Blei et al., 2003). Topic modeling has received some use in SMT, for instance Bilingual LSA adaptation (Tam et al., 2007), and the BiTAM model (Zhao...

Uploaded: 19/02/2014, 19:20

Scientific report document: "Modeling Wisdom of Crowds Using Latent Mixture of Discriminative Experts" docx

... representation of CRF model. CRF Mixture of Experts To show the importance of latent variable in our Wisdom-LMDE model, we trained a CRF-based mixture of discriminative experts. This model is similar ... interactions. Model parameters were validated by using a 3-fold cross-validation strategy on the training set. Regulariza- ... Table 2: Comparison of our Wisdom-LMDE model with previously proposed models. ... approach is based on the Latent Mixture of Discriminative Experts (LMDE) model originally introduced for multimodal fusion (Ozkan et al., 2010). In our Wisdom-LMDE model, a discriminative expert...

Uploaded: 20/02/2014, 05:20

Scientific report document: "Smoothing a Tera-word Language Model" doc

... Goodman. 2001. A bit of progress in language modeling. Computer Speech and Language. R. Kneser and H. Ney. 1995. Improved backing-off for m-gram language modeling. In International Conference ... chosen for each n-gram order. Using this formulation as an interpolated 5-gram language model gives a cross entropy of 8.05 bits on Brown. 4.5 Dirichlet with KN Back-Off Using a modified back-off ... Bauman Peto. 1995. A hierarchical Dirichlet language model. Natural Language Engineering, 1(3):1–19. Y.W. Teh. 2006. A hierarchical Bayesian language model based on Pitman-Yor processes. In Proceedings...

Uploaded: 20/02/2014, 09:20

Scientific report document: "A Succinct N-gram Language Model" ppt

... By using 8-bit floating point quantization, N-gram language models are compressed into 10 GB, which is comparable to a lossy representation (Talbot and Brants, 2008). 2 N-gram Language Model We ... language model structure and word identifiers. In Proc. of ICASSP 2003, volume 1. A. Stolcke. 1998. Entropy-based pruning of backoff language models. In Proc. of the ARPA Workshop on Human Language ... GB; Succinct: 12.62 GB, 10.57 GB. Language model: Trie 42.65 GB, 20.01 GB; Integer 33.65 GB, 18.98 GB; Succinct 31.67 GB, 18.37 GB. Quantized language model: Trie 24.73 GB, 11.47 GB; Integer 15.73 GB, 10.44 GB; Succinct 13.75...

Uploaded: 20/02/2014, 09:20

Scientific report document: "A Phonotactic Language Model for Spoken Language Identification" pptx

... statistical language modeling, and language identification. A typical LID system is illustrated in Figure 1 (Zissman, 1996), where language dependent voice tokenizers (VT) and language models ... Bellegarda. 2000. Exploiting latent semantic information in statistical language modeling, In Proc. of the IEEE, 88(8):1279-1296. M. W. Berry, S.T. Dumais and G.W. O’Brien. 1995. Using Linear Algebra ... Tokenization at different resolutions 2.2 n-gram Language Model With the sequence of tokens, we are able to estimate an n-gram language model (LM) from the statistics. It is generally agreed...

Uploaded: 20/02/2014, 15:20

Scientific report document: "Improving Probabilistic Latent Semantic Analysis with Principal Component Analysis" ppt

... LSA-PLSA with 3 PLSA models. We also compared these models against the performance of an averaged model without an LSA-PLSA model: 1 LSA and 1 PLSA model. In each case, the PLSA models were randomly ... latent classes for the CISI collection using multiple models. In all class sizes, a combined model that included the LSA-initialized PLSA model had performance that was at least as good as using ... the latent class. As in the analysis above, we assume that the latent classes in the LSA model correspond to the latent classes of the PLSA model. Making the simplifying assumption that the latent...

Uploaded: 22/02/2014, 02:20

Scientific report document: "A Structured Language Model" ppt

... Parsing using a Hidden Derivational Model. In Proceedings of the Human Language Technology Workshop, 272-277. ARPA. Raymond Lau, Ronald Rosenfeld, and Salim Roukos. 1993. Trigger-based language ... predict a word wk and they were implemented using the Maximum Entropy Modeling Toolkit (Ristad97). The constraint templates in the {W,H} models were: 4 <= <*>_<*> <7>; ... the difference being that in the present work triggers are identified using the syntactic structure. 3.2 The second model Model (2) assigns probability to different binary parses of the word...

Uploaded: 22/02/2014, 03:20

Scientific report: "Intelligent Selection of Language Model Training Data" ppt

... domain-specific and non-domain-specific language models, for each sentence of the text source used to produce the latter language model. We show that this produces better language models, trained on less data, ... to each language model a separate feature function in the overall translation model. The normal practice when using multiple language models in machine translation seems to be to train models ... Chien, Ker-Jiann Chen, and Lin-Shan Lee. 1997. Chinese language model adaptation based on document classification and multiple domain-specific language models. In EUROSPEECH-1997, 1463–1466. Hermann...

Uploaded: 07/03/2014, 22:20

Scientific report: "Optimizing Language Model Information Retrieval System with Expectation Maximization Algorithm" doc

... its original parameters are given by the basic language modeling approach calculation. Figure 2. HMM model for EM IR We define our HMM model as a four-tuple, {S,A,B,π}, where S is a ... calculated with the simple language modeling approach. Even if the query term is not in the document, it will be assigned a small value according to the basic language modeling method. The rest ... states take much time for one run, we first apply a basic language modeling method to reduce our document set. This language modeling method will be detailed in Section 3.1. Based on the...

Uploaded: 08/03/2014, 01:20
