Towards History-based Grammars:
Using Richer Models for Probabilistic Parsing*

Ezra Black, Fred Jelinek, John Lafferty, David M. Magerman,
Robert Mercer, Salim Roukos
IBM T. J. Watson Research Center

Abstract
We describe a generative probabilistic model of natural language, which we call HBG, that takes advantage of detailed linguistic information to resolve ambiguity. HBG incorporates lexical, syntactic, semantic, and structural information from the parse tree into the disambiguation process in a novel way. We use a corpus of bracketed sentences, called a Treebank, in combination with decision tree building to tease out the relevant aspects of a parse tree that will determine the correct parse of a sentence. This stands in contrast to the usual approach of further grammar tailoring via linguistic introspection in the hope of generating the correct parse. In head-to-head tests against one of the best existing robust probabilistic parsing models, which we call P-CFG, the HBG model significantly outperforms P-CFG, increasing the parsing accuracy rate from 60% to 75%, a 37% reduction in error.
Introduction

Almost any natural language sentence is ambiguous in structure, reference, or nuance of meaning. Humans overcome these apparent ambiguities by examining the context of the sentence. But what exactly is context? Frequently, the correct interpretation is apparent from the words or constituents immediately surrounding the phrase in question. This observation begs the following question: How much information about the context of a sentence or phrase is necessary and sufficient to determine its meaning? This question is at the crux of the debate among computational linguists about the application and implementation of statistical methods in natural language understanding.
Previous work on disambiguation and probabilistic parsing has offered partial answers to this question. Hidden Markov models of words and their tags, introduced in (Derouault and Merialdo, 1985) and (Jelinek, 1985) and popularized in the natural language community by Church (1988), demonstrate the power of short-term n-gram statistics to deal with lexical ambiguity. Hindle and Rooth (1990) use a statistical measure of lexical associations to resolve structural ambiguities. Brent (1991) acquires likely verb subcategorization patterns using the frequencies of verb-object-preposition triples. Magerman and Marcus (1991) propose a model of context that combines the n-gram model with information from dominating constituents. All of these aspects of context are necessary for disambiguation, yet none is sufficient.

*Thanks to Philip Resnik and Stanley Chen for their valued input.
We propose a probabilistic model of context for disambiguation in parsing, HBG, which incorporates the intuitions of these previous works into one unified framework. Let $p(T, w_1^n)$ be the joint probability of generating the word string $w_1^n$ and the parse tree $T$. Given $w_1^n$, our parser chooses as its parse tree that tree $T^*$ for which

$$T^* = \arg\max_{T \in \mathcal{P}(w_1^n)} p(T, w_1^n) \qquad (1)$$

where $\mathcal{P}(w_1^n)$ is the set of all parses produced by the grammar for the sentence $w_1^n$. Many aspects of the input sentence that might be relevant to the decision-making process participate in the probabilistic model, providing a very rich, if not the richest, model of context ever attempted in a probabilistic parsing model.
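As a concrete, hypothetical illustration of equation (1), the sketch below (ours, not part of the original system) scores a set of candidate parses under some joint model and returns the highest-scoring one; the candidate_parses and joint_log_prob interfaces are assumptions introduced only for this example.

    import math

    def viterbi_parse(sentence, candidate_parses, joint_log_prob):
        """Pick T* = argmax_T p(T, w) over the parses the grammar proposes.

        candidate_parses(sentence) -> list of parse trees (hypothetical helper)
        joint_log_prob(tree, sentence) -> log p(T, w) under some model
        """
        best_tree, best_score = None, -math.inf
        for tree in candidate_parses(sentence):
            score = joint_log_prob(tree, sentence)
            if score > best_score:
                best_tree, best_score = tree, score
        return best_tree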
In this paper, we will motivate and define the HBG model, describe the task domain, give an overview of the grammar, describe the proposed HBG model, and present the results of experiments comparing HBG with an existing state-of-the-art model.
Motivation for History-based Grammars
One goal of a parser is to produce a grammatical interpretation of a sentence which represents the syntactic and semantic intent of the sentence. To achieve this goal, the parser must have a mechanism for estimating the coherence of an interpretation, both in isolation and in context. Probabilistic language models provide such a mechanism.

A probabilistic language model attempts to estimate the probability of a sequence of sentences and their respective interpretations (parse trees) occurring in the language, $\mathcal{P}(S_1 T_1\, S_2 T_2 \cdots S_n T_n)$.
The difficulty in applying probabilistic models to natural language is deciding what aspects of the sentence and the discourse are relevant to the model. Most previous probabilistic models of parsing assume the probabilities of sentences in a discourse are independent of other sentences. In fact, previous works have made much stronger independence assumptions. The P-CFG model considers the probability of each constituent rule independent of all other constituents in the sentence. The Pearl model (Magerman and Marcus, 1991) includes a slightly richer model of context, allowing the probability of a constituent rule to depend upon the immediate parent of the rule and a part-of-speech trigram from the input sentence. But none of these models comes close to incorporating enough context to disambiguate many cases of ambiguity.
A significant reason researchers have limited the contextual information used by their models is the difficulty of estimating very rich probabilistic models of context. In this work, we present a model, the history-based grammar model, which incorporates a very rich model of context, and we describe a technique for estimating the parameters for this model using decision trees. The history-based grammar model provides a mechanism for taking advantage of contextual information from anywhere in the discourse history. Using decision tree technology, any question which can be asked of the history (e.g., Is the subject of the previous sentence animate? Was the previous sentence a question?) can be incorporated into the language model.
The History-based Grammar Model
The history-based grammar model defines the context of a parse tree in terms of the leftmost derivation of the tree.

Following (Harrison, 1978), we show in Figure 1 a context-free grammar (CFG) for $a^n b^n$ and the parse tree for the sentence aabb. The leftmost derivation of the tree $T$ in Figure 1 is:
"P1 'r2 'P3
S ~ A S B * a S B ~ a A B B ~-~ a a B B ~-h a a b B Y-~
(2)
where the rule used to expand the i-th node of
the tree is denoted by ri Note that we have in-
aabb
S -, A S B I A B
A -, a
B ~ b
/ ".,
4-5.:
Figure h Grammar and parse tree for aabb
dexed the non-terminal (NT) nodes of the tree with this leftmost order We denote by ~- the sen- tential form obtained just before we expand node
i Hence, t~ corresponds to the sentential form
a S B or equivalently to the string rlr2 In a left- most derivation we produce the words in left-to- right order
Using the one-to-one correspondence between leftmost derivations and parse trees, we can rewrite the joint probability in (1) as:

$$p(T, w_1^n) = \prod_{i=1}^{m} p(r_i \mid t_i^-)$$

where $m$ is the number of non-terminal nodes in $T$. In a probabilistic context-free grammar (P-CFG), the probability of an expansion at node $i$ depends only on the identity of the non-terminal $N_i$, i.e., $p(r_i \mid t_i^-) = p(r_i)$. Thus

$$p(T, w_1^n) = \prod_{i=1}^{m} p(r_i)$$

So in P-CFG the derivation order does not affect the probabilistic model.¹
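For concreteness, here is a minimal sketch (ours, not the paper's code) of the P-CFG score above: the probability of a tree is just the product of the probabilities of the rules used at its non-terminal nodes, regardless of derivation order. The tree representation and rule-probability table are assumptions made for the example.

    import math

    def pcfg_log_prob(tree, rule_log_prob):
        """log p(T) = sum of log p(rule) over the NT nodes of the tree.

        tree: (label, children) tuples, with strings as leaves (hypothetical format)
        rule_log_prob: dict mapping (lhs, (rhs labels...)) -> log probability
        """
        label, children = tree
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        total = rule_log_prob[(label, rhs)]
        for child in children:
            if not isinstance(child, str):      # recurse into NT children only
                total += pcfg_log_prob(child, rule_log_prob)
        return total

    # Example with the a^n b^n grammar of Figure 1 (made-up rule probabilities):
    rules = {("S", ("A", "S", "B")): math.log(0.5), ("S", ("A", "B")): math.log(0.5),
             ("A", ("a",)): 0.0, ("B", ("b",)): 0.0}
    aabb = ("S", [("A", ["a"]), ("S", [("A", ["a"]), ("B", ["b"])]), ("B", ["b"])])
    print(math.exp(pcfg_log_prob(aabb, rules)))  # 0.25 = 0.5 * 0.5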
A less crude approximation than the usual P-CFG is to use a decision tree to determine which aspects of the leftmost derivation have a bearing on the probability of how node $i$ will be expanded. In other words, the probability distribution $p(r_i \mid t_i^-)$ will be modeled by $p(r_i \mid E[t_i^-])$, where $E[t]$ is the equivalence class of the history $t$ as determined by the decision tree. This allows our probabilistic model to use any information anywhere in the partial derivation tree to determine the probability of different expansions of the $i$-th non-terminal. The use of decision trees and a large bracketed corpus may shift some of the burden of identifying the intended parse from the grammarian to the statistical estimation methods. We refer to probabilistic methods based on the derivation as History-based Grammars (HBG).

¹Note the abuse of notation, since we denote by $p(r_i)$ the conditional probability of rewriting the non-terminal $N_i$.
In this paper, we explored a restricted implementation of this model, in which only the path from the current node to the root of the derivation, along with the index of a branch (the index of the child of a parent), is examined in the decision tree model to build equivalence classes of histories. Other parts of the subtree are not examined in the implementation of HBG.
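As a toy illustration of modeling $p(r_i \mid E[t_i^-])$ under this restriction, the sketch below (ours, with invented questions and class names) asks a few questions of the path from the current node to the root, plus the child index, and returns an equivalence-class label; rule probabilities are then stored per class rather than per full history.

    def history_class(path_to_root, child_index):
        """Map a (restricted) derivation history to an equivalence class.

        path_to_root: labels from the current node's parent up to the root,
        e.g. ["NP", "PP", "VP", "S"]; child_index: which child of its parent
        the current node is.  Both the questions and class names are invented.
        """
        parent = path_to_root[0] if path_to_root else None
        if parent == "PP" and child_index == 1:
            return "object-of-PP"
        if "VP" in path_to_root[:2]:
            return "inside-VP"
        return "other"

    def rule_prob(rule, path_to_root, child_index, class_rule_probs):
        """p(r_i | E[t_i^-]): look up the rule's probability for the history's class."""
        return class_rule_probs[history_class(path_to_root, child_index)].get(rule, 0.0)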
[N It_PPH1 N]
[V indicates_VVZ [Fn [Fn& whether_CSW
[N a_AT1 call_NN1 N]
[V completed_VVD successfully_RR V]Fn&] or_CC
[Fn+ if_CSW [N some_DD error_NN1 N]@
[V was_VBDZ detected_VVN V]
@[Fr that_CST [V caused_VVD
[N the_AT call_NN1 N]
[Ti to_TO fail_VVI Ti]V]Fr]Fn+]
Fn]V]._

Figure 2: Sample bracketed sentence from the Lancaster Treebank.
Task Domain
We have chosen computer manuals as a task domain. We picked the most frequent 3000 words in a corpus of 600,000 words from 10 manuals as our vocabulary. We then extracted a few million words of sentences that are completely covered by this vocabulary from 40,000,000 words of computer manuals. A randomly chosen sentence from a sample of 5000 sentences from this corpus is:

396. It indicates whether a call completed successfully or if some error was detected that caused the call to fail.
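A minimal sketch of the corpus-preparation step just described (our own illustration; the sentence lists and names are hypothetical): build the 3000-word vocabulary from frequency counts, then keep only sentences whose tokens are all in that vocabulary.

    from collections import Counter

    def build_vocabulary(sentences, size=3000):
        """Return the `size` most frequent words in the tokenized sentences."""
        counts = Counter(word for sent in sentences for word in sent)
        return {word for word, _ in counts.most_common(size)}

    def covered_sentences(sentences, vocabulary):
        """Keep only sentences completely covered by the vocabulary."""
        return [sent for sent in sentences if all(w in vocabulary for w in sent)]

    # Usage (hypothetical data): vocabulary from the 600,000-word sample,
    # coverage filter applied to the full 40,000,000-word collection.
    # vocab = build_vocabulary(sample_sentences, size=3000)
    # training_sentences = covered_sentences(all_manual_sentences, vocab)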
To define what we mean by a correct parse, we use a corpus of sentences manually bracketed at the University of Lancaster, called the Treebank. The Treebank uses 17 non-terminal labels and 240 tags. The bracketing of the above sentence is shown in Figure 2.

A parse produced by the grammar is judged to be correct if it agrees with the Treebank parse structurally and the NT labels agree. The grammar has a significantly richer NT label set (more than 10,000) than the Treebank, but we have defined an equivalence mapping between the grammar NT labels and the Treebank NT labels. In this paper, we do not include the tags in the measure of a correct parse.

We have used about 25,000 sentences to help the grammarian develop the grammar, with the goal that the correct (as defined above) parse is among the parses proposed by the grammar for each sentence. Our most common test set consists of 1600 sentences that are never seen by the grammarian.
The Grammar
The grammar used in this experiment is a broad-coverage, feature-based unification grammar. The grammar is context-free but uses unification to express rule templates for the context-free productions. For example, the rule template:

(3)

corresponds to three CFG productions, where the second feature :n is either s, p, or :n. This rule template may elicit up to 7 non-terminals. The grammar has 21 features whose range of values may be from 2 to about 100, with a median of 8. There are 672 rule templates, of which 400 are actually exercised when we parse a corpus of 15,000 sentences. The number of productions that are realized in this training corpus is several hundred thousand.
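As a schematic illustration of how one template stands for several productions (with an invented template; the categories and feature values here are not taken from the grammar described above), a rule schema with an underspecified feature :n expands into one CFG production per allowed value, or stays underspecified:

    def expand_template(lhs, rhs, feature_values):
        """Expand a rule template into concrete CFG productions.

        lhs, rhs: category names that may contain an underspecified feature ':n'.
        feature_values: values ':n' may take, e.g. ('s', 'p') plus ':n' meaning
        "left unspecified".  This is an invented illustration, not the paper's grammar.
        """
        productions = []
        for value in feature_values:
            sub = lambda cat: cat.replace(":n", value)
            productions.append((sub(lhs), tuple(sub(cat) for cat in rhs)))
        return productions

    # A hypothetical agreement template: NP:n -> Det N:n
    print(expand_template("NP:n", ("Det", "N:n"), ("s", "p", ":n")))
    # [('NPs', ('Det', 'Ns')), ('NPp', ('Det', 'Np')), ('NP:n', ('Det', 'N:n'))]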
P-CFG

While an NT in the above grammar is a feature vector, we group several NTs into one class we call a mnemonic, represented by the one NT that is the least specified in that class. For example, the mnemonic VBOPASTSG* corresponds to all NTs that unify with:

    pos = v
    v-type = be            (4)
    tense-aspect = past

We use these mnemonics to label a parse tree, and we also use them to estimate a P-CFG, where the probability of rewriting an NT is given by the probability of rewriting the mnemonic. So from a training set we induce a CFG from the actual mnemonic productions that are elicited in parsing the training corpus.
Using the Inside-Outside algorithm, we can estimate a P-CFG from a large corpus of text. But since we also have a large corpus of bracketed sentences, we can adapt the Inside-Outside algorithm to reestimate the probability parameters subject to the constraint that only parses consistent with the Treebank (where consistency is as defined earlier) contribute to the reestimation. From a training run of 15,000 sentences we observed 87,704 mnemonic productions, with 23,341 NT mnemonics, of which 10,302 were lexical. Running on a test set of 760 sentences, 32% of the rule templates were used, 7% of the lexical mnemonics, 10% of the constituent mnemonics, and 5% of the mnemonic productions actually contributed to parses of the test sentences.
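As a simplified stand-in for the constrained reestimation described above (the paper adapts the Inside-Outside algorithm; the sketch below uses plain relative-frequency counts over Treebank-consistent parses, the degenerate case where each sentence contributes one consistent parse), mnemonic production probabilities could be estimated as follows. All names and the tree format are ours.

    from collections import Counter, defaultdict

    def estimate_pcfg(consistent_parses):
        """Relative-frequency estimate of p(production | mnemonic LHS).

        consistent_parses: iterable of parse trees, each a (mnemonic, children)
        tuple with strings as leaves, already filtered for Treebank consistency.
        """
        rule_counts = Counter()
        lhs_counts = Counter()
        for tree in consistent_parses:
            stack = [tree]
            while stack:
                label, children = stack.pop()
                rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
                rule_counts[(label, rhs)] += 1
                lhs_counts[label] += 1
                stack.extend(c for c in children if not isinstance(c, str))
        probs = defaultdict(dict)
        for (lhs, rhs), count in rule_counts.items():
            probs[lhs][rhs] = count / lhs_counts[lhs]
        return probs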
Grammar and Model Performance Metrics
To evaluate the performance of a grammar and an accompanying model, we use two types of measurements:

• the any-consistent rate, defined as the percentage of sentences for which the correct parse is proposed among the many parses that the grammar provides for a sentence. We also measure the parse base, which is defined as the geometric mean of the number of proposed parses on a per-word basis, to quantify the ambiguity of the grammar.

• the Viterbi rate, defined as the percentage of sentences for which the most likely parse is consistent.

The any-consistent rate is a measure of the grammar's coverage of linguistic phenomena. The Viterbi rate evaluates the grammar's coverage with the statistical model imposed on the grammar. The goal of probabilistic modelling is to produce a Viterbi rate close to the any-consistent rate.
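A small sketch (our own, with hypothetical helper interfaces) of the three measurements: any-consistent rate, parse base as a per-word geometric mean of the number of proposed parses, and Viterbi rate.

    import math

    def evaluate(sentences, propose_parses, best_parse, is_consistent):
        """Compute any-consistent rate, parse base, and Viterbi rate.

        propose_parses(sent)        -> all parses the grammar gives the sentence
        best_parse(sent)            -> the model's most likely parse
        is_consistent(parse, sent)  -> True if it matches the Treebank parse
        (all three are hypothetical interfaces)
        """
        any_consistent = viterbi = 0
        log_parses = total_words = 0
        for sent in sentences:
            parses = propose_parses(sent)
            if any(is_consistent(p, sent) for p in parses):
                any_consistent += 1
            if is_consistent(best_parse(sent), sent):
                viterbi += 1
            log_parses += math.log(max(len(parses), 1))
            total_words += len(sent)
        n = len(sentences)
        parse_base = math.exp(log_parses / total_words)  # geometric mean per word
        return any_consistent / n, parse_base, viterbi / n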
The any-consistent rate is 90% when we require the structure and the labels to agree, and 96% when unlabeled bracketing is required. These results are obtained on 760 sentences from 7 to 17 words long, from test material that has never been seen by the grammarian. The parse base is 1.35 parses/word. This translates to about 23 parses for a 12-word sentence. The unlabeled Viterbi rate stands at 64% and the labeled Viterbi rate is 60%.

While we believe that the above Viterbi rate is close to, if not at, the state-of-the-art performance, there is room for improvement by using a more refined statistical model to achieve the labeled any-consistent rate of 90% with this grammar. There is a significant gap between the labeled Viterbi and any-consistent rates: 30 percentage points.
Instead of the usual approach, where a grammarian tries to fine-tune the grammar in the hope of improving the Viterbi rate, we use the combination of a large Treebank and the resulting derivation histories with a decision tree building algorithm to extract statistical parameters that would improve the Viterbi rate. The grammarian's task remains that of improving the any-consistent rate.

The history-based grammar model is distinguished from the context-free grammar model in that each constituent structure depends not only on the input string, but also on the entire history up to that point in the sentence. In HBGs, history is interpreted as any element of the output structure, or the parse tree, which has already been determined, including previous words, non-terminal categories, constituent structure, and any other linguistic information which is generated as part of the parse structure.
The HBG Model
Unlike P-CFG, which assigns a probability to a mnemonic production, the HBG model assigns a probability to a rule template. Because of this, the HBG formulation allows one to handle any grammar formalism that has a derivation process.

For the HBG model, we have defined about 50 syntactic categories, referred to as Syn, and about 50 semantic categories, referred to as Sem. Each NT (and therefore mnemonic) of the grammar has been assigned a syntactic (Syn) and a semantic (Sem) category. We also associate with a non-terminal a primary lexical head, denoted by H1, and a secondary lexical head, denoted by H2.² When a rule is applied to a non-terminal, it indicates which child will generate the primary lexical head and which child will generate the secondary lexical head.

The proposed generative model associates with each constituent in the parse tree the probability:

$$p(\mathrm{Syn}, \mathrm{Sem}, R, H_1, H_2 \mid \mathrm{Syn}_p, \mathrm{Sem}_p, R_p, I_{pc}, H_{1p}, H_{2p})$$

In HBG, we predict the syntactic and semantic labels of a constituent, its rewrite rule, and its two lexical heads using the labels of the parent constituent, the parent's lexical heads, the parent's rule $R_p$ that led to the constituent, and the constituent's index $I_{pc}$ as a child of $R_p$. As we discuss in a later section, we have also used with success more information about the derivation tree than the immediate parent in conditioning the probability of expanding a constituent.

²The primary lexical head H1 corresponds (roughly) to the linguistic notion of a lexical head. The secondary lexical head H2 has no linguistic parallel. It merely represents a word in the constituent, besides the head, which contains predictive information about the constituent.
We have approximated the above probability by the following five factors:

1. $p(\mathrm{Syn} \mid R_p, I_{pc}, H_{1p}, \mathrm{Syn}_p, \mathrm{Sem}_p)$
2. $p(\mathrm{Sem} \mid \mathrm{Syn}, R_p, I_{pc}, H_{1p}, H_{2p}, \mathrm{Syn}_p, \mathrm{Sem}_p)$
3. $p(R \mid \mathrm{Syn}, \mathrm{Sem}, R_p, I_{pc}, H_{1p}, H_{2p}, \mathrm{Syn}_p, \mathrm{Sem}_p)$
4. $p(H_1 \mid R, \mathrm{Syn}, \mathrm{Sem}, I_{pc}, H_{1p})$
5. $p(H_2 \mid H_1, R, \mathrm{Syn}, \mathrm{Sem}, I_{pc}, \mathrm{Syn}_p)$

While a different order for these predictions is possible, we only experimented with this one.
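A hypothetical sketch of how these five factors would be multiplied (in log space) to score one constituent given its parent context; the cond_prob lookup and field names are ours, and for brevity the sketch passes the full context to every factor rather than the specific subsets listed above.

    import math

    def constituent_log_prob(node, parent, cond_prob):
        """log p(Syn, Sem, R, H1, H2 | parent context) as a sum of five factors.

        node / parent: dicts with 'syn', 'sem', 'rule', 'h1', 'h2', and the
        child index 'ipc' on the node (hypothetical representation).
        cond_prob(name, value, context) -> conditional probability of one factor.
        """
        ctx = {"Rp": parent["rule"], "Ipc": node["ipc"],
               "H1p": parent["h1"], "H2p": parent["h2"],
               "Synp": parent["syn"], "Semp": parent["sem"]}
        factors = [
            cond_prob("Syn", node["syn"], ctx),
            cond_prob("Sem", node["sem"], dict(ctx, Syn=node["syn"])),
            cond_prob("R", node["rule"], dict(ctx, Syn=node["syn"], Sem=node["sem"])),
            cond_prob("H1", node["h1"], dict(ctx, R=node["rule"],
                                             Syn=node["syn"], Sem=node["sem"])),
            cond_prob("H2", node["h2"], dict(ctx, H1=node["h1"], R=node["rule"],
                                             Syn=node["syn"], Sem=node["sem"])),
        ]
        return sum(math.log(p) for p in factors)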
Parameter Estimation
We have built a decision tree only for the rule probability component (3) of the model. For the moment, we are using n-gram models with the usual deleted interpolation for smoothing for the other four components of the model.
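Deleted interpolation here means smoothing a detailed conditional estimate by mixing it with coarser ones; a minimal two-level sketch (our own, with hand-fixed weights where a real system would estimate them on held-out data) is:

    def interpolated_prob(value, fine_context, coarse_context,
                          fine_counts, coarse_counts, unigram, lam=(0.6, 0.3, 0.1)):
        """Smooth p(value | fine_context) by backing off to coarser estimates.

        fine_counts / coarse_counts: dicts mapping context -> Counter of values.
        unigram: Counter of values.  The interpolation weights `lam` would in
        practice be estimated on held-out ("deleted") data, not fixed by hand.
        """
        def rel_freq(counts, context):
            total = sum(counts.get(context, {}).values())
            return counts.get(context, {}).get(value, 0) / total if total else 0.0
        p_fine = rel_freq(fine_counts, fine_context)
        p_coarse = rel_freq(coarse_counts, coarse_context)
        p_uni = unigram.get(value, 0) / max(sum(unigram.values()), 1)
        return lam[0] * p_fine + lam[1] * p_coarse + lam[2] * p_uni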
We have assigned bit strings to the syntactic and semantic categories and to the rules manually. Our intention is that bit strings differing in the least significant bit positions correspond to categories of non-terminals or rules that are similar. We have also assigned bit strings to the words in the vocabulary (the lexical heads) using automatic clustering, namely the bigram mutual information clustering algorithm (see Brown et al., 1990). Given the bit string of a history, we then designed a decision tree for modeling the probability that a rule will be used for rewriting a node in the parse tree.
Since the grammar produces parses which may be more detailed than the Treebank, the decision tree was built using a training set constructed in the following manner. Using the grammar with the P-CFG model, we determined the most likely parse that is consistent with the Treebank and considered the resulting sentence-tree pair as an event. Note that the grammar parse will also provide the lexical head structure of the parse. Then, we extracted, using leftmost derivation order, tuples of a history (truncated to the definition of a history in the HBG model) and the corresponding rule used in expanding a node. Using the resulting data set, we built a decision tree by classifying histories to locally minimize the entropy of the rule template.
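A toy sketch of the node-splitting criterion just described (our own formulation, with an invented representation of events and questions): among candidate binary questions about the bit-encoded history, pick the one that minimizes the average entropy of the rule distribution in the two resulting subsets.

    import math
    from collections import Counter

    def entropy(rules):
        """Entropy (in bits) of the empirical rule distribution."""
        counts = Counter(rules)
        total = len(rules)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def best_question(events, questions):
        """Pick the history question whose split minimizes weighted rule entropy.

        events: list of (history, rule) pairs.
        questions: list of predicates on histories, e.g. lambda h: h["bits"][3] == 1.
        """
        best_q, best_h = None, float("inf")
        for q in questions:
            yes = [r for h, r in events if q(h)]
            no = [r for h, r in events if not q(h)]
            if not yes or not no:
                continue
            h = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(events)
            if h < best_h:
                best_q, best_h = q, h
        return best_q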
With a training set of about 9000 sentence-tree pairs, we had about 240,000 tuples and we grew a tree with about 40,000 nodes. This required 18 hours on a 25-MIPS RISC-based machine, and the resulting decision tree was nearly 100 megabytes.
Immediate vs. Functional Parents

[Figure 3: Sample representation of "with a list" in the HBG model. The PP constituent has R: PP1, Syn: PP, Sem: With-Data, H1: list, H2: with; its child covering "a list" has Sem: Data, H1: list, H2: a; the N node over "list" has Syn: N, Sem: Data, H1: list.]

The HBG model employs two types of parents, the immediate parent and the functional parent. The
Trang 6immediate parent is the constituent that immedi-
ately dominates the constituent being predicted
If the immediate parent of a constituent has a dif-
ferent syntactic type from that of the constituent,
then the immediate parent is also the functional
parent; otherwise, the functional parent is the
functional parent of the immediate parent The
distinction between functional parents and imme-
diate parents arises primarily to cope with unit
productions When unit productions of the form
XP2 ~ XP1 occur, the immediate parent of XP1
is XP2 But, in general, the constituent XP2 does
not contain enough useful information for ambi-
guity resolution In particular, when considering
only immediate parents, unit rules such as NP2 *
NP1 prevent the probabilistic model from allow-
ing the NP1 constituent to interact with the VP
rule which is the functional parent of NP1
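A small sketch of the functional-parent rule just described (our own formulation over a hypothetical node structure): walk up the chain of immediate parents until the syntactic type changes.

    def functional_parent(node):
        """Return the functional parent of a constituent.

        node.parent is the immediate parent (None at the root) and node.syn is
        the constituent's syntactic type (hypothetical attributes).  If the
        immediate parent has a different syntactic type, it is also the
        functional parent; otherwise take the functional parent of the
        immediate parent.
        """
        parent = node.parent
        if parent is None:
            return None
        if parent.syn != node.syn:
            return parent
        return functional_parent(parent)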
When the two parents are identical, as often happens, the duplicate information will be ignored. However, when they differ, the decision tree will select the parental context which best resolves ambiguities.

Figure 3 shows an example of the representation of a history in HBG for the prepositional phrase "with a list." In this example, the immediate parent of the N1 node is the NBAR4 node, and the functional parent of N1 is the PP1 node.
Results
We compared the performance of HBG to the "broad-coverage" probabilistic context-free grammar, P-CFG. The any-consistent rate of the grammar is 90% on test sentences of 7 to 17 words. The Viterbi rate of P-CFG is 60% on the same test corpus of 760 sentences used in our experiments. On the same test sentences, the HBG model has a Viterbi rate of 75%. This is a reduction of 37% in error rate.

    Accuracy
    P-CFG             59.8%
    HBG               74.6%
    Error Reduction   36.8%

Figure 4: Parsing accuracy: P-CFG vs. HBG.
In developing HBG, we experimented with similar models of varying complexity. One discovery made during this experimentation is that models which incorporated more context than HBG performed slightly worse than HBG. This suggests that the current training corpus may not contain enough sentences to estimate richer models. Based on the results of these experiments, it appears likely that significantly increasing the size of the training corpus should result in a corresponding improvement in the accuracy of HBG and richer HBG-like models.
To check the value of the above detailed history, we tried the simpler model:

1. $p(H_1 \mid H_{1p}, H_{2p}, R_p, I_{pc})$
2. $p(H_2 \mid H_1, H_{1p}, H_{2p}, R_p, I_{pc})$
3. $p(\mathrm{Syn} \mid H_1, I_{pc})$
4. $p(\mathrm{Sem} \mid \mathrm{Syn}, H_1, I_{pc})$
5. $p(R \mid \mathrm{Syn}, \mathrm{Sem}, H_1, H_2)$

This model corresponds to a P-CFG with NTs that are the crude syntactic and semantic categories annotated with the lexical heads. The Viterbi rate in this case was 66%, a small improvement over the P-CFG model, indicating the value of using more context from the derivation tree.
Conclusions
The success of the HBG model encourages future development of general history-based grammars as a more promising approach than the usual P-CFG. More experimentation is needed with a larger Treebank than was used in this study and with different aspects of the derivation history. In addition, this paper illustrates a new approach to grammar development, where the parsing problem is divided (and hopefully conquered) into two subproblems: one of grammar coverage for the grammarian to address, and the other of statistical modeling to increase the probability of picking the correct parse of a sentence.
References
Baker, J. K. 1975. Stochastic Modeling for Automatic Speech Understanding. In Speech Recognition, edited by Raj Reddy, Academic Press, pp. 521-542.

Brent, M. R. 1991. Automatic Acquisition of Subcategorization Frames from Untagged Free-text Corpora. In Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics. Berkeley, California.

Brill, E., Magerman, D., Marcus, M., and Santorini, B. 1990. Deducing Linguistic Structure from the Statistics of Large Corpora. In Proceedings of the June 1990 DARPA Speech and Natural Language Workshop. Hidden Valley, Pennsylvania.

Brown, P. F., Della Pietra, V. J., deSouza, P. V., Lai, J. C., and Mercer, R. L. 1990. Class-based n-gram Models of Natural Language. In Proceedings of the IBM Natural Language ITL, March 1990. Paris, France.

Church, K. 1988. A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text. In Proceedings of the Second Conference on Applied Natural Language Processing. Austin, Texas.

Derouault, A., and Merialdo, B. 1985. Probabilistic Grammar for Phonetic to French Transcription. ICASSP 85 Proceedings. Tampa, Florida, pp. 1577-1580.

Gale, W. A. and Church, K. 1990. Poor Estimates of Context are Worse than None. In Proceedings of the June 1990 DARPA Speech and Natural Language Workshop. Hidden Valley, Pennsylvania.

Harrison, M. A. 1978. Introduction to Formal Language Theory. Addison-Wesley Publishing Company.

Hindle, D. and Rooth, M. 1990. Structural Ambiguity and Lexical Relations. In Proceedings of the June 1990 DARPA Speech and Natural Language Workshop. Hidden Valley, Pennsylvania.

Jelinek, F. 1985. Self-organizing Language Modeling for Speech Recognition. IBM Report.

Magerman, D. M. and Marcus, M. P. 1991. Pearl: A Probabilistic Chart Parser. In Proceedings of the February 1991 DARPA Speech and Natural Language Workshop. Asilomar, California.

Sharman, R. A., Jelinek, F., and Mercer, R. 1990. Generating a Grammar for Statistical Training. In Proceedings of the June 1990 DARPA Speech and Natural Language Workshop. Hidden Valley, Pennsylvania.