Vietnamese Parsing with an Automatically Extracted Tree-Adjoining Grammar
Phuong Le Hong
VNU University of Science, Hanoi, Vietnam
phuonglh@vnu.edu.vn
Thi Minh Huyen Nguyen
VNU University of Science, Hanoi, Vietnam
huyenntm@vnu.edu.vn
Azim Roussanaly
LORIA, Nancy, France
azim.roussanaly@loria.fr
Abstract—This paper presents the construction and evaluation
of a deep syntactic parser for the Vietnamese language based on Lexicalized Tree-Adjoining Grammars. It is a complete system integrating the tools necessary to process Vietnamese text: it takes raw text as input and produces syntactic structures. A dependency annotation scheme for Vietnamese and an algorithm for extracting dependency structures from derivation trees are also proposed. At present, this is the first Vietnamese parsing system capable of producing both constituency and dependency analyses, with encouraging performance: 69.33% and 73.21% accuracy for constituency and dependency analysis, respectively.
I. INTRODUCTION
Syntactic parsing is a basic task in natural language processing. For Vietnamese, few published works deal with this problem. This paper presents the construction and evaluation of a deep syntactic parser based on Lexicalized Tree-Adjoining Grammars (LTAG) for the Vietnamese language.
The paper is organized as follows. In this first section, we introduce the notions of constituency and dependency analysis, as well as the Tree-Adjoining Grammar (TAG) formalism. Section II proposes a dependency annotation scheme for Vietnamese and an algorithm for extracting dependency relations from the derivation trees produced by a TAG parser. Section III presents the construction of a Vietnamese parser capable of producing both constituency and dependency analyses. Section IV gives a detailed evaluation of the parsing system. We conclude the paper with a discussion and directions for future work.
A. Constituency and dependency analysis
Constituency structure and dependency structure are two types of syntactic representation of a natural language sentence. While a constituency structure represents a nesting of multi-word constituents, a dependency structure represents dependencies between the individual words of a sentence. A syntactic dependency represents the fact that the presence of a word is licensed by another word, its governor. In a typed dependency analysis, grammatical labels are added to the dependencies to mark their grammatical relations, for example subject or indirect object.
Recently, many works have been published on dependency analysis for well-studied languages, such as English [1] or French [2]. The dependency parsers developed for these languages are usually probabilistic and trained on available corpora of the languages concerned. We can classify the architectures of these parsers into two main types:
• parsers that apply a machine learning method to dependency corpora extracted automatically from treebanks and directly produce dependency parses [3], [4];
• parsers that rely on a sequential process in which constituency parses are produced first and dependency parses are then extracted from them [2], [5].
The second architecture requires a module that takes as input the constituency parses given by a constituency parser and converts them into typed dependency parses, as illustrated in Figure 1 for a French sentence.¹
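[Figure 1, constituency tree]
(S (NP (D Une) (N lettre)) (VN (V avait) (V été) (V envoyée)) (NP (D la) (N semaine) (A dernière)) (PP (P aux) (NP (N salariés))))
[Figure 1, dependency tree rooted at "envoyé"; each dependent is followed by the label of its incoming arc]
lettre: suj; Une: det; avait: aux; été: aux; semaine: mod; la: det; dernière: mod; aux: a-obj; salariés: obj
Figure 1. Constituency and dependency analysis of a French sentence.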
B. Tree-Adjoining Grammars
In the TAG formalism [6], a grammar is defined by a set of elementary trees, divided into initial trees and auxiliary trees. These trees can be combined by the substitution and adjunction operations to form derived trees. A TAG parsing system rewrites nodes of trees rather than symbols of strings as in context-free grammars (CFG). Figure 2 gives a simple Vietnamese TAG and an analysis of a sentence. The first half of the figure shows the elementary trees of the grammar and the second half shows the derived tree and its corresponding derivation tree, where the notation <anchor> represents the elementary tree corresponding to a lexical anchor. A derivation tree in TAG specifies how a derived tree was constructed.
1 A letter was sent to the employees last week.
978-1-4673-0309-5/12/$31.00 ©2012 IEEE
[Figure 2, elementary trees]
(NP (Np Giang))
(S NP↓ (VP (V cho) NP↓ NP↓))
(NP (P tôi))
(NP (M một) NP∗)
(NP (Nu quả))
(NP NP∗ (N cam))
[Figure 2, derived tree]
(S (NP (Np Giang)) (VP (V cho) (NP (P tôi)) (NP (M một) (NP (NP (Nu quả)) (N cam)))))
[Figure 2, derivation tree]
<cho>
    <Giang>
    <tôi>
    <quả>
        <một>
        <cam>
Figure 2. A TAG analysis of the sentence “Giang cho tôi một quả cam” (Giang gave me an orange).
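To make the two combination operations concrete, the following Python sketch rebuilds the derived tree of Figure 2 from its elementary trees. It is only an illustration under simplifying assumptions: the `Node` class and the functions `substitute`, `adjoin_at_root`, `words` and `bracket` are hypothetical names introduced here, substitution always fills the leftmost matching site, and adjunction is only performed at the root of a tree, whereas TAG allows adjunction at internal nodes as well.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    kids: List["Node"] = field(default_factory=list)
    word: Optional[str] = None   # lexical anchor carried by a preterminal
    subst: bool = False          # substitution site (down arrow in Figure 2)
    foot: bool = False           # foot node of an auxiliary tree (star in Figure 2)


def walk(node):
    """Yield the nodes of a tree in left-to-right preorder."""
    yield node
    for kid in node.kids:
        yield from walk(kid)


def substitute(target, initial):
    """Fill the leftmost substitution site of `target` that matches the root of `initial`."""
    target, initial = deepcopy(target), deepcopy(initial)
    for node in walk(target):
        if node.subst and node.label == initial.label:
            node.kids, node.word, node.subst = initial.kids, initial.word, False
            return target
    raise ValueError("no matching substitution site")


def adjoin_at_root(tree, auxiliary):
    """Adjoin `auxiliary` at the root of `tree`: the old tree moves under the foot node."""
    tree, auxiliary = deepcopy(tree), deepcopy(auxiliary)
    if tree.label != auxiliary.label:
        raise ValueError("root labels differ")
    for node in walk(auxiliary):
        if node.foot:
            node.kids, node.word, node.foot = tree.kids, tree.word, False
            return auxiliary
    raise ValueError("auxiliary tree has no foot node")


def words(tree):
    """Return the yield of a tree (its words, left to right)."""
    return [n.word for n in walk(tree) if n.word is not None]


def bracket(node):
    """Render a tree in bracketed notation."""
    if node.word is not None:
        return f"({node.label} {node.word})"
    return "(" + " ".join([node.label] + [bracket(k) for k in node.kids]) + ")"


# Elementary trees of Figure 2.
giang = Node("NP", [Node("Np", word="Giang")])
cho = Node("S", [Node("NP", subst=True),
                 Node("VP", [Node("V", word="cho"),
                             Node("NP", subst=True),
                             Node("NP", subst=True)])])
toi = Node("NP", [Node("P", word="tôi")])
mot = Node("NP", [Node("M", word="một"), Node("NP", foot=True)])
qua = Node("NP", [Node("Nu", word="quả")])
cam = Node("NP", [Node("NP", foot=True), Node("N", word="cam")])

# Two adjunctions on <quả>, then three substitutions into <cho>.
obj = adjoin_at_root(adjoin_at_root(qua, cam), mot)
derived = cho
for tree in (giang, toi, obj):
    derived = substitute(derived, tree)

print(" ".join(words(derived)))
print(bracket(derived))
```

The order of operations mirrors the derivation tree: the trees for một and cam attach to the tree for quả by adjunction, and the trees for Giang, tôi and quả attach to the tree for cho by substitution.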
TAG has a number of advantages over CFG. First, it provides an extended domain of locality. Second, the adjunction operation makes it possible to realize discontinuous constituency constructions. As a consequence, some TAGs recognize context-sensitive languages; for this reason, TAGs are called mildly context-sensitive grammars. Third, TAG derivation trees show semantic dependencies between the entities of a sentence, as the tree branches represent their combination type (dashed or continuous lines for substitution or adjunction, respectively, in Figure 2). In addition, in LTAG, lexical entries naturally capture constraints associated with lexical items, which is not possible in CFG.
II. EXTRACTION OF DEPENDENCY RELATIONS
A. Dependency annotation scheme
There exist many schemes for dependency annotation, for example the Stanford Dependency (SD) annotation scheme [5], derived from an automatic conversion of the English Penn Treebank; the PARC 700 scheme [7], inspired by the functional structures of lexical functional grammars; the GR scheme [8]; and EASy [9] for French. The multiplicity of annotation schemes is due to different linguistic and practical choices. We prefer to define a surface dependency annotation scheme for Vietnamese that is not only convertible to the standards cited above but also extensible to finer-grained dependency schemes if necessary. The current scheme contains 13 grammatical relations representing the principal functional dependencies between Vietnamese words. All these dependencies use the syntactic categories defined in the Vietnamese treebank [10], and they are divided into three groups.
The first group, arg, represents the relationship between a head word and its argument. There are two types of arguments: subject (subj) and object (obj). It is worth noting that Vietnamese is a topic-prominent language, in which sentences are structured around topics rather than subjects and objects [11]. In many cases, we cannot identify the subject and the object of a Vietnamese sentence by their respective positions. Distinguishing the subject from the object of a Vietnamese sentence is thus not a trivial task, especially in an automatic process. Therefore, at the moment, we do not distinguish the two relations subj and obj in our evaluations.
The second group, mod, represents modification relations between a word and its head word (or governor). According to the syntactic category of the modifier, we distinguish nine modification relations named modN (nominal modifier), modM (numeral modifier), modA (adjective modifier), modR (adverbial modifier), modE (prepositional modifier), modV (verbal modifier), modL (determinant modifier), modP (pronominal modifier) and modC (subordinating coordination modifier).²
The third group, coord, represents the dependencies of each lexical head of two coordinated phrases on the conjunction.
Having defined a dependency annotation scheme for Vietnamese, we now propose an algorithm for automatically extracting dependency analyses from TAG derivation trees.
B. An algorithm for dependency relation extraction
It has been shown that the TAG formalism shares many important similarities with the dependency grammar formalism [12]. In the case of lexicalized grammars, a TAG derivation tree can easily be converted into a dependency tree. The main idea is to transform each derivation operation into a dependency relation. A derivation operation between a source tree t1 and a target tree t2 results in a dependency relation with the head word of t1 as governor and the head word of t2 as dependent.
The dependency analysis corresponding to the analysis in Figure 2 is shown in Figure 3. We see that the derivation tree can be transformed into the dependency tree by a simple transformation in which each node of the derivation tree (representing an elementary tree) is replaced with its lexical node. Here, we want to extract typed dependencies, each labeled with a grammatical relation following the annotation scheme defined above. We thus need to consider the operation performed at each node of the derivation tree. If it is a substitution, a relation of type arg is created; if it is an adjunction, a relation of type mod is created, and its label is determined by the syntactic category of the word at the lexical node concerned.
Figure 3. Dependency tree corresponding to the analysis in Figure 2 (rooted at cho).
2 Due to space restrictions, we cannot present examples for these relations.
The most difficult case is the construction of coordination relations, where we must consider three related nodes and two combination operations at the same time, since an auxiliary tree for conjunctions in TAG has a specific form with a substitution node and a foot node, as illustrated in the following example trees:
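[Two example auxiliary trees, one anchored by và (and) and one by hoặc (or); in the surviving fragments the root is labeled X, with a foot node X∗ and a substitution node Y↓.]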
We propose an algorithm for the automatic extraction of dependency relations from a derivation tree given by a constituency parser. The following recursive algorithm, EXTRACT-RELATIONS(N), shows the extraction procedure in detail.
Require: a derivation tree N
Ensure: a set R of dependency relations
R ← ∅;
wn ← LEXICAL-NODE(N);
tn ← POS-NODE(N);
for K ∈ N.kids do
    wk ← LEXICAL-NODE(K);
    tk ← POS-NODE(K);
    if K.IS-SUBST() then
        if tn = CC then
            R ← R ∪ NEW-RELATION(coord, wn, wk);
        else
            R ← R ∪ NEW-RELATION(arg, wn, wk);
        end if
    else if K.IS-ADJ() then
        if tk ∈ {A, N, R, V, E, L, M, P, C} then
            {The label is mod followed by the tag tk, for example modN}
            R ← R ∪ NEW-RELATION(mod_tk, wn, wk);
        end if
        if tk = CC then
            R ← R ∪ NEW-RELATION(coord, wk, wn);
        end if
    end if
    {Recursively extract relations from tree K}
    R ← R ∪ EXTRACT-RELATIONS(K);
end for
return R;
This algorithm uses the following auxiliary functions. The function LEXICAL-NODE(N) returns the lexical head of a node N of the input derivation tree, while the function POS-NODE(N) returns the part-of-speech of that lexical head. The functions IS-SUBST() and IS-ADJ() are called at each node of the derivation tree to check whether the node was attached by substitution or by adjunction. Finally, the function NEW-RELATION(type, w1, w2) creates and returns a new relation of type type between two lexical units w1 and w2.
For example, applying this algorithm to the derivation tree in Figure 2 yields the following relations: arg(cho, Giang), arg(cho, tôi), arg(cho, quả), modM(quả, một), modN(quả, cam).
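The extraction procedure and the example above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the parser: the `DerivationNode` class, its fields and the function name `extract_relations` are assumptions made here, and the part-of-speech tags of the anchors are read off the elementary trees of Figure 2.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Tags that give rise to a modification relation (modA, modN, ...).
MODIFIER_TAGS = {"A", "N", "R", "V", "E", "L", "M", "P", "C"}


@dataclass
class DerivationNode:
    word: str                    # lexical anchor of the elementary tree
    pos: str                     # part-of-speech of the anchor
    operation: str = ""          # "subst" or "adj": how the tree attached to its parent
    kids: List["DerivationNode"] = field(default_factory=list)


def extract_relations(node: DerivationNode) -> Set[Tuple[str, str, str]]:
    """Return the set of (type, governor, dependent) relations below `node`."""
    relations = set()
    for kid in node.kids:
        if kid.operation == "subst":
            # A conjunct substituted into a conjunction tree depends on the conjunction.
            label = "coord" if node.pos == "CC" else "arg"
            relations.add((label, node.word, kid.word))
        elif kid.operation == "adj":
            if kid.pos in MODIFIER_TAGS:
                relations.add(("mod" + kid.pos, node.word, kid.word))
            if kid.pos == "CC":
                # The conjunction governs the word it adjoins to.
                relations.add(("coord", kid.word, node.word))
        relations |= extract_relations(kid)
    return relations


# Derivation tree of Figure 2.
figure2 = DerivationNode("cho", "V", kids=[
    DerivationNode("Giang", "Np", "subst"),
    DerivationNode("tôi", "P", "subst"),
    DerivationNode("quả", "Nu", "subst", kids=[
        DerivationNode("một", "M", "adj"),
        DerivationNode("cam", "N", "adj"),
    ]),
])

for relation in sorted(extract_relations(figure2)):
    print(relation)
```

Because relations are stored as tuples of word forms, two occurrences of the same word in a sentence would be conflated; a fuller version would presumably identify words by their position in the sentence.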
III. CONSTRUCTION OF A DEEP PARSER FOR VIETNAMESE
In this section, we briefly present the construction of a deep syntactic parser for Vietnamese. Our parser is able to produce both constituency and dependency analyses for a given sentence.
A. An LTAG parser for Vietnamese
We have adapted and enriched an LTAG parser called LLP [13] to construct a deep syntactic parser for Vietnamese. Given a sentence, the parser outputs all possible constituency parses and their corresponding derivation trees. The most important improvement we made to the parser is the refactoring and the introduction of general interfaces and modules for the preprocessing tasks (sentence detection, word segmentation, POS tagging), which naturally depend on the specific language.
For Vietnamese in particular, we have developed and integrated the following preprocessing modules:
• vnSentDetector – a sentence detector which segments a text into sentences [14];
• vnTokenizer – a tokenizer which segments sentences into words or lexical units [15];
• vnTagger – a part-of-speech tagger which tags each word of a sentence with its most appropriate syntactic category [16].
We have also enriched the LLP parser with an additional module which extracts dependency parses from the constituency parses given by the parser. This module implements the dependency analysis extraction algorithm described in the previous section.
B. Grammars
The grammar used in our parser is an LTAG extracted from the Vietnamese Treebank [10], which contains 10,163 sentences (225,085 words, i.e. about 22.14 words per sentence on average). Statistically, most of the sentences have a length between 10 and 30 words.
We choose a subset of the treebank containing 8,808 sentences of 30 words or less as an evaluation corpus. This corpus is divided into two sets: a training set (95% of the corpus, 8,367 sentences) and a test set (5% of the corpus, 441 sentences). We use vnLExtractor, an automatic LTAG extraction system developed in [17], to extract an LTAG for Vietnamese from the training set. This grammar contains 35,655 elementary trees instantiated from 1,658 tree templates.
C. Software
We have developed a software package named vnLTAGParser that implements the presented parsing system. All the integrated tools, the grammars and the parser itself are freely available for download.³
3 http://www.loria.fr/~lehong/projects.php
IV. PERFORMANCE OF THE PARSER
In this section, we present the evaluation of the parser on the test corpus. The parser performance is considered in two versions, without and with part-of-speech (POS) tagging.
We make use of two measures: tree accuracy (or T-accuracy) and dependency accuracy (or D-accuracy).⁴ When there are multiple parse trees for a sentence (which happens very often, even for quite short sentences), we choose one of the derivation trees whose derived tree has the smallest number of nodes, because such parses correspond to the most specific tree.
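The selection heuristic described above can be sketched as follows. The representation is an assumption made for illustration: each candidate is a pair (derivation, derived tree), derived trees are nested tuples of the form (label, child, ...), words are plain strings counted as nodes, and the names `count_nodes` and `select_parse` are hypothetical.

```python
def count_nodes(tree):
    """Count the nodes of a tree given as (label, child, ...) tuples; words are strings."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(count_nodes(kid) for kid in tree[1:])


def select_parse(candidates):
    """Pick a (derivation, derived_tree) pair whose derived tree has the fewest nodes.

    Ties are broken by order of appearance, since min() returns the first minimum.
    """
    return min(candidates, key=lambda pair: count_nodes(pair[1]))


# Two toy analyses of the same two-word string; the second has an extra NP layer.
flat = ("S", ("NP", ("N", "a")), ("VP", ("V", "b")))
deep = ("S", ("NP", ("NP", ("N", "a"))), ("VP", ("V", "b")))

chosen = select_parse([("derivation-2", deep), ("derivation-1", flat)])
print(chosen[0], count_nodes(chosen[1]))
```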
A. Performance of the parser without POS tagging
First, the parser is evaluated without a POS tagger; that is, the module vnTagger is not integrated into the parser. In this setting, each word occurrence of an input sentence is tagged with all the tags that have been assigned to that word in the training set. Unknown words are tagged as common nouns (label N).
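The tagging scheme used in this setting amounts to a lookup in a lexicon built from the training set. A minimal sketch, assuming the training data is available as lists of (word, tag) pairs; the function names and the toy sentences below are illustrative only and are not taken from the treebank.

```python
from collections import defaultdict


def build_tag_lexicon(tagged_sentences):
    """Map each training word to the set of tags it was seen with."""
    lexicon = defaultdict(set)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            lexicon[word].add(tag)
    return dict(lexicon)


def candidate_tags(words, lexicon, unknown_tag="N"):
    """Return, for each word, all its training tags; unknown words get the common noun tag."""
    return [sorted(lexicon.get(word, {unknown_tag})) for word in words]


# Toy training data (hypothetical): "cho" is seen once as a verb and once as a preposition.
training = [
    [("Giang", "Np"), ("cho", "V"), ("tôi", "P")],
    [("cho", "E"), ("tôi", "P")],
]
lexicon = build_tag_lexicon(training)
print(candidate_tags(["Giang", "cho", "cam"], lexicon))
```

The parser then has to explore every combination of candidate tags, which is one reason why ambiguity and parsing time are high in this setting.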
We first evaluate the performance of the constituency analysis. The results are shown in Table I.
Table I
PERFORMANCES OF THE CONSTITUENCY ANALYSIS WITHOUT OR WITH POS TAGGING

T-accuracy               All sentences       ≤ 10 words
                         No POS    POS       No POS    POS
Precision                67.98     69.15     71.28     71.60
Recall                   68.40     69.52     71.39     72.30
F-measure                68.19     69.33     71.33     71.95
Complete match           13.00     16.67     17.57     20.69
Average crossing          2.66      2.39      1.80      1.69
No crossing              23.00     27.78     29.73     32.76
Less than 3 crossings    55.00     54.17     68.92     65.52
Tagging accuracy         87.72     95.25     87.34     95.43
In addition to the common precision and recall ratios, other measures are reported to help analyze the results:
• The complete match ratio is the percentage of sentences for which recall and precision are both 100%. Complete matches account for 13% of the test sentences, and for 17.57% of the sentences of 10 words or less.
• The average crossing ratio is the number of constituents crossing a test constituent divided by the number of sentences of the test corpus.
• The no crossing ratio is the percentage of sentences which have no crossing brackets. 23% of the test sentences do not have any crossing (29.73% for the sentences of 10 words or less), and 55% (respectively 68.92%) of the test sentences have fewer than 3 crossings.
• The tagging accuracy is the percentage of correct POS tags (punctuation excluded). It is interesting to note that, without a POS tagger, the tagging accuracy declines slightly when shorter test sentences are used (from 87.72% to 87.34%).
4 In computing these scores, unanalyzable sentences and punctuation are not taken into account.
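As an illustration of how such constituency measures are commonly computed, the sketch below scores one test parse against a gold parse. It makes several assumptions that the text does not state: constituents are compared as labeled brackets (label, start, end) over word positions with an exclusive end, and a crossing is counted for each test bracket that overlaps some gold bracket without nesting. The function names are hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def bracket_scores(gold, test):
    """Labeled precision, recall and F-measure over sets of (label, start, end) brackets."""
    matched = len(gold & test)
    precision = matched / len(test) if test else 0.0
    recall = matched / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


def crossing_brackets(gold, test):
    """Count the test brackets that cross at least one gold bracket."""
    gold_spans = {(start, end) for _, start, end in gold}

    def crosses(a, b):
        # Two spans cross when they overlap but neither contains the other.
        return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]

    return sum(
        any(crosses((start, end), span) for span in gold_spans)
        for _, start, end in test
    )


# Toy four-word sentence: the test parse attaches the last word differently.
gold = {("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("NP", 2, 4)}
test = {("S", 0, 4), ("NP", 0, 1), ("VP", 1, 3), ("NP", 3, 4)}

print(bracket_scores(gold, test))
print(crossing_brackets(gold, test))
print(round(f_measure(67.98, 68.40), 2))
```

The last line recomputes the F-measure of the first column of Table I from its precision and recall, which gives 68.19 as reported. A sentence is a complete match when both precision and recall equal 1, and it has no crossing when `crossing_brackets` returns 0.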
The performance of dependency analysis is evaluated in two versions, with or without type. In the first version, two typed dependencies type1(u1, v1) and type2(u2, v2) are considered equal if their three corresponding parts are all equal, that is type1 ≡ type2, u1 ≡ u2 and v1 ≡ v2. In the second version, we compare only the two pairs of words concerned, without using their dependency types. The D-accuracy of the two evaluations is given in Table II.
Table II
PERFORMANCES OF THE DEPENDENCY ANALYSIS WITHOUT OR WITH POS TAGGING

D-accuracy        With type           Without type
                  No POS    POS       No POS    POS
Precision         70.83     71.81     74.02     73.21
Complete match    15.87     20.00     23.37     25.45
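The two ways of comparing dependencies can be sketched as follows. This is an illustration under the assumption that precision is the share of predicted dependencies that also appear in the gold analysis; the function name `dependency_precision` is hypothetical, and the predicted analysis is a made-up variant of the Figure 2 example in which one modifier label is wrong.

```python
def dependency_precision(gold, predicted, typed=True):
    """Share of predicted (type, governor, dependent) relations found in the gold set.

    With typed=False, only the (governor, dependent) pairs are compared.
    """
    if not typed:
        gold = {(gov, dep) for _, gov, dep in gold}
        predicted = {(gov, dep) for _, gov, dep in predicted}
    return len(gold & predicted) / len(predicted) if predicted else 0.0


gold = {
    ("arg", "cho", "Giang"), ("arg", "cho", "tôi"), ("arg", "cho", "quả"),
    ("modM", "quả", "một"), ("modN", "quả", "cam"),
}
# Hypothetical parser output: the attachment of "cam" is right but its label is wrong.
predicted = (gold - {("modN", "quả", "cam")}) | {("modA", "quả", "cam")}

print(dependency_precision(gold, predicted, typed=True))
print(dependency_precision(gold, predicted, typed=False))
```

On this toy pair the typed score is lower than the untyped one, since a correct attachment with a wrong label only counts in the untyped comparison.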
Table III gives a detailed view of the accuracy for each dependency type.
Table III
PERFORMANCES OF DEPENDENCY ANALYSIS BY TYPE WITHOUT OR WITH POS TAGGING
[Table body missing; only the column headings survive: No POS | POS, repeated three times.]
We see that the parser works perfectly on coordination structures, as they are inherently unambiguous in both the grammar and the extraction algorithm. The performance on argument dependencies is much better than that on modifier dependencies. These results reflect the higher ambiguity of the adjunction operation of the LTAG formalism (which involves auxiliary trees) in comparison with the substitution operation (which involves initial trees).
We observe that the parser could not parse about 16.6% of the test corpus. We believe that a sentence may be unanalysable for two reasons. First, the coverage of the underlying LTAG grammar may be insufficient; that is, the grammar extracted from the training corpus does not contain the syntactic structures (elementary trees) needed for the sentence to be parsed. Second, our heuristic choice of tagging all unknown words as common nouns may introduce errors prior to the analysis, which may result in analysis failures. At present, we have not yet investigated these causes precisely.
The ambiguity and the duration of parsing depend strongly on the length of sentences, as shown in Figure 4 and Figure 5. It seems that the number of parses grows exponentially with the length of the sentence.⁵
Figure 4. Analysis ambiguity, average and maximum, according to the length of sentences.
Figure 5. Analysis duration (in milliseconds), average and maximum, according to the length of sentences.
B. Performance of the parser with POS tagging
The results reported in the previous subsection allow a first evaluation of the grammar and of the performance of the parser. Nevertheless, the conditions under which the experiment is carried out are rather harsh, since the parser has to try all possible syntactic categories of each word of an input sentence. The experiments in this subsection are closer to real use conditions, in that each sentence is first processed by a tagger to remove POS-tagging ambiguity: each word is assigned a unique tag. We thus have a single sequence of word/tag pairs, which is used as input to the syntactic parser. The tagging is done by the vnTagger module.
We evaluate this version of the parser in the same way as the previous one. We first give constituency parsing results, then dependency parsing results, and finally the ambiguity and duration of the parsing.
The T-accuracy of the system is shown in Table I. Integrating a POS tagger greatly improves the tagging accuracy, from 87.72% to 95.25%.⁶ This helps improve nearly all the scores of the system, notably the complete match ratio, from 13.00% to 16.67% (and to 20.69% for sentences of 10 words or less).
5 For some considerably long sentences, the parser could not give any result
after a fixed time-out predefined at 3 minutes.
6 Recall that the test corpus only contains sentences of 30 words or less.
Figure 6. Analysis ambiguity, average and maximum, with an integrated tagger.
Figure 7. Analysis duration, average and maximum, with an integrated tagger.
The performances of dependency analysis with or without type are shown in Table II, and those of particular dependency types are shown in Table III.
We see that the performance of the system improves slightly in comparison with the system without tagging. However, the most important gain of the parser with an integrated tagger is a strong reduction of analysis ambiguity and time, shown in Figure 6 and Figure 7. The tagger reduces analysis ambiguity by a factor of five on average and analysis duration by a factor of three in comparison with the parser without prior tagging. Nevertheless, we observe that the integration of a tagger increases the proportion of sentences that the parser could not parse to 40% of the test corpus. This increase is predictable because in this version the parser uses only one syntactic category (the most probable POS) given by the tagger for each word. (We also note that the precision of the tagger at sentence level is about 32% [16]; that is, the tagger gives correct tags for all the words of a sentence to be parsed only about a third of the time.)
V. DISCUSSION
In the previous section, we have presented the evaluation of a syntactic analysis system based on LTAG for Vietnamese. The best results obtained are 73.21% (dependency accuracy) and 69.33% (F-measure of constituency accuracy) on a test corpus.
It is worth noting that these are the first results on the syntactic analysis of Vietnamese based on LTAG. To our knowledge, few works on the syntactic analysis of Vietnamese have been published so far. The most complete report on parser performance is an empirical study applying the probabilistic CFG parsing models of Michael Collins [18] to Vietnamese; its best result on constituency analysis is 78% on a test corpus, and no result on dependency analysis is reported.
Concerning the constituency parsing result, their parser scores higher than ours (78% against 69.33%). However, these results are not directly comparable, since the parsing models are trained and tested on different corpora.
Our first results on the syntactic parsing of Vietnamese are rather good, although they are still significantly lower than parsing results for well-studied languages like English (whose T-accuracy is 91.10% [19] and whose D-accuracy is 92.93% [20] on the Penn Treebank) or French (whose T-accuracy is 86.41% [21] and whose D-accuracy is 85.55% on a French treebank [4]). However, we can improve the results by correcting the three main sources of errors identified by the experiments.
The principal source of parsing errors is the selection of the parse. We chose a single parse for each sentence using a very simple method: when there are multiple parses for a sentence, only the parse whose derivation tree contains the fewest nodes is selected. Although the returned tree corresponds to the most specific analysis, this selection method is purely heuristic and fragile. There exist many cases where the chosen parses are not the correct ones. A better way to select the best parse for each input sentence is a necessary and crucial condition for improving the parsing performance. In the future, we need to develop and evaluate more effective methods for parse selection. In this perspective, the use of statistical classification models is a promising approach that we intend to investigate.
The second source of parsing errors is POS tagging. In the experiments with an integrated tagger, we use only one solution of vnTagger (the best one) as input to the parser. We have seen that the tagger often makes errors at the sentence level. A tagging error may introduce one or more parsing errors. An improvement of tagging performance is thus another necessary condition for improving the performance of the parser.
The third source of parsing errors concerns the coverage of the grammar used in the experiments. In general, the proportion of test sentences having at least one word that the grammar does not recognize is rather high, at about 15%. As a consequence, the parser could not build the correct analysis for these sentences. A straightforward solution to this problem is to enlarge the coverage of the LTAG grammar, which in turn requires an enlargement of the Vietnamese treebank. However, developing such a corpus is an expensive and labor-intensive task. In addition, this may lead to the typical problem of a symbolic syntactic parser, namely the tradeoff between its performance and its efficiency. This is an interesting problem in itself, which we shall investigate in future work.
REFERENCES
[1] S. Kübler, R. McDonald, and J. Nivre, Dependency Parsing. Morgan & Claypool Publishers, 2009.
[2] M. Candito, B. Crabbé, P. Denis, and F. Guérin, “Analyse syntaxique du français : des constituants aux dépendances,” in Actes de TALN 2009, Senlis, France, 2009.
[3] R. Johansson and P. Nugues, “Dependency-based syntactic–semantic analysis with PropBank and NomBank,” in CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning. Manchester, England: Coling 2008 Organizing Committee, August 2008, pp. 183–187.
[4] M. Candito, B. Crabbé, and P. Denis, “Statistical French dependency parsing: Treebank conversion and first results,” in Proceedings of LREC 2010, Valletta, Malta, 2010.
[5] M.-C. de Marneffe, B. MacCartney, and C. D. Manning, “Generating typed dependency parses from phrase structure parses,” in Proceedings of LREC 2006, Genoa, Italy, 2006.
[6] A. K. Joshi and Y. Schabes, Handbooks of Formal Languages and Automata. Springer-Verlag, 1997, ch. Tree Adjoining Grammars.
[7] T. H. King, R. Crouch, S. Riezler, M. Dalrymple, and R. M. Kaplan, “The PARC 700 dependency bank,” in Proceedings of 4th International Workshop on Linguistically Interpreted Corpora, Budapest, Hungary, 2003.
[8] J. Carroll, T. Briscoe, and A. Sanfilippo, “Parser evaluation: a survey and a new proposal,” in Proceedings of LREC 1998, Granada, Spain, 1998.
[9] P. Paroubek, L. G. Pouillot, I. Robba, and A. Vilnat, “EASY : Campagne d’évaluation des analyseurs syntaxiques,” in Proceedings of TALN 2005, Dourdan, France, 2005, pp. 3–12.
[10] P. T. Nguyen, L. V. Xuan, T. M. H. Nguyen, V. H. Nguyen, and P. Le-Hong, “Building a large syntactically-annotated corpus of Vietnamese,” in Proceedings of the 3rd Linguistic Annotation Workshop, ACL-IJCNLP, Singapore, 2009.
[11] Đạt Hữu, T. D. Trần, and T. L. Đào, Cơ sở tiếng Việt (Basis of Vietnamese). Hà Nội, Việt Nam: NXB Giáo dục, 1998.
[12] O. Rambow and A. Joshi, “A formal look at dependency grammars and phrase-structure grammars, with special consideration of word-order phenomena,” in Current Issues in Meaning-Text Theory. London: Pinter, 1994.
[13] A. Roussanaly, B. Crabbé, and J. Perrin, “Premier bilan de la participation du LORIA à la campagne d’évaluation EASY,” in Proceedings of TALN 2005, Dourdan, France, 2005.
[14] P. Le-Hong and T. V. Ho, “A maximum entropy approach to sentence boundary detection of Vietnamese texts,” in Proceedings of IEEE International Conference on Research, Innovation and Vision for the Future – RIVF 2008, Vietnam, 2008.
[15] P. Le-Hong, T. M. H. Nguyen, A. Roussanaly, and T. V. Ho, “A hybrid approach to word segmentation of Vietnamese texts,” in Proceedings of the 2nd International Conference on Language and Automata Theory and Applications, M.-V. Carlos, Ed. Tarragona, Spain: Springer, LNCS 5196, 2008.
[16] P. Le-Hong, “An empirical study of maximum entropy approach for part-of-speech tagging of Vietnamese texts,” in Proceedings of Traitement Automatique des Langues Naturelles (TALN-2010), Montreal, Canada, 2010.
[17] P. Le-Hong, T. M. H. Nguyen, P. T. Nguyen, and A. Roussanaly, “Automated extraction of tree adjoining grammars from a treebank for Vietnamese,” in Proceedings of The Tenth International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+10), Yale University, New Haven, CT, USA, 2010.
[18] M. Collins, “Head-driven statistical models for natural language parsing,” Computational Linguistics, vol. 29, no. 4, pp. 589–637, 2003.
[19] X. Carreras, M. Collins, and T. Koo, “TAG, dynamic programming, and the perceptron for efficient, feature-rich parsing,” in Proceedings of COLING 2008, Manchester, 2008.
[20] T. Koo and M. Collins, “Efficient third-order dependency parsers,” in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Uppsala, Sweden: Association for Computational Linguistics, July 2010, pp. 1–11.
[21] M. Candito, B. Crabbé, and D. Seddah, “On statistical parsing of French with supervised and semi-supervised strategies,” in Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference. Morristown, NJ, USA: Association for Computational Linguistics, 2009, pp. 49–57.