
Vietnamese Parsing with an Automatically Extracted Tree-Adjoining Grammar

Phuong Le Hong
VNU University of Science, Hanoi, Vietnam
phuonglh@vnu.edu.vn

Thi Minh Huyen Nguyen
VNU University of Science, Hanoi, Vietnam
huyenntm@vnu.edu.vn

Azim Roussanaly
LORIA, Nancy, France
azim.roussanaly@loria.fr

Abstract—This paper presents the construction and evaluation of a deep syntactic parser based on Lexicalized Tree-Adjoining Grammars for the Vietnamese language. It is a complete system that integrates the tools needed to process Vietnamese text: it takes raw text as input and produces syntactic structures. A dependency annotation scheme for Vietnamese and an algorithm for extracting dependency structures from derivation trees are also proposed. At present, this is the first Vietnamese parsing system capable of producing both constituency and dependency analyses, with encouraging performance: 69.33% and 73.21% accuracy for constituency and dependency analysis, respectively.

I. INTRODUCTION

Syntactic parsing is a basic task in natural language processing. For Vietnamese, there have been few published works dealing with this problem. This paper presents the construction and evaluation of a deep syntactic parser based on Lexicalized Tree-Adjoining Grammars (LTAG) for the Vietnamese language.

The paper is organized as follows. In this first section, we introduce the notions of constituency and dependency analysis, as well as the Tree-Adjoining Grammar (TAG) formalism. Section II proposes a dependency annotation scheme for Vietnamese and an algorithm for extracting dependency relations from the derivation trees produced by a TAG parser. Section III presents the construction of a Vietnamese parser capable of producing both constituency and dependency analyses. Section IV gives a detailed evaluation of the parsing system. We conclude the paper with a discussion and directions for future work.

A. Constituency and dependency analysis

Constituency structure and dependency structure are two types of syntactic representation of a natural language sentence. While a constituency structure represents a nesting of multi-word constituents, a dependency structure represents dependencies between the individual words of a sentence. A syntactic dependency represents the fact that the presence of a word is licensed by another word, which is its governor. In a typed dependency analysis, grammatical labels are added to the dependencies to mark their grammatical relations, for example subject or indirect object.

Recently, there have been many published works on dependency analysis for well-studied languages, such as English [1] or French [2]. The dependency parsers developed for these languages are usually probabilistic and trained on available corpora of the languages concerned. We can classify the architectures of those parsers into two main types:

• parsers that apply a machine learning method to dependency corpora extracted automatically from treebanks and directly produce dependency parses [3], [4];

• parsers that rely on a sequential process in which constituency parses are produced first and dependency parses are then extracted from them [2], [5].

In the second architecture, we obviously need a module that takes as input the constituency parses given by a constituency parser and converts them into typed dependency parses, as illustrated in Figure 1 for a French sentence.¹

[Figure 1: constituency tree (S (NP (D Une) (N lettre)) (VN (V avait) (V été) (V envoyée)) (NP (D la) (N semaine) (A dernière)) (PP (P aux) (NP (N salariés)))) and the corresponding typed dependency tree, rooted at the past participle, with the relation labels suj, det, aux, mod, a-obj and obj.]

Figure 1. Constituency and dependency analysis of a French sentence.

B. Tree-Adjoining Grammars

In the TAG formalism [6], the grammar is defined by a set of elementary trees, divided into initial trees and auxiliary trees. These trees can be combined by the substitution and adjunction operations to form derived trees. A TAG parsing system rewrites nodes of trees, rather than symbols of strings as context-free grammars (CFG) do. Figure 2 gives a simple Vietnamese TAG and an analysis of a sentence. The first half of the figure shows the elementary trees of the grammar, and the second half shows the derived tree and its corresponding derivation tree, where the notation <anchor> represents the elementary tree corresponding to a lexical anchor. A derivation tree in TAG specifies how a derived tree was constructed.

¹ A letter was sent to the employees last week.

978-1-4673-0309-5/12/$31.00 ©2012 IEEE


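[Figure 2. Elementary trees: initial trees anchored by Giang (Np), cho (V, with three NP↓ substitution sites), tôi (P) and quả (Nu); auxiliary trees anchored by một (M, with the foot node NP∗ on its right) and cam (N, with the foot node NP∗ on its left). Derived tree: (S (NP (Np Giang)) (VP (V cho) (NP (P tôi)) (NP (M một) (NP (NP (Nu quả)) (N cam))))). Derivation tree: <cho> dominates <Giang>, <tôi> and <quả>; <quả> dominates <một> and <cam>.]

Figure 2. A TAG analysis of the sentence “Giang cho tôi một quả cam” (Giang gave me an orange).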

TAG has a number of advantages over CFG. First, it provides an extended domain of locality. Second, the adjunction operation makes it possible to represent discontinuous constituency constructions. As a consequence, some TAGs recognize context-sensitive languages; for this reason, TAGs are called mildly context-sensitive grammars. Third, TAG derivation trees show semantic dependencies between the entities in a sentence, as the tree branches represent their combination type (dashed or continuous lines for substitution or adjunction, respectively, in Figure 2). In addition, in LTAG, lexical entries naturally capture constraints associated with lexical items, which is not possible in CFG.
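To make the two combination operations of this subsection concrete, the following is a minimal sketch, not the parser's actual implementation, that rebuilds the derived tree of the sentence in Figure 2 from its elementary trees. The class `Node` and the functions `find`, `substitute`, `adjoin`, `leaf`, `bracket` and `build_example` are names assumed here for illustration only; feature structures and adjunction constraints are ignored.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    label: str                                   # category (S, NP, V, ...) or a word
    kids: List["Node"] = field(default_factory=list)
    subst: bool = False                          # open substitution site (NP↓)
    foot: bool = False                           # foot node of an auxiliary tree (NP∗)


def find(node: Node, pred: Callable[[Node], bool]) -> Optional[Node]:
    """Return the first node in pre-order that satisfies pred, or None."""
    if pred(node):
        return node
    for kid in node.kids:
        hit = find(kid, pred)
        if hit is not None:
            return hit
    return None


def substitute(tree: Node, initial: Node) -> Node:
    """Fill the leftmost open substitution site labeled like the root of `initial`."""
    site = find(tree, lambda n: n.subst and n.label == initial.label)
    if site is None:
        raise ValueError(f"no open substitution site for {initial.label}")
    site.kids, site.subst = initial.kids, False
    return tree


def adjoin(target: Node, auxiliary: Node) -> Node:
    """Adjoin `auxiliary` at `target`: the old subtree of `target` moves under the foot."""
    foot = find(auxiliary, lambda n: n.foot)
    if foot is None or auxiliary.label != target.label:
        raise ValueError("auxiliary tree does not fit the adjunction site")
    foot.kids, foot.foot = target.kids, False
    target.kids = auxiliary.kids
    return target


def leaf(cat: str, word: str) -> Node:
    """A preterminal node dominating a single word."""
    return Node(cat, [Node(word)])


def bracket(node: Node) -> str:
    """Render a tree in bracketed notation."""
    if not node.kids:
        return node.label
    return "(" + node.label + " " + " ".join(bracket(k) for k in node.kids) + ")"


def build_example() -> Node:
    """Combine the elementary trees of Figure 2 into the derived tree."""
    def np_slot() -> Node:
        return Node("NP", subst=True)

    cho = Node("S", [np_slot(), Node("VP", [leaf("V", "cho"), np_slot(), np_slot()])])
    giang = Node("NP", [leaf("Np", "Giang")])
    toi = Node("NP", [leaf("P", "tôi")])
    qua = Node("NP", [leaf("Nu", "quả")])
    mot = Node("NP", [leaf("M", "một"), Node("NP", foot=True)])
    cam = Node("NP", [Node("NP", foot=True), leaf("N", "cam")])
    adjoin(qua, cam)                  # quả cam
    adjoin(qua, mot)                  # một quả cam
    for initial in (giang, toi, qua):
        substitute(cho, initial)      # sites are filled from left to right
    return cho


print(bracket(build_example()))
```

Running the sketch prints the bracketed derived tree of the sentence “Giang cho tôi một quả cam”. The sequence of `adjoin` and `substitute` calls is exactly the information that a derivation tree records.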

II. EXTRACTION OF DEPENDENCY RELATIONS

A. Dependency annotation scheme

There exist many schemes for dependency annotation, for example the Stanford Dependency (SD) annotation scheme [5], derived from an automatic conversion of the English Penn Treebank; the PARC 700 scheme [7], inspired by the functional structures of lexical functional grammars; the GR scheme [8]; and EASy [9] for French. The multiplicity of annotation schemes is due to different linguistic and practical choices. We prefer to define a surface dependency annotation scheme for Vietnamese that is not only convertible to the standards cited above but also extensible to finer-grained dependency schemes if necessary. The current scheme contains 13 grammatical relations representing the principal functional dependencies between Vietnamese words. All these dependencies use the syntactic categories defined in the Vietnamese treebank [10], and they are divided into three groups.

The first group, arg, represents the relationship between a head word and its argument. There are two types of arguments: subject (subj) and object (obj). It is worth noting that Vietnamese is a topic-prominent language, in which sentences are structured around topics rather than subjects and objects [11]. In many cases, we cannot identify the subject and the object of a Vietnamese sentence by their respective positions. Distinguishing the subject from the object of a Vietnamese sentence is thus not a trivial task, especially in an automatic process. Therefore, at the moment, we do not distinguish the two relations subj and obj in our evaluations.

The second group, mod, represents modification relations between a word and its head word (or governor). According to the syntactic category of the modifier, we distinguish nine modification relations: modN (nominal modifier), modM (numeral modifier), modA (adjectival modifier), modR (adverbial modifier), modE (prepositional modifier), modV (verbal modifier), modL (determiner modifier), modP (pronominal modifier) and modC (subordinating conjunction modifier).²

The third group, coord, represents the dependencies of the lexical heads of two coordinated phrases on the conjunction.

Having defined a dependency annotation scheme for Vietnamese, we now propose an algorithm for automatically extracting dependency analyses from TAG derivation trees.

B. An algorithm for dependency relation extraction

It has been shown that the TAG formalism shares many important similarities with the dependency grammar formalism [12]. A TAG derivation tree can easily be converted into a dependency tree in the case of lexicalized grammars. The main idea is to transform each derivation operation into a dependency relation. A derivation operation between a source tree t1 and a target tree t2 results in a dependency relation with the head word of t1 as governor and the head word of t2 as dependent.

The dependency analysis corresponding to the analysis in Figure 2 is shown in Figure 3. The derivation tree can be turned into the dependency tree by a simple transformation in which each node of the derivation tree (representing an elementary tree) is replaced with its lexical node. Here, we want to extract typed dependencies, each labeled with a grammatical relation following the annotation scheme defined above. We thus need to consider the operation performed at each node of the derivation tree. If it is a substitution, a relation of type arg is created; if it is an adjunction, a relation of type mod is created, and its label is determined by examining the syntactic category of the word concerned at the lexical node of the derivation tree.

[Figure 3: dependency tree rooted at “cho”.]

Figure 3. Dependency tree corresponding to the analysis in Figure 2.

The most difficult case is the construction of coordination relations, where we must consider three related nodes and two combination operations at the same time, since an auxiliary tree for a conjunction in TAG has a specific form with a substitution node and a foot node, as illustrated in the following example trees:

[Example auxiliary trees for coordinating conjunctions, glossed “(and)” and “hoặc (or)”: each has a root X, a foot node X∗, the conjunction as anchor, and a substitution node Y↓.]

² Due to space restrictions, we cannot present examples for these relations.

We propose an algorithm for the automatic extraction of dependency relations from a derivation tree given by a constituency parser. The following recursive algorithm, EXTRACT-RELATIONS(N), shows the extraction procedure in detail.

Require: a derivation tree N
Ensure: a set R of dependency relations

R ← ∅;
wn ← LEXICAL-NODE(N);
tn ← POS-NODE(N);
for K ∈ N.kids do
    wk ← LEXICAL-NODE(K);
    tk ← POS-NODE(K);
    if K.IS-SUBST() then
        if tn = CC then
            R ← R ∪ NEW-RELATION(coord, wn, wk);
        else
            R ← R ∪ NEW-RELATION(arg, wn, wk);
        end if
    else if K.IS-ADJ() then
        if tk ∈ {A, N, R, V, E, L, M, P, C} then
            R ← R ∪ NEW-RELATION(mod_tk, wn, wk);
        end if
        if tk = CC then
            R ← R ∪ NEW-RELATION(coord, wk, wn);
        end if
    end if
    {Recursively extract relations from tree K}
    R ← R ∪ EXTRACT-RELATIONS(K);
end for
return R;

This algorithm uses the following auxiliary functions. The function LEXICAL-NODE(N) returns the lexical head of the root node of an input derivation tree N, while the function POS-NODE(N) returns the part of speech of that lexical head. The functions IS-SUBST() and IS-ADJ() are called at each node of the derivation tree to check whether it was attached by substitution or by adjunction. Finally, the function NEW-RELATION(type, w1, w2) creates and returns a new relation of type type between two lexical units w1 and w2.

For example, applying this algorithm to the derivation tree in Figure 2 yields the following relations: arg(cho, Giang), arg(cho, tôi), arg(cho, quả), modM(quả, một), modN(quả, cam).
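The extraction procedure above can be sketched in Python as follows. This is an illustrative transcription under stated assumptions, not the code of the parser: the class `DerivNode`, its `op` field (which plays the role of IS-SUBST() and IS-ADJ()) and the function `extract_relations` are names introduced here, and relations are returned as a list of (type, governor, dependent) triples instead of a set so that repeated words are not merged.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Relation = Tuple[str, str, str]            # (type, governor, dependent)
MODIFIER_TAGS = {"A", "N", "R", "V", "E", "L", "M", "P", "C"}


@dataclass
class DerivNode:
    word: str                              # lexical anchor (LEXICAL-NODE)
    pos: str                               # part of speech of the anchor (POS-NODE)
    op: str = "root"                       # "subst" or "adj": how the tree was attached
    kids: List["DerivNode"] = field(default_factory=list)


def extract_relations(n: DerivNode) -> List[Relation]:
    """Collect typed dependencies from the derivation tree rooted at n."""
    relations: List[Relation] = []
    for k in n.kids:
        if k.op == "subst":
            # Substitution into a conjunction tree gives a coordination,
            # any other substitution gives an argument relation.
            rel_type = "coord" if n.pos == "CC" else "arg"
            relations.append((rel_type, n.word, k.word))
        elif k.op == "adj":
            if k.pos in MODIFIER_TAGS:
                relations.append(("mod" + k.pos, n.word, k.word))
            elif k.pos == "CC":
                # An adjoined conjunction governs the phrase it adjoins to.
                relations.append(("coord", k.word, n.word))
        relations.extend(extract_relations(k))   # recurse into the subtree K
    return relations


# Derivation tree of "Giang cho tôi một quả cam" (Figure 2).
figure2 = DerivNode("cho", "V", kids=[
    DerivNode("Giang", "Np", "subst"),
    DerivNode("tôi", "P", "subst"),
    DerivNode("quả", "Nu", "subst", kids=[
        DerivNode("một", "M", "adj"),
        DerivNode("cam", "N", "adj"),
    ]),
])

for rel_type, governor, dependent in extract_relations(figure2):
    print(f"{rel_type}({governor}, {dependent})")
```

On this input the sketch produces the five relations listed in the example above, in the same order.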

III. CONSTRUCTION OF A DEEP PARSER FOR VIETNAMESE

In this section, we briefly present the construction of a deep syntactic parser for Vietnamese. Our parser is able to produce both constituency and dependency analyses for a given sentence.

A. An LTAG parser for Vietnamese

We have adapted and enriched an LTAG parser called LLP [13] to construct a deep syntactic parser for Vietnamese. Given a sentence, the parser outputs all possible constituency parses and their corresponding derivation trees. The most important improvement we made to the parser is the refactoring and introduction of general interfaces and modules for the preprocessing tasks (sentence detection, word segmentation, POS tagging), which naturally depend on the specific language.

For Vietnamese in particular, we have developed and integrated the following preprocessing modules:

• vnSentDetector – a sentence detector which segments a text into sentences [14];

• vnTokenizer – a tokenizer which segments sentences into words or lexical units [15];

• vnTagger – a part-of-speech tagger which tags each word of a sentence with its most appropriate syntactic category [16].

We have also enriched the LLP parser with a supplementary module which extracts dependency parses from the constituency parses given by the parser. This module implements the dependency analysis extraction algorithm described in the previous section.

B. Grammars

The grammar used in our parser is an LTAG extracted from the Vietnamese Treebank [10], which contains 10,163 sentences (225,085 words, i.e. about 22.14 words per sentence on average). Most of the sentences have a length between 10 and 30 words.

We choose a subset of the treebank containing the 8,808 sentences of 30 words or fewer as an evaluation corpus. This corpus is divided into two sets: a training set (95% of the corpus, 8,367 sentences) and a test set (5% of the corpus, 441 sentences). We use vnLExtractor, an automatic LTAG extraction system developed in [17], to extract an LTAG for Vietnamese from the training set. This grammar contains 35,655 elementary trees instantiated from 1,658 tree templates.
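The corpus preparation described above amounts to a length filter followed by a 95/5 split. The sketch below illustrates the arithmetic only; the function name `filter_and_split` is hypothetical, and taking the first 95% of the kept sentences in their original order is an assumption, since the way the two sets were actually drawn is not stated.

```python
from typing import List, Sequence, Tuple

Sentence = Sequence[str]


def filter_and_split(
    sentences: List[Sentence], max_len: int = 30, train_percent: int = 95
) -> Tuple[List[Sentence], List[Sentence]]:
    """Keep sentences of at most max_len words, then split them into train and test."""
    kept = [s for s in sentences if len(s) <= max_len]
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(kept) * train_percent // 100
    return kept[:n_train], kept[n_train:]
```

With 8,808 kept sentences, `8808 * 95 // 100` gives 8,367 training sentences and leaves 441 for testing, which matches the sizes reported above.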

C. Software

We have developed a software package named vnLTAGParser that implements the presented parsing system. All the integrated tools, the grammars and the parser itself are freely available for download.³

³ http://www.loria.fr/~lehong/projects.php


IV. PERFORMANCE OF THE PARSER

In this section, we present the evaluation of the parser on the test corpus. The parser is evaluated in two versions, with and without part-of-speech (POS) tagging. We use two measures: tree accuracy (T-accuracy) and dependency accuracy (D-accuracy).⁴ When there are multiple parse trees for a sentence (which is very often the case, even for quite short sentences), we choose one of the derivation trees whose derived tree has the smallest number of nodes, because these parses correspond to the most specific tree.
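The parse selection heuristic just described can be sketched as follows. The representation of a tree as a nested `(label, children)` pair and the names `count_nodes` and `select_parse` are assumptions made for this illustration; when several parses tie, the sketch simply keeps the first one, which is one possible reading of "one of the derivation trees".

```python
from typing import Any, List, Tuple

Tree = Tuple[str, list]                    # (label, list of child trees)


def count_nodes(tree: Tree) -> int:
    """Number of nodes in a tree given as nested (label, children) pairs."""
    _label, kids = tree
    return 1 + sum(count_nodes(kid) for kid in kids)


def select_parse(parses: List[Tuple[Any, Tree]]) -> Tuple[Any, Tree]:
    """Pick the (derivation, derived_tree) pair whose derived tree is smallest.

    `parses` must be non-empty; ties are broken by order of appearance.
    """
    return min(parses, key=lambda pair: count_nodes(pair[1]))


# Two hypothetical derived trees for the same sentence.
flat = ("S", [("NP", [("N", [])]), ("VP", [("V", [])])])
deep = ("S", [("NP", [("NP", [("N", [])])]), ("VP", [("V", [])])])
best_derivation, best_tree = select_parse([("deep parse", deep), ("flat parse", flat)])
print(best_derivation, count_nodes(best_tree))
```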

A. Performance of the parser without POS tagging

First, the parser is evaluated without a POS tagger; that is, the vnTagger module is not integrated into the parser. In this setting, each word occurrence in an input sentence is tagged with all the tags that have been assigned to it in the training set. Unknown words are tagged as common nouns (label N).
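A minimal sketch of this tag assignment, assuming the training set is available as lists of (word, tag) pairs, is given below. The function names `build_tag_lexicon` and `candidate_tags` and the toy training data are hypothetical and serve only to illustrate the rule.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def build_tag_lexicon(
    tagged_sentences: Iterable[Sequence[Tuple[str, str]]]
) -> Dict[str, Set[str]]:
    """Map each training word to the set of tags it was seen with."""
    lexicon: Dict[str, Set[str]] = defaultdict(set)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            lexicon[word].add(tag)
    return dict(lexicon)


def candidate_tags(
    sentence: Sequence[str], lexicon: Dict[str, Set[str]], unknown_tag: str = "N"
) -> List[Tuple[str, List[str]]]:
    """Give every word all its training tags; unknown words get the common-noun tag."""
    return [(word, sorted(lexicon.get(word, {unknown_tag}))) for word in sentence]


# Toy training data (hypothetical): "cho" is seen both as a verb and as a preposition.
training = [
    [("Giang", "Np"), ("cho", "V"), ("tôi", "P")],
    [("quả", "Nu"), ("cam", "N"), ("cho", "E"), ("tôi", "P")],
]
lexicon = build_tag_lexicon(training)
print(candidate_tags(["Giang", "cho", "tôi", "táo"], lexicon))
```

The parser then has to explore every combination of candidate tags, which explains the high ambiguity reported for this setting.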

We first evaluate the performance of the constituency analysis. The results are shown in Table I.

Table I
PERFORMANCE OF THE CONSTITUENCY ANALYSIS WITHOUT OR WITH POS TAGGING

                          All sentences        ≤ 10 words
T-accuracy                No POS    POS        No POS    POS
Precision                 67.98     69.15      71.28     71.60
Recall                    68.40     69.52      71.39     72.30
F-measure                 68.19     69.33      71.33     71.95
Complete match            13.00     16.67      17.57     20.69
Average crossing           2.66      2.39       1.80      1.69
No crossing               23.00     27.78      29.73     32.76
Less than 3 crossings     55.00     54.17      68.92     65.52
Tagging accuracy          87.72     95.25      87.34     95.43

In addition to the usual precision and recall ratios, other measures are reported to help analyze the results:

• The complete match ratio is the percentage of sentences for which recall and precision are both 100%. 13% of the test sentences have a complete match; the ratio for sentences of 10 words or fewer is 17.57%.

• The average crossing ratio is the number of constituents crossing a test constituent divided by the number of sentences in the test corpus.

• The no crossing ratio is the percentage of sentences which have no crossing brackets. 23% of the test sentences do not have any crossing (29.73% for sentences of 10 words or fewer), and 55% (respectively 68.92%) of the test sentences have fewer than 3 crossings.

• The tagging accuracy is the percentage of correct POS tags (punctuation excluded). It is interesting to note that the tagging accuracy declines slightly when shorter test sentences are used.
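The measures above can be computed from sets of labeled spans, as in the following sketch. It is an illustration under assumptions, not the evaluation tool used for the experiments: constituents are represented as `(label, start, end)` triples with an exclusive end, counts are pooled over the whole corpus, and a crossing is counted for each parser constituent that overlaps a reference constituent without nesting. The names `crosses` and `evaluate` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[str, int, int]                # (label, start, end), end exclusive


def crosses(a: Span, b: Span) -> bool:
    """True if the two spans overlap without one containing the other."""
    (_, s1, e1), (_, s2, e2) = a, b
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


def evaluate(gold_corpus: List[Set[Span]], test_corpus: List[Set[Span]]) -> Dict[str, float]:
    """Corpus-level bracket scores for parallel lists of reference and parser spans."""
    if not gold_corpus or len(gold_corpus) != len(test_corpus):
        raise ValueError("corpora must be non-empty and of equal length")
    matched = n_gold = n_test = complete = crossings = no_cross = 0
    for gold, test in zip(gold_corpus, test_corpus):
        matched += len(gold & test)
        n_gold += len(gold)
        n_test += len(test)
        complete += gold == test           # precision and recall both 100%
        crossing = sum(any(crosses(t, g) for g in gold) for t in test)
        crossings += crossing
        no_cross += crossing == 0
    n = len(gold_corpus)
    precision = matched / n_test if n_test else 0.0
    recall = matched / n_gold if n_gold else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "complete_match": complete / n,
        "average_crossing": crossings / n,
        "no_crossing": no_cross / n,
    }
```

As a check of the F-measure formula, inserting the precision 67.98 and the recall 68.40 of the first column of Table I into 2PR/(P + R) gives 68.19, the value reported there.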

⁴ In computing these scores, unanalyzable sentences and punctuation are not taken into account.

The performance of the dependency analysis is evaluated in two versions, with and without types. In the first version, two typed dependencies type1(u1, v1) and type2(u2, v2) are considered equal if their three corresponding parts are all equal, that is type1 ≡ type2, u1 ≡ u2 and v1 ≡ v2. In the second version, we compare only the two pairs of words concerned, ignoring the dependency types. The D-accuracy of the two evaluations is given in Table II.

Table II
PERFORMANCE OF THE DEPENDENCY ANALYSIS WITHOUT OR WITH POS TAGGING

                    With type            Without type
D-accuracy          No POS    POS        No POS    POS
Precision           70.83     71.81      74.02     73.21
Complete match      15.87     20.00      23.37     25.45
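The two ways of comparing dependencies, with and without types, can be sketched as follows. The function name `d_precision` and the triple representation `(type, governor, dependent)` are assumptions of this illustration; identifying words by their form alone is a simplification, since a real evaluation would also need word positions to tell repeated words apart.

```python
from typing import Iterable, Tuple

Dependency = Tuple[str, str, str]          # (type, governor, dependent)


def d_precision(gold: Iterable[Dependency], test: Iterable[Dependency], typed: bool = True) -> float:
    """Share of parser dependencies that also occur in the reference analysis."""
    def key(dep: Dependency) -> tuple:
        # Typed comparison uses all three parts; untyped drops the relation type.
        return dep if typed else dep[1:]

    gold_keys = {key(dep) for dep in gold}
    test_keys = [key(dep) for dep in test]
    if not test_keys:
        return 0.0
    return sum(k in gold_keys for k in test_keys) / len(test_keys)


reference = [
    ("arg", "cho", "Giang"), ("arg", "cho", "tôi"), ("arg", "cho", "quả"),
    ("modM", "quả", "một"), ("modN", "quả", "cam"),
]
# Hypothetical parser output with one wrong label (modA instead of modN).
output = reference[:4] + [("modA", "quả", "cam")]
print(d_precision(reference, output, typed=True))
print(d_precision(reference, output, typed=False))
```

A dependency with the right word pair but the wrong label counts as an error only in the typed comparison, which is why the untyped scores cannot be lower than the typed ones on the same output.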

Table III gives a detailed view of the accuracy for each dependency type.

Table III
PERFORMANCE OF THE DEPENDENCY ANALYSIS BY TYPE WITHOUT OR WITH POS TAGGING

[Table body missing in the source; only three pairs of "No POS" and "POS" column headers survive.]

We see that the parser works perfectly on coordination structures, as they are inherently unambiguous in both the grammar and the extraction algorithm. The performance on dependencies of type argument is much better than on those of type modifier. These results reflect the higher ambiguity of the adjunction operation of the LTAG formalism (which is related to auxiliary trees) in comparison with the substitution operation (which is related to initial trees).

We observe that the parser could not parse about 16.6% of the test corpus. We believe that a sentence may be unanalyzable for two reasons. First, the coverage of the underlying LTAG grammar may be insufficient: the grammar extracted from the training corpus does not contain the syntactic structures (elementary trees) needed for the sentence to be parsed. Second, our heuristic of tagging all unknown words as common nouns may introduce errors prior to the analysis, which may result in analysis failures. At present, we have not yet investigated these causes precisely.

The ambiguity and the duration of parsing depend strongly on the length of the sentences, as shown in Figure 4 and Figure 5. The number of parses appears to grow exponentially with the length of the sentence.⁵

Figure 4. Analysis ambiguity, average and maximum, according to the length of sentences.

Figure 5. Analysis duration (in milliseconds), average and maximum, according to the length of sentences.

B. Performance of the parser with POS tagging

The results reported in the previous subsection allow a first evaluation of the grammar and of the performance of the parser. Nevertheless, the conditions of that experiment are rather harsh, since the parser has to try all possible syntactic categories for each word of an input sentence. The experiments in this subsection are closer to real conditions of use, in that each sentence is first processed by a tagger to remove POS-tagging ambiguity: each word is assigned a unique tag. We thus have a single sequence of word/tag pairs, which is used as input to the syntactic parser. The tagging is done by the vnTagger module.

We evaluate this version of the parser in the same way as the previous one. We first give the constituency parsing results, then the dependency parsing results, and finally the ambiguity and duration of parsing.

The T-accuracy of the system is shown in Table I. Integrating a POS tagger greatly improves the tagging accuracy, from 87.72% to 95.25%.⁶ This helps improve nearly all the scores of the system, notably the complete match ratio, from 13.00% to 16.67% (20.69% for sentences of 10 words or fewer).

⁵ For some considerably long sentences, the parser could not give any result within a predefined time-out of 3 minutes.

⁶ Recall that the test corpus only contains sentences of 30 words or fewer.

Figure 6. Analysis ambiguity, average and maximum, with an integrated tagger.

Figure 7. Analysis duration, average and maximum, with an integrated tagger.

The performance of the dependency analysis with and without types is shown in Table II, and that of particular dependency types in Table III.

We see that the performance of the system improves slightly in comparison with the system without tagging. However, the most important gain of the parser with an integrated tagger is a strong reduction of analysis ambiguity and time, shown in Figure 6 and Figure 7. The tagger reduces analysis ambiguity by a factor of five on average and analysis duration by a factor of three in comparison with the parser without prior tagging. Nevertheless, we observe that integrating a tagger results in a higher proportion of sentences that the parser could not parse, up to 40% of the test corpus. This increase is predictable, because in this version the parser uses only one syntactic category (the most probable POS) given by the tagger for each word. (We note also that the precision of the tagger at the sentence level is about 32% [16]; that is, only about a third of the time does the tagger give correct tags for all the words of a sentence to be parsed.)

V. DISCUSSION

In the previous section, we presented the evaluation of a syntactic analysis system based on LTAG for Vietnamese. The best results obtained are 73.21% (dependency accuracy) and 69.33% (F-measure of constituency accuracy) on a test corpus.

It is worth noting that these are the first results on the syntactic analysis of Vietnamese based on LTAG. To our knowledge, there have been few published works on the syntactic analysis of Vietnamese so far. The most complete report on parser performance is an empirical study applying the probabilistic CFG parsing models of Michael Collins [18] to Vietnamese; its best result on constituency analysis is 78% on a test corpus, and no result on dependency analysis is reported.

Concerning constituency parsing, that parser scores higher than ours (78% versus 69.33%). However, the results are not directly comparable, since the parsing models are trained and tested on different corpora.

Our first results on the syntactic parsing of Vietnamese are rather good, although they are still significantly below the parsing results for well-studied languages like English (T-accuracy of 91.10% [19] and D-accuracy of 92.93% [20] on the Penn Treebank) or French (T-accuracy of 86.41% [21] and D-accuracy of 85.55% on a French treebank [4]). However, we can improve the results by addressing the following three main sources of errors identified by the experiments.

The principal source of parsing errors is parse selection. We choose a single parse for each sentence using a very simple method: when there are multiple parses for a sentence, only the parse whose derivation tree contains the fewest nodes is selected. Although the returned tree corresponds to the most specific analysis, this selection method is purely heuristic and fragile, and there are many cases where the chosen parse is not the correct one. A better way to select the best parse for each input sentence is a necessary and crucial condition for improving parsing performance. In the future, we need to develop and evaluate more effective methods for parse selection. In this respect, the use of statistical classification models is a promising approach that we intend to investigate.

The second source of parsing errors is POS tagging. In the experiments with an integrated tagger, we use only the single best solution of vnTagger as input to the parser. We have seen that the tagger often makes errors at the sentence level, and a tagging error may introduce one or more parsing errors. Improving tagging performance is thus another necessary condition for improving the performance of the parser.

The third source of parsing errors concerns the coverage of the grammar used in the experiments. The proportion of test sentences containing at least one word that the grammar does not recognize is rather high, at about 15%. Consequently, the parser cannot build the correct analysis for these sentences. A straightforward solution to this problem is to enlarge the coverage of the LTAG grammar, which in turn requires enlarging the Vietnamese treebank. However, developing such a corpus is an expensive and labor-intensive task. In addition, this may lead to the typical problem of a symbolic syntactic parser, namely the trade-off between its performance and its efficiency. This is an interesting problem in itself, which we shall investigate in future work.

REFERENCES

[1] S. Kübler, R. McDonald, and J. Nivre, Dependency Parsing. Morgan & Claypool Publishers, 2009.

[2] M. Candito, B. Crabbé, P. Denis, and F. Guérin, “Analyse syntaxique du français : des constituants aux dépendances,” in Actes de TALN 2009, Senlis, France, 2009.

[3] R. Johansson and P. Nugues, “Dependency-based syntactic–semantic analysis with propbank and nombank,” in CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning. Manchester, England: Coling 2008 Organizing Committee, August 2008, pp. 183–187.

[4] M. Candito, B. Crabbé, and P. Denis, “Statistical French dependency parsing: Treebank conversion and first results,” in Proceedings of LREC 2010, Valletta, Malta, 2010.

[5] M.-C. de Marneffe, B. MacCartney, and C. D. Manning, “Generating typed dependency parses from phrase structure parses,” in Proceedings of LREC 2006, Genoa, Italy, 2006.

[6] A. K. Joshi and Y. Schabes, Handbooks of Formal Languages and Automata. Springer-Verlag, 1997, ch. Tree Adjoining Grammars.

[7] T. H. King, R. Crouch, S. Riezler, M. Dalrymple, and R. M. Kaplan, “The PARC 700 dependency bank,” in Proceedings of 4th International Workshop on Linguistically Interpreted Corpora, Budapest, Hungary, 2003.

[8] J. Carroll, T. Briscoe, and A. Sanfilippo, “Parser evaluation: a survey and a new proposal,” in Proceedings of LREC 1998, Granada, Spain, 1998.

[9] P. Paroubek, L. G. Pouillot, I. Robba, and A. Vilnat, “EASY : Campagne d’évaluation des analyseurs syntaxiques,” in Proceedings of TALN 2005, Dourdan, France, 2005, pp. 3–12.

[10] P. T. Nguyen, L. V. Xuan, T. M. H. Nguyen, V. H. Nguyen, and P. Le-Hong, “Building a large syntactically-annotated corpus of Vietnamese,” in Proceedings of the 3rd Linguistic Annotation Workshop, ACL-IJCNLP, Singapore, 2009.

[11] Đạt Hữu, T. D. Trần, and T. L. Đào, Cơ sở tiếng Việt (Basis of Vietnamese). Hà Nội, Việt Nam: NXB Giáo dục, 1998.

[12] O. Rambow and A. Joshi, “A formal look at dependency grammars and phrase-structure grammars, with special consideration of word-order phenomena,” in Current Issues in Meaning-Text Theory. London: Pinter, 1994.

[13] A. Roussanaly, B. Crabbé, and J. Perrin, “Premier bilan de la participation du LORIA à la campagne d’évaluation EASY,” in Proceedings of TALN 2005, Dourdan, France, 2005.

[14] P. Le-Hong and T. V. Ho, “A maximum entropy approach to sentence boundary detection of Vietnamese texts,” in Proceedings of IEEE International Conference on Research, Innovation and Vision for the Future – RIVF 2008, Vietnam, 2008.

[15] P. Le-Hong, T. M. H. Nguyen, A. Roussanaly, and T. V. Ho, “A hybrid approach to word segmentation of Vietnamese texts,” in Proceedings of the 2nd International Conference on Language and Automata Theory and Applications, M.-V. Carlos, Ed. Tarragona, Spain: Springer, LNCS 5196, 2008.

[16] P. Le-Hong, “An empirical study of maximum entropy approach for part-of-speech tagging of Vietnamese texts,” in Proceedings of Traitement Automatique des Langues Naturelles (TALN-2010), Montreal, Canada, 2010.

[17] P. Le-Hong, T. M. H. Nguyen, P. T. Nguyen, and A. Roussanaly, “Automated extraction of tree adjoining grammars from a treebank for Vietnamese,” in Proceedings of The Tenth International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+10), Yale University, New Haven, CT, USA, 2010.

[18] M. Collins, “Head-driven statistical models for natural language parsing,” Computational Linguistics, vol. 29, no. 4, pp. 589–637, 2003.

[19] X. Carreras, M. Collins, and T. Koo, “TAG, dynamic programming, and the perceptron for efficient, feature-rich parsing,” in Proceedings of COLING 2008, Manchester, 2008.

[20] T. Koo and M. Collins, “Efficient third-order dependency parsers,” in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Uppsala, Sweden: Association for Computational Linguistics, July 2010, pp. 1–11.

[21] M. Candito, B. Crabbé, and D. Seddah, “On statistical parsing of French with supervised and semi-supervised strategies,” in Proceedings of the EACL 2009 Workshop on Computational Linguistic Aspects of Grammatical Inference. Morristown, NJ, USA: Association for Computational Linguistics, 2009, pp. 49–57.
