Two Entropy-based Methods for Detecting Errors in POS-Tagged Treebank
Phuong-Thai Nguyen
University of Engineering and Technology, Vietnam National University, Hanoi — thainp@vnu.edu.vn
Anh-Cuong Le
University of Engineering and Technology, Vietnam National University, Hanoi — cuongla@vnu.edu.vn
Tu-Bao Ho
Japan Advanced Institute of Science and Technology
bao@jaist.ac.jp
Thi-Thanh-Tam Do
University of Engineering and Technology, Vietnam National University, Hanoi — dotam85@gmail.com
Abstract—This paper proposes two methods of employing conditional entropy to find errors and inconsistencies in treebank corpora. These methods are based on two principles: that high entropy implies a high possibility of error, and that entropy is reduced after error correction. The first method ranks error candidates using a scoring function based on conditional entropy. The second method uses beam search to find a subset of error candidates in which the change of labels leads to a decrease in conditional entropy. We carried out experiments with the Vietnamese treebank corpus at two levels of annotation: word segmentation and part-of-speech tagging. Our experiments showed that these methods detected high-error-density subsets of the original error candidate sets. The size of these subsets is only one third the size of the whole sets, while they contain 80%-90% of the errors in the whole sets. Moreover, entropy was significantly reduced after error correction.
Keywords—corpus, treebank, part-of-speech (POS) tagging, word segmentation, error detection, entropy
I. INTRODUCTION
Currently, natural language processing research is dominated by corpus-based approaches. However, building annotated corpora is a costly and labor-intensive task. There are errors even in released data, as shown by the fact that complex data such as treebanks are often released in several versions1. In order to speed up annotation and increase the reliability of labelled corpora, various kinds of software tools have been built for format conversion, automatic annotation, tree editing [9], etc. In this paper we focus on methods for checking errors and inconsistencies in annotated treebanks.
Three techniques to detect part-of-speech tagging errors have been proposed by Dickinson and Meurers [2]. The main idea of their first technique is to consider variation n-grams, the ones which occur more than once in the corpus and include at least one difference in their annotation.

1 There are several purposes for multi-version treebank publishing: error correction, annotation scheme modification, and data addition. For example, major changes in the Penn English Treebank (PTB) [5] upgrade from version I to version II include POS tagging error correction and predicate-argument structure labelling. In the PTB upgrade from version II to version III, more data was appended.
For example, "centennial year" is a variation bi-gram which occurs in the Wall Street Journal (WSJ), a part of the Penn Treebank corpus [5], with two possible taggings2: "centennial/JJ year/NN" and "centennial/NN year/NN". Of these, the second tagging is correct. Dickinson found that a large percentage of variation ngrams in WSJ have at least one instance (occurrence) with an incorrect label. However, using this variation-ngram method, linguists have to check all instances of variation ngrams to find errors. The other two techniques take into account more linguistic information, including tagging-guide patterns and functional words. Dickinson [3] reported a method to detect ad-hoc treebank structures. He used a number of linguistically-motivated heuristics to group context-free grammar (CFG) rules into equivalence classes by comparing the right-hand sides (RHS) of rules. An example of such a heuristic is that CFG rules of the same category should have the same head tag and similar modifiers, but can differ in the number of modifiers. By applying these heuristics, the RHS sequences3 "ADVP RB ADVP" and "ADVP , RB ADVP" will be grouped into the same class. Classes with only one rule, or rules which do not belong to any class, are problematic. He evaluated the proposed method by analysing several types of errors in the Penn treebank [5]. However, similarly to [2], this study proposed a method to determine candidates of problematic patterns (ad-hoc CFG rules instead of variation ngrams), but not problematic instances of those patterns.
Yates et al. [10] reported a study on detecting parser errors using semantic filters. First, syntactic trees output by a parser are converted into an intermediate representation called a relational conjunction (RC). Then, using the Web as a corpus, RCs are checked using various techniques including point-wise mutual information, a verb arity sampling test, a TextRunner filter, and a question answering (QA) filter. In evaluation, error rate reductions of 20% and 67% were reported when tested on the Penn treebank and TREC, respectively. The interesting point of their paper is that information from the Web was utilized to check for errors.
2 JJ: adjective; NN: noun.
3 ADVP: adverbial phrase; RB: adverb.
Novak and Razimova [8] used Apriori, an association rule mining algorithm, to find annotation rules and then search for violations of these rules in corpora. They found that violations are often annotation errors. They reported an evaluation of this technique performed on the Prague Dependency Treebank 2.0 and presented an error analysis which showed that of the first 100 detected nodes, 20 contained an annotation error. However, this was not an extensive evaluation.
Figure 1: Conceptual sets. S1: the whole treebank data; S2: the data set of variation ngrams; S3: the error set.
In this paper, in order to overcome the drawbacks of previous approaches such as those of Dickinson and colleagues, we introduce two learning methods based on conditional entropy for detecting errors in treebanks. Our methods, named ranking and beam search, can detect erroneous instances of variation ngrams4 in treebank data (Figure 1). These methods are based on the entropy of labels given their contexts. Our experiments showed that conditional entropy was reduced after error correction, and that by using ranking and beam search, the number of checked instances can be reduced drastically. We used the Vietnamese treebank [7] for experiments. The rest of our paper is organized as follows: in Section 2 error detection methods are presented, then in Section 3 experimental results and discussion are reported; finally, conclusions are drawn and future work is proposed in Section 4.
II. ERROR DETECTION METHODS
A. A Motivating Example
First, we consider a motivating example. The following 25-gram is a complete sentence that appears 14 times in the corpus, four times with centennial tagged as JJ and ten times with centennial tagged as NN, the latter being correct according to the tagging guide (Santorini, 1990).
• During its centennial year , The Wall Street Journal will report events of the past century that stand as milestones of American business history .
4 This term has the same meaning as the term "variation nuclei" in [2]. In our paper, a variation ngram is an ngram which varies in labels because of ambiguity or annotation error. Contextual information, for example surrounding words, is not included in an ngram.
Given the Penn treebank data, and given the surrounding context of two words before and twenty-two words after, the distribution of centennial's tag over the tag set {JJ, NN} is (4/14, 10/14). This distribution has a positive entropy value. If all instances of centennial were tagged correctly, the distribution of its tag would be (0, 1), and this distribution has an entropy value of zero. This simple analysis suggests that there is a relation between entropy and errors in data, and that high entropy signals a potential problem.
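To make the arithmetic concrete, the following minimal Python sketch (ours, not part of the paper) computes the entropy of the two tag distributions above, using base-2 logarithms:

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Observed tag distribution of 'centennial' over {JJ, NN}: 4/14 vs 10/14.
print(entropy([4 / 14, 10 / 14]))  # ~0.863 bits: positive entropy
# If every instance were tagged NN, the distribution would be (0, 1):
print(entropy([0.0, 1.0]))         # 0.0 bits: zero entropy after correction
```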
Note that labelled data are often used for training statistical classifiers such as word segmenters, POS taggers, and syntactic parsers. Error-free or reduced-error training data will result in a better classifier. Entropy is a measure of uncertainty. Does an explicit mathematical relation between entropy and classification error exist?
B. A Probabilistic Relation between Entropy and Classification Error
Suppose that X is a random variable representing information that we know, and Y is another random variable whose value we have to guess. The relation between X and Y is p(y|x). From X, we compute a classification function g(X) = Ŷ. We define the probability of error as P_e = P(Y ≠ Ŷ). Fano's inequality [1] relates P_e to H(Y|X) as follows:

P_e \ge \frac{H(Y|X) - H(P_e)}{\log(M-1)} \ge \frac{H(Y|X) - 1}{\log(M-1)} \quad (1)

where M is the number of possible values of Y. The inequality gives an optimal lower bound on classification-error probability (a numeric sketch follows the list below). If H(Y|X) is small, we have more chance of estimating Y with a low probability of error. If H(Y|X) > 0, there can be a number of reasons:
• ambiguity: for example, the word can is ambiguous between being an auxiliary, a main verb, or a noun, and thus there is variation in the way can would be tagged in "I can play the piano" and "Pass me a can of beer, please";
• the choice of X (feature selection): in decision tree learning [6], H(Y) − H(Y|X) is called information gain;
• error: for example, the tagging of a word may be inconsistent across comparable occurrences.
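As a hedged numeric illustration of the weaker form of inequality (1), with our own toy numbers (not from the paper):

```python
import math

def fano_lower_bound(h_y_given_x, m):
    """Weaker Fano bound: Pe >= (H(Y|X) - 1) / log2(M - 1)."""
    return (h_y_given_x - 1) / math.log2(m - 1)

# With M = 45 tags (roughly the size of the Penn Treebank tagset) and a
# hypothetical conditional entropy of 3 bits, no classifier can do better
# than about 36.6% error:
print(fano_lower_bound(3.0, 45))  # ~0.366
# A small conditional entropy makes the bound vacuous (negative), so a low
# error probability becomes achievable:
print(fano_lower_bound(0.5, 45))  # < 0, no constraint
```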
In this paper we focus on the relation between H(Y|X) and the correctness of training data. We make two working assumptions:
• there is a strong correlation between high conditional entropy and errors in annotated data;
• conditional entropy is reduced when errors are corrected.
These assumptions suggest that error correction can be considered an entropy reduction process. Now we consider a more realistic classification configuration, using K features rather than only one. Our objective is to reduce the conditional entropy H(Y|X_1, X_2, ..., X_K). Since conditioning reduces entropy, it is easy to derive:

H(Y|X_1, X_2, \ldots, X_K) \le \frac{1}{K} \sum_{i=1}^{K} H(Y|X_i) \quad (2)

To simplify calculations, we can try to reduce the upper bound \frac{1}{K} \sum_{i=1}^{K} H(Y|X_i) instead of directly handling H(Y|X_1, X_2, ..., X_K). Later, through our experiments, we will show that this simplification works well.
Inequality (2) can be straightforwardly proved. Since conditioning reduces entropy [1], we have H(Y|X) ≤ H(Y). This inequality implies that, on average, the more information we have, the greater the reduction in uncertainty. By applying this inequality K times, we obtain H(Y|X_1, X_2, ..., X_K) ≤ H(Y|X_i) for 1 ≤ i ≤ K. Summing these inequalities and dividing both sides by K yields (2).
C. Empirical Entropy
The entropy H(Y|X_1, X_2, ..., X_K) can be computed as:

\sum_{x_1, x_2, \ldots, x_K} p(x_1, x_2, \ldots, x_K) \times H(Y|X_1 = x_1, X_2 = x_2, \ldots, X_K = x_K)

where the sum is taken over the set A_1 × A_2 × ... × A_K, the A_i being the sets of possible values of the X_i, and

H(Y|X_1 = x_1, X_2 = x_2, \ldots, X_K = x_K) = -\sum_{y} p(y|x_1, x_2, \ldots, x_K) \times \log(p(y|x_1, x_2, \ldots, x_K)).
Using the Bayes formula and making independence assumptions between the X_i, we can decompose p(y|x_1, x_2, ..., x_K) into:

\prod_{i=1}^{K} p(x_i|y) \times p(y) \Big/ \prod_{i=1}^{K} p(x_i)

where p(x_i|y) = Freq(y, x_i)/Freq(y), p(y) = Freq(y)/L, and p(x_i) = Freq(x_i)/L, with L indicating the number of examples in our data set.
When K is a large number, it is difficult to compute the true value of H(Y|X_1, X_2, ..., X_K), since there are |A_1| × |A_2| × ... × |A_K| possible combinations of the X_i's values. A practical approach to overcome this problem is to compute empirical entropy on a data set. More specifically, the entropy sum is taken over only those (x_1, x_2, ..., x_K) for which ((x_1, x_2, ..., x_K), y) exists in our data set.
Empirical entropy was not used for our error detection methods. It was used only for computing entropy reduction over data sets in Section 3.6.
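A minimal sketch (ours, with invented toy data) of how the per-feature conditional entropies and the upper bound in (2) might be computed from frequency counts:

```python
import math
from collections import Counter

def conditional_entropy(pairs):
    """Empirical H(Y|X) in bits from a list of (x, y) observations."""
    x_counts = Counter(x for x, _ in pairs)
    xy_counts = Counter(pairs)
    n = len(pairs)
    return -sum(
        (c / n) * math.log2(c / x_counts[x]) for (x, _), c in xy_counts.items()
    )

def entropy_upper_bound(examples):
    """(1/K) * sum_i H(Y|X_i) for examples of the form ((x_1, ..., x_K), y)."""
    k = len(examples[0][0])
    return sum(
        conditional_entropy([(x[i], y) for x, y in examples]) for i in range(k)
    ) / k

# Toy data (ours): two context features and a binary label.
data = [(("a", "b"), 0), (("a", "c"), 1), (("a", "b"), 0), (("d", "c"), 1)]
print(entropy_upper_bound(data))  # average of H(Y|X_1) and H(Y|X_2)
```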
D. Error Detection by Ranking
Based on the first working assumption stated in Section 2.2, we rank training examples (x, y) = ((x_1, x_2, ..., x_K), y) in decreasing order using the following scoring function:

Score(x, y) = \sum_{i=1}^{K} H(Y|X_i = x_i) + \Delta H \quad (3)

where the first term does not depend on y, and the second term ΔH is the maximal reduction of the first term when y is changed.
Suppose that B is the set of possible values of Y and M = |B|. Without loss of generality, suppose that B = {1, 2, ..., M}. Given X_i = x_i, the discrete conditional distribution of Y is

P(Y|X_i = x_i) = (p_1, p_2, \ldots, p_M)

where p_j ≥ 0 (1 ≤ j ≤ M) and \sum_{j=1}^{M} p_j = 1. Also, p_j can be computed by

p_j = Freq(j, x_i) / Freq(x_i)

where Freq(j, x_i) is the co-occurrence frequency of j and x_i, and Freq(x_i) is the frequency of x_i, both of which can be easily calculated from a corpus. The conditional entropy can be computed by

H(Y|X_i = x_i) = -\sum_{j=1}^{M} p_j \times \log(p_j).
When the label of x = (x_1, x_2, ..., x_K) changes from y to y', then for each x_i, P(Y|X_i = x_i) changes to P'(Y|X_i = x_i) = (p'_1, p'_2, ..., p'_M), in which p'_j = p_j for j ≠ y and j ≠ y', p'_y = (Freq(y, x_i) − 1)/Freq(x_i), and p'_{y'} = (Freq(y', x_i) + 1)/Freq(x_i). The entropy H(Y|X_i = x_i) becomes H'(Y|X_i = x_i), and it is simple to compute ΔH by the formula

\Delta H = \max_{y'} \sum_{i=1}^{K} [H(Y|X_i = x_i) - H'(Y|X_i = x_i)]
         = \max_{y'} \sum_{i=1}^{K} [-p_y \log(p_y) - p_{y'} \log(p_{y'}) + p'_y \log(p'_y) + p'_{y'} \log(p'_{y'})].
The idea behind the use of ΔH is that correcting an error should lead to a decrease in entropy. We consider the word5 nổi, occurring 75 times in the Vietnamese treebank; among these occurrences there are 6 error instances, with three possible POS tags: A, R, and V. The following are the scores of the first ten instances:

(1) 4.92 + 1.11, (2) 4.35 + 1.55, (3) 3.60 + 1.27, (4) 4.48 − 0.21, (5) 3.36 + 0.89, (6) 4.18 − 0.31, (7) 2.96 + 0.86, (8) 4.23 − 0.47, (9) 3.98 − 0.30, (10) 4.40 − 0.87.

The score is represented as a sum of two numbers, of which the second is ΔH. Error instances are in bold. If ΔH is omitted from the scoring formula, the order of examples becomes: (1), (4), (10), (2), (8), (6), (9), (3), (5), (7).
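The following is a hedged sketch (our own reconstruction, with our own helper names, not the authors' code) of how the score in (3), including ΔH, could be computed from co-occurrence counts:

```python
import math
from collections import Counter, defaultdict

def h_cond(label_counts):
    """H(Y | X_i = x_i) in bits, from a Counter of label co-occurrence counts."""
    total = sum(label_counts.values())
    return -sum(
        (c / total) * math.log2(c / total) for c in label_counts.values() if c > 0
    )

def build_freq(examples, k):
    """freq[i][x_i] = Counter of labels co-occurring with feature value x_i."""
    freq = [defaultdict(Counter) for _ in range(k)]
    for x, y in examples:
        for i, xi in enumerate(x):
            freq[i][xi][y] += 1
    return freq

def score(x, y, freq, labels):
    """Score (3): sum_i H(Y|X_i = x_i) plus the maximal entropy drop Delta-H
    obtainable by changing this example's label from y to some y'."""
    base = sum(h_cond(freq[i][xi]) for i, xi in enumerate(x))
    best_drop = 0.0
    for new_y in labels:
        if new_y == y:
            continue
        drop = 0.0
        for i, xi in enumerate(x):
            counts = Counter(freq[i][xi])
            before = h_cond(counts)
            counts[y] -= 1       # one instance moves from y ...
            counts[new_y] += 1   # ... to y'
            drop += before - h_cond(counts)
        best_drop = max(best_drop, drop)
    return base + best_drop
```

Examples would then be sorted in decreasing order of this score, so that, under the working assumptions, instances whose relabelling most reduces entropy surface first.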
E. Error Detection Using Beam Search
In the ranking method, a change in the label of an example does not affect the scores of other examples. Based on the second working assumption stated in Section 2.2, in this section we propose a beam search method for error detection. A subset of the data in which relabelling leads to a decrease in entropy is searched for. The objective function is the upper bound \frac{1}{K} \sum_{i=1}^{K} H(Y|X_i). The subset size is limited to N, which is on the order of tens of percent of the whole data set. We used a multi-stack beam search algorithm, described as follows:
Algorithm 1: A beam-search algorithm for error detection
create the initial state, put it into stack[0]
for i = 1 to N do
    for each state s in stack[i − 1] do
        {expand s}
        for each example e in the data set do
            relabel e
            create a new state s_new and score s_new
            add s_new to stack[i]
        end for
    end for
    prune stack[i]
end for
choose the lowest-score state from all stacks
• A state (or hypothesis) is a relabelled subset of the data set; states with the same number of examples are put into a stack, and stacks are numbered by the number of examples in their states.
• A state in stack[i − 1] is expanded by adding a new relabelled example to the state's example set, resulting in a new state; the new state is added to stack[i].
• Given a state and a new example, the example is relabelled by choosing the label which minimizes the objective function.
• The size of a stack is limited to O (in practice this number is set to one hundred or several hundred), meaning that only the O lowest-score states are kept.
• The lowest-score state is chosen as the set of error candidates; if there is more than one such state, the one with the smallest number of examples is chosen (see the sketch after this list).
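Below is a hedged, unoptimized sketch of the multi-stack beam search (our own reconstruction, not the authors' implementation). Here `objective` stands for the upper bound \frac{1}{K} \sum_{i=1}^{K} H(Y|X_i) evaluated on the data set after applying a state's relabellings; all names are ours, and we assume at least two labels:

```python
import heapq

def beam_search_errors(examples, labels, objective, n_max, beam_width=100):
    """A state is a dict {example_index: new_label}; stacks[i] holds the
    beam_width lowest-scoring states that relabel exactly i examples."""
    stacks = [[(objective({}), {})]]  # stacks[0]: the initial, empty state
    for i in range(1, n_max + 1):
        candidates = []
        for _, state in stacks[i - 1]:
            # expand the state: add one more relabelled example
            for idx, (_, y) in enumerate(examples):
                if idx in state:
                    continue  # each example is relabelled at most once
                # relabel with the label that minimizes the objective
                sc, lab = min(
                    (objective({**state, idx: l}), l) for l in labels if l != y
                )
                candidates.append((sc, {**state, idx: lab}))
        # prune: keep only the beam_width lowest-score states
        stacks.append(heapq.nsmallest(beam_width, candidates, key=lambda t: t[0]))
    # lowest-score state over all stacks; ties favour fewer relabellings
    best = min(
        ((sc, st) for stack in stacks for sc, st in stack),
        key=lambda t: (t[0], len(t[1])),
    )
    return best[1]
```

Recomputing the objective from scratch for every candidate, as above, is expensive; an incremental update in the style of the ΔH computation from the ranking method would presumably be used in practice.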
F. Application to Word-Segmented and POS-Tagged Data Sets
In this paper, we focus on checking word-segmented and POS-tagged corpora. For word-segmented data, syllable ngrams which have multiple word segmentations are considered (as the random variable Y). Features are the two preceding words and the two following words (a total of four features, as the random variables X_i). For POS-tagged data, words with multiple tags are considered. The feature set includes the surrounding words and their POS tags (a total of eight features). Table 1 shows two examples, including labelled sentences, variation ngrams in italics, subscripts for mapping Vietnamese-English words, and features.
S1: Nguyện_vọng1 về2 vấn_đề3 nước dùng đã4 được5 xem_xét6
E: Proposal1 for2 clean water supply3 has4 been5 considered6
Features: về2, vấn_đề3, đã4, được5

S2: Ông1/N chỉ2/R muốn3/V chui4/V xuống/E đất5/N khi6/N chủ_nợ7/N đến8/V /
E: He1 just2 wanted3 to disappear4,5 when6 creditors7 came8 /
Features: muốn3, chui4, đất5, khi6, V3, V4, N5, N6

Table 1: Features for the word-segmentation and POS-tagging error detection tasks. S1: word-segmented sentence. S2: POS-tagged sentence. E: English translation.
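A small sketch (ours) of the feature extraction that Table 1 illustrates; boundary positions are padded with None:

```python
def ws_features(words, i):
    """Word segmentation task: the two preceding and two following words
    around the variation ngram at position i (four features)."""
    get = lambda j: words[j] if 0 <= j < len(words) else None
    return (get(i - 2), get(i - 1), get(i + 1), get(i + 2))

def pos_features(tagged, i):
    """POS tagging task: surrounding words and their POS tags (eight features)."""
    get = lambda j: tagged[j] if 0 <= j < len(tagged) else (None, None)
    ctx = [get(j) for j in (i - 2, i - 1, i + 1, i + 2)]
    return tuple(w for w, _ in ctx) + tuple(t for _, t in ctx)

# S2 from Table 1: the features around 'xuống' (index 4) are
# muốn, chui, đất, khi and V, V, N, N.
s2 = [("Ông", "N"), ("chỉ", "R"), ("muốn", "V"), ("chui", "V"),
      ("xuống", "E"), ("đất", "N"), ("khi", "N"), ("chủ_nợ", "N"), ("đến", "V")]
print(pos_features(s2, 4))
```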
III. EXPERIMENTS
A. Corpus Description
We used the word-segmented and POS-tagged data sets of the Vietnamese treebank [7] for experiments. There are several phenomena specific to Vietnamese words. The first is word segmentation. Like a number of other Asian languages such as Chinese, Japanese, and Thai, Vietnamese has no word delimiter. The smallest unit in the construction of Vietnamese words is the syllable. A Vietnamese word can be a single word (one syllable) or a compound word (more than one syllable). A space is a syllable delimiter but not a word delimiter in Vietnamese. A Vietnamese sentence can often be segmented in many ways. Obviously, Vietnamese word segmentation is a non-trivial problem. The second is that Vietnamese is an isolating language: functional words, rather than word inflection, are used to express number, tense, etc.
The Vietnamese treebank was developed in a two-year national project6. For each data set, there were several phases of development, including labelling using tools, manual revision, a second manual revision, and manual revision driven by specific linguistic phenomena. Each sentence was therefore checked by at least two annotators. After each phase, the data sets became cleaner. Of course, revisions were carried out with the use of guidelines, which were themselves modified during the development of the corpus.

We cannot directly use treebank data for the evaluation of the error-checking task. Dickinson and Meurers [2] manually checked all instances of variation ngrams to find erroneous instances. However, we did not use Dickinson and Meurers's method. We compared different versions of the data sets to find which sentences were modified and at which positions (words or phrases). Table 2 shows the description of the data sets which were used in our experiments. For each data set, two versions were used to extract evaluation data: one version resulting from manual revision, and the other resulting from the second manual revision.
6 http://vlsp.vietlp.org:8080/demo/
Table 2: Vietnamese treebank's data sets used in the experiments (columns: Data set, Sentences, Words, Vocabulary; table body not recoverable).
S1: Thủ_môn1 trả giá vì2 sai_lầm3 ngớ_ngẩn4
S2: Thủ_môn1 trả_giá vì2 sai_lầm3 ngớ_ngẩn4
E: The goalkeeper1 pays for2 his blunder3,4

Table 3: Example of word-segmented sentence comparison using the MED algorithm. S1: erroneous sentence. S2: corrected sentence. E: English translation.
B. Data Extraction
Comparisons were carried out sentence by sentence using minimum edit distance (MED), a dynamic programming algorithm [4], in which three operations are used: insertion, deletion, and replacement. The MED algorithm is followed by a post-processing procedure that combines operations on adjacent words of the original sentence. Table 3 shows an example of word-segmented sentence comparison using the MED algorithm. The underscore character is used to connect syllables of the same word. The syllable sequence trả giá is a variation bigram. The MED algorithm found that trả (pay) was deleted and giá (price) was replaced by trả_giá (pay). Since trả and giá were two adjacent words in the original sentence, the deletion and replacement operations were combined, resulting in the replacement (modification) of trả giá by trả_giá.
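As a hedged illustration, here is a compact MED implementation over word sequences (ours, not the authors' code), with backtracking to recover the operation sequence; the final print reproduces the Table 3 comparison:

```python
def med_ops(src, tgt):
    """Minimum edit distance over word sequences with insertion, deletion,
    and replacement, backtracking to recover the operations."""
    n, m = len(src), len(tgt)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if src[i - 1] == tgt[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # replacement / match
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (
            0 if src[i - 1] == tgt[j - 1] else 1
        ):
            if src[i - 1] != tgt[j - 1]:
                ops.append(("replace", src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("delete", src[i - 1], None))
            i -= 1
        else:
            ops.append(("insert", None, tgt[j - 1]))
            j -= 1
    return list(reversed(ops))

# Table 3's example: 'trả' is deleted and 'giá' is replaced by 'trả_giá';
# a post-processing step would then merge these adjacent operations.
print(med_ops(["Thủ_môn", "trả", "giá", "vì"], ["Thủ_môn", "trả_giá", "vì"]))
```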
The extraction results on the treebank's two data sets are reported in Table 4. A variation ngram can be a sequence of syllables with multiple word segmentations in the corpus, or a word with multiple tags in the corpus. An instance (or example) is an occurrence of an ngram. An error variation ngram is one with at least one error instance (incorrectly labelled). The table shows the ambiguous core of the corpus. The percentage of error variation ngrams is high; however, the percentage of error instances is much lower. Reducing the number of instances that must be checked is therefore meaningful.
C. Error Types and Distributions
As shown in Table 4, not all instances of variation ngrams are erroneous. Figure 2 displays error distribution curves which show the likelihood of the number of error instances of a variation ngram. These curves look like a Poisson distribution, known as the distribution of rare events. For the word-segmented data set, on average each variation ngram has 31.15 instances in total and 3.34 erroneous instances. For the POS-tagged data set, on average each variation ngram has 64.36 instances in total and 5.18 erroneous instances. The maximum points are close to the vertical axis7. It is clear that most variation ngrams have zero, one, two, or only several errors.

Table 4: Data extraction statistics (table body not recoverable).
Figure 2: Error distribution curves. The horizontal axis represents error count; the vertical axis represents variation ngram count. The red curve corresponds to the word segmentation data set; the blue curve corresponds to the POS-tagged data set.
In the word-segmented data set, about 60% of erroneous instances require correction by combining single words to form a compound word. About 40% require a change by splitting a compound word into single words. A number of typical corrections are listed here: subordinated compound (khu phố → khu_phố (quarter), kim khâu → kim_khâu (needle)), coordinated compound (thu đông → thu_đông (autumn and winter), xinh đẹp → xinh_đẹp (beautiful)), another kind of subordinated compound (nhà khoa_học → nhà_khoa_học (scientist), nguyên bộ_trưởng → nguyên_bộ_trưởng (former minister)), and proper noun (Công_ty_FPT → Công_ty FPT (FPT company), Hà Nội → Hà_Nội).
Figure 3 shows the percentage of each modified POS tag. For example, the first column shows that among the 8,734 (Table 4) erroneous POS-tagged instances, 20.87% were changed from the noun tag N to other POS tags. Among the 18 columns, those corresponding to noun, verb, adverb, and adjective have the largest percentages.
D. Error Detection Results for Word Segmentation
Figure 4 shows the error detection results for word segmentation. The blue curve represents the number of error examples discovered if annotators check the data set with examples in their original order. The red curve represents the number of error examples discovered if annotators check the data set in which examples are sorted in decreasing order of entropy.

7 The two points nearest to the vertical axis are the numbers of variation ngrams which have no erroneous instances.
Figure 3: The percentage of each modified POS tag.
It is obvious that most errors, about 89.92% (4,700/5,227), have been detected after checking one third of the data set. The yellow curve shows the case using beam search; it is better than entropy ranking to a certain degree.
Figure 4: Error detection results for word segmentation. The horizontal axis represents the number of examples annotators have to check; the vertical axis represents the number of error examples.
E. Error Detection Results for POS Tagging
Figure 5 reports the error detection results for POS tagging. If annotators check the data with examples in their original order, the number of detected errors goes up linearly (blue curve). If the data is sorted in decreasing order of entropy, the number of detected errors goes up very fast (red curve): about 81.34% (7,104/8,734) after checking one third of the data set. Detection efficiency rises even faster if the beam search technique is used (yellow curve).
F. Entropy Reduction
Entropy plays a central role in our detection methods: high entropy corresponds to a high possibility of error.

Figure 5: Error detection results for POS tagging. The horizontal axis represents the number of examples annotators have to check; the vertical axis represents the number of error examples.

Table 5 shows that on both data sets, the total empirical entropy of all variation ngrams was reduced after error correction (EntDecTotal). The total entropy upper bound also decreased (EntBDecTotal). For the word-segmented data set, a majority of erroneous ngrams (92.90%) show less entropy after error correction, a very small number (0.97%) show no change in entropy, and 6.13% show increasing entropy. For the POS-tagged data set, the percentage of increased-entropy erroneous ngrams is higher.
According to our observations on specific erroneous ngrams, there are a number of reasons for the increase in entropy. The first is the sparse data problem: for ngrams with a small number of instances and few errors, the correction of errors leads to an entropy increase in some cases. The second is that some words are highly ambiguous, and even after revision there are still errors. Within the set of 95 erroneous ngrams whose number of erroneous instances is greater than 15, there are 39 ngrams (41.05%) whose entropy increased. Though this is a small set, the ratio is high in comparison with the average of 22.71%.
It is logical that the entropy upper bound is reduced more than the empirical entropy. However, it seems that the difference between these values is rather large. Note that the empirical entropy is summed over a subset of the whole space and is therefore smaller than the true entropy value. If p(x_1, x_2, ..., x_K) is normalized, the calculation of the empirical entropy reduction will result in a higher value8.
IV. CONCLUSION
We have investigated two entropy-based methods for detecting errors and inconsistencies in treebank corpora. Our experiments on Vietnamese treebank data showed that these methods are effective. More specifically, these methods can reduce the size of error candidate sets by two thirds, and conditional entropy is indeed reduced after error correction.

8 Using p(x_1, x_2, ..., x_K) = Freq(x_1, x_2, ..., x_K)/L, the value of the empirical entropy reduction was 173.49 on the word-segmented data set.
Table 5: Entropy changes on the data sets (DS). EntDec/EntUnc/EntInc Ngram: the percentage of erroneous ngrams for which entropy decreased/remained unchanged/increased; EntDec Total: total entropy reduction of ngrams; EntBDec Total: total entropy bound reduction of ngrams. (Table body not recoverable.)
We are applying the entropy-based approach to detecting syntax tree errors in treebanks. In the future, we intend to use extra resources such as word clusters to improve error detection results. We also intend to apply this approach to checking other kinds of data.
ACKNOWLEDGMENT
This work is partially supported by the TRIG project at the University of Engineering and Technology, VNU Hanoi. It is also partially supported by Vietnam's National Foundation for Science and Technology Development (NAFOSTED), project code 102.99.35.09.
REFERENCES
[1] Cover, Thomas M. and Joy A. Thomas. 2006. Elements of Information Theory. John Wiley & Sons, Inc.
[2] Dickinson, Markus and W. Detmar Meurers. 2003. Detecting Errors in Part-of-Speech Annotation. In Proceedings of EACL.
[3] Dickinson, Markus. 2008. Ad Hoc Treebank Structures. In Proceedings of ACL.
[4] Jurafsky, Daniel and James H. Martin. 2009. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. Prentice Hall.
[5] Marcus, Mitchell P., Mary A. Marcinkiewicz, and Beatrice Santorini. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics.
[6] Mitchell, Tom M. 1997. Machine Learning. The McGraw-Hill Companies, Inc.
[7] Nguyen, Phuong-Thai, Vu Xuan Luong, Nguyen Thi Minh Huyen, Nguyen Van Hiep, and Le Hong Phuong. 2009. Building a Large Syntactically-Annotated Corpus of Vietnamese. In Proceedings of LAW-3, ACL-IJCNLP.
[8] Novak, Vaclav and Magda Razimova. 2009. Unsupervised Detection of Annotation Inconsistencies Using Apriori Algorithm. In Proceedings of LAW-3, ACL-IJCNLP.
[9] (Entry truncated in the source.) In Proceedings of COLING.
[10] Yates, Alexander, Stefan Schoenmackers, and Oren Etzioni. 2006. Detecting Parser Errors Using Web-based Semantic Filters. In Proceedings of EMNLP.