Named Entity Recognition for Vietnamese Real
Estate Advertisements
Son Huynh, Khiem Le, Nhi Dang, Bao Le, Dang Huynh, Binh T. Nguyen
University of Science, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam
Trung T. Nguyen, Nhi Y. T. Ho
Hung Thinh Corp.
Ho Chi Minh City, Vietnam
Abstract—With the booming development of the Internet and e-Commerce, advertising has appeared in almost all areas of life, especially in the real estate domain. Understanding these advertising posts is necessary to capture the status of real estate transactions and rent and sale prices in different areas with various properties. Motivated by that, we present the first manually annotated Vietnamese dataset in the real estate domain. Remarkably, our dataset is annotated for the named entity recognition task with 16 entity types. In comparison to other Vietnamese NER datasets, our dataset contains the largest number of entities. We empirically investigate a strong baseline on our dataset using the API supported by the spaCy library, which comprises four main components: tokenization, embedding, encoding, and parsing. For the encoding, we conduct experiments with various encoders, including Convolutions with Maxout activation (MaxoutWindowEncoder), Convolutions with Mish activation (MishWindowEncoder), and bidirectional Long Short-Term Memory (BiLSTMEncoder). The experimental results show that the MishWindowEncoder gives the best performance in terms of micro F1-score (90.72%). Finally, we aim to publish our dataset later to contribute to the current research community related to named entity recognition.
Keywords—Named Entity Recognition, Embedding, Convolution, Skip Connection, LSTM
I. INTRODUCTION
Named Entity Recognition (NER), also called Entity Identification or Entity Extraction, has become an essential and fundamental task in Natural Language Processing (NLP), which involves identifying named entities in a text and classifying them into predefined categories. A named entity is a real-life object with proper identification that can be denoted with an appropriate name. Named entities can be a place, person, organization, time, object, or geographic entity. NER has been investigated for many years [1]; however, the majority of existing research relies on a reasonably large annotated dataset, which is mainly available in popular languages such as English and French. This is a bottleneck for low-resource languages like Vietnamese, so it is worth creating a novel manually annotated NER dataset for Vietnamese to accelerate Vietnamese NER research.
Nowadays, real estate news and advertisement sources are massive and posted daily on many different real estate websites. Extracting key entities from these data sources leads to understanding the status of real estate transactions and the customer's demand. Detecting entities is also helpful for building downstream information extraction, text summarization, or chatbot systems in the real estate domain. Moreover, this helps store data more efficiently, facilitate data analysis, or build dashboards for data visualization.

Corresponding author: Binh T. Nguyen (VNU-HCM University of Science, Ho Chi Minh City, Vietnam) (Email: ngtbinh@hcmus.edu.vn).
To summarize, our contributions are as follows:
1) We introduce and provide the community with the first manually annotated Vietnamese dataset in the real estate domain for the NER task. Our dataset is annotated with 16 different named entity types, more than the 3 of VLSP-2018 and the 10 of PhoNER-COVID-19. Also, our dataset has the largest number of entities, consisting of over 53,000 entities.
2) We conduct experiments using strong baselines with the support of the spaCy library and empirically investigate three different encoders: MaxoutWindowEncoder, MishWindowEncoder, and BiLSTMEncoder. The experimental results show that MishWindowEncoder has the best performance in Recall, Precision, and F1-score.
II. RELATED WORK
Compared to other languages, data resources for Vietnamese NLP tasks are limited, specifically for the NER task. To the best of our knowledge, there are only two public datasets for the Vietnamese NER task. The first one is the VLSP-2018 NER dataset [2], an extension of the VLSP-2016 NER dataset with more data. This dataset recognizes generic entities of person names, organizations, and locations in daily news articles. The second one is the recently released PhoNER-COVID-19 [3] in the COVID-19 domain, which helps facilitate many types of research and downstream applications, such as building question-answering systems for pandemic prevention tasks. In this work, we develop and release the first Vietnamese NER dataset in the real estate domain.

Existing research on Vietnamese NER approaches is also rather limited; some techniques have been proposed using various learning models, such as classifier voting [4] and CRF [5]. Vu et al. [6] propose a method that normalizes a tweet before taking it as the input of a learning model for NER in Vietnamese tweets. Thang et al. [7] combined BiLSTM and CRF; moreover, they enhanced word embeddings with information from characters to achieve competitive results on the VLSP-2018 NER dataset. Quang et al. [8] proposed an online learning algorithm, i.e., MIRA, in combination with CRF and bootstrapping. Recently, Thinh et al. [3] investigated BiLSTM-CNN-CRF and the pre-trained language models XLM-R and PhoBERT, as well as the effect of automatic Vietnamese word segmentation on the Vietnamese NER task. The most relevant work to ours is by Lien et al. [9], who built an information extraction system for Vietnamese online real estate advertisements, but they use a rule-based approach.
III. OUR DATASET
This section presents how we crawl data from many sources on the Internet, then preprocess and annotate the raw data. One can observe our system clearly in Figure 1.
Fig. 1. Our Dataset Building Process.
A. Data Collection
For this study, we crawled real estate advertisement posts
dated between August 2020 and September 2020 from three
different real estate websites in Vietnam, including:
• propzy.vn
• nhadat247.com.vn
• batdongsan.com.vn
B. Data Processing
Before training models to identify named entities, we preprocess the real-estate posts to clean noisy data and follow a standard format. In detail, we first remove posts that do not contain clear and critical information about real estate, then normalize Unicode text and the position of Vietnamese tone marks in words. Then, we preprocess each string with the following steps:
1) Remove meaningless characters in a string, like \n, \r (ASCII codes), emotion icons, etc.
2) Split multiple joined words.
3) Fix a spaCy format issue by separating dots or commas connected to a word; however, we need to keep dots or commas connected to a number (like 12.5 or 1,000,000).
4) Replace multiple spaces with a single space in the string.
Table I shows some examples for each case above.
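As a concrete illustration, the four steps can be sketched with regular expressions as below. The exact rules of our pipeline are more involved, so the patterns here (especially the word-splitting in step 2) should be read as assumptions, not the precise implementation:

```python
import re

def preprocess_post(text: str) -> str:
    # 1) Remove control characters, non-breaking spaces, and emoticons.
    text = re.sub(r"[\n\r\t\xa0]", " ", text)
    text = re.sub(r"[\u2600-\u27bf\U0001F300-\U0001FAFF]", "", text)
    # 2) Split joined tokens at digit-to-letter boundaries,
    #    e.g. "12,89tỷ" -> "12,89 tỷ" (other join patterns need extra rules).
    text = re.sub(r"(?<=\d)(?=[^\W\d_])", " ", text)
    # 3) Detach dots and commas glued to words, while keeping them inside
    #    numbers so "12.5" and "1,000,000" survive intact.
    text = re.sub(r"(?<=[^\W\d_])([.,])", r" \1", text)
    text = re.sub(r"([.,])(?=[^\W\d_])", r"\1 ", text)
    # 4) Collapse runs of whitespace into a single space.
    return re.sub(r"\s+", " ", text).strip()
```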
C. Data Annotation
We use Doccano [10] as the tool for labeling this dataset and meticulously define 16 entities containing the critical information of a real estate sale advertisement. A comprehensive description of each entity type is briefly shown in Table II.
TABLE I
SEVERAL EXAMPLES FOR THE DATA PROCESSING PHASE

Case | Raw Text | Preprocessed Text
1 | -DT: \xa0195.35m2 \n -DTSD: 152.50m2 | DT 195.35m2 DTSD 152.50m2
2 | Đất Quận 6Đã lên thổ cư giá 12,89tỷ và 1tỷ700tr | Đất Quận 6 Đã lên thổ cư giá 12,89 tỷ và 1 tỷ 700 tr
3 | thuộc tầng 6 | thuộc tầng 6
D. Data Partitions
After annotating the dataset, we have 3152 real estate advertisements as a golden dataset. We then split this dataset into training, validation, and testing sets with a ratio of 60%, 20%, and 20%, respectively; a minimal split sketch is shown below. Statistics of our dataset are presented in Table III.
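Such a 60/20/20 split can be produced with scikit-learn, which we use for splitting (Section V); the function below is a sketch, and the random seed and the `docs` argument (the list of annotated advertisements) are assumptions:

```python
from sklearn.model_selection import train_test_split

def split_dataset(docs, seed=42):
    """Split the annotated advertisements 60/20/20 into train/val/test."""
    train, rest = train_test_split(docs, test_size=0.4, random_state=seed)
    val, test = train_test_split(rest, test_size=0.5, random_state=seed)
    return train, val, test
```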
TABLE III
STATISTICS OF OUR DATASET

# | Entity Type | Num. of Entities
1 | district_name | 3123
2 | place_name | 7471
3 | transaction_type | 3490
4 | property_certificate | 2515
5 | property_type | 10397
7 | number_street_name | 5415
10 | province_city | 1065
11 | host_name | 2079
12 | ward_name | 1254
14 | direction | 737
15 | front_road | 961

# Entities in total: 53515
# Sentences in total: 3152
IV. METHODOLOGY
In this paper, we aim to investigate a named entity recognition system for Vietnamese real estate documents. This system can help users parse the possible real estate information fields of an advertisement automatically. It is worth noting that such a system is crucial and has become an indispensable tool in the real estate market. In what follows, we present the problem formulation of our paper and how we extract features from real estate documents and train our proposed model for this problem.
A. Problem Formulation
The problem of our paper is as follows: given a Vietnamese real estate advertisement, our model must detect, for each word, the corresponding entity among the 16 entity types defined in Section III-C.
B. Feature Extraction
This section describes the layers we use as feature extractors for real estate documents. First, we push the input data, which are annotated documents about real estate, into a tokenizer that splits sentences into lists of words, including punctuation. A fixed number of UTF-8 byte characters is utilized for each word. We add a <padding> token at the end of each list so that all lists have equal length. We then put these numerical lists into an embedding layer named CharacterEmbed in spaCy [11] to vectorize them into N × M matrices (where N is the number of words in each sentence and M is the number of dimensions representing a word) that represent the meaning of each sentence.

TABLE II
THE NAMED ENTITY DEFINITIONS FOR A GIVEN REAL ESTATE MENTIONED IN ONE ADVERTISEMENT POST

Label | Definition
district_name | The district name where the real estate is located.
place_name | The name of one specific location, such as a building, a shopping mall, or an airport.
transaction_type | The transaction type of the real estate advertisement post, including sell, buy, or rent.
property_certificate | The property certificate information of the real estate.
property_type | The property type of the real estate, such as home, apartment, or land.
phone | The phone number of a real estate contact.
number_street_name | The street name, or the house number with the street name.
area | The area nearby the real estate.
distance | A distance, such as 10m, 20m, 300m, etc.
province_city | The name of the province or city where the real estate is located.
host_name | The host name of the real estate.
ward_name | The ward name where the real estate is located.
price | The price related to the real estate mentioned in the advertisement.
direction | The house direction information of the real estate, e.g., East or West.
front_road | The front-road information of the real estate.
email | The contact email.
Next, we use one of the following four architectures to perform feature extraction on the real estate advertisements:
1) MaxoutWindowEncoder: MaxoutWindowEncoder is an architecture that takes an embedding vector as input. This feature is pushed into a Convolution 1D layer with a window size of 2 × 2 and 4 filters. A skip connection [12] adds the embedding vector to the features extracted by the Convolution 1D layer. After that, this information goes through a Maxout activation function. Finally, the feature is normalized by BatchNormalization. BatchNormalization has the effect of avoiding overfitting and making the model more straightforward to converge, while the residual connection helps the model retain information from before feature extraction through the convolution layer. One can see more detail in Figure 2.
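For reference, the Maxout activation takes the element-wise maximum over k learned linear transformations of its input (k corresponds to spaCy's maxout_pieces hyperparameter):

$$\operatorname{Maxout}(x) = \max_{i \in \{1, \dots, k\}} \left( W_i x + b_i \right).$$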
Fig. 2. The architecture of MaxoutWindowEncoder and MishWindowEncoder.
2) MishWindowEncoder: This encoder has an architecture similar to MaxoutWindowEncoder. However, the difference is that this encoder utilizes Mish [13] as the activation function instead of Maxout, as one can also see in Figure 2. According to Diganta Misra, in 75 experimental tasks with various models (DenseNet, Inception v3, Xception Net), Mish outperforms ReLU in 55/75 tasks and Swish in 53/75 tasks.
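For reference, Mish is a smooth, non-monotonic activation function defined as

$$\operatorname{Mish}(x) = x \cdot \tanh\left(\operatorname{softplus}(x)\right) = x \cdot \tanh\left(\ln(1 + e^{x})\right).$$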
3) LSTMEncoder: LSTM (Long Short-Term Memory) [14] was first introduced by Hochreiter and Schmidhuber in 1997. This architecture is a particular structure of the RNN (Recurrent Neural Network) proposed by David Rumelhart [15]. According to the authors, LSTM is designed to resolve the long-term dependency problem, where an RNN cannot store information over long sequences, and to avoid the vanishing and exploding gradient problems faced by RNNs. In our experiment, the embedding vector is passed through an LSTM network whose number of hidden states equals N, the number of words in the input sentence, to obtain the extracted features.
4) BiLSTMEncoder: BiLSTM (Bidirectional Long Short-Term Memory) [16] is based on both LSTM [14] and BiRNN [17]. This architecture is similar to the LSTMEncoder in Section IV-B3. However, instead of using one LSTM network, this approach includes two LSTMs stacked on top of each other: one processes the sequence forwards, whereas the other processes it backwards. BiLSTMs effectively increase the amount of information available to the network, improving the context available to the algorithm; a minimal code sketch follows Figure 3.
Fig. 3. The architecture of BiLSTMEncoder; removing one LSTM network yields the LSTMEncoder.
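To illustrate the idea (independently of spaCy's implementation), here is a minimal PyTorch sketch of a bidirectional LSTM over a sentence of embedded words; the dimensions are assumptions for illustration only:

```python
import torch
import torch.nn as nn

# One LSTM reads the embedded sentence forwards, the other backwards,
# and their per-token hidden states are concatenated.
embed_dim, hidden_dim = 64, 32
bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)

x = torch.randn(1, 12, embed_dim)  # a batch of one sentence with 12 word vectors
out, _ = bilstm(x)                 # out.shape == (1, 12, 2 * hidden_dim)
```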
C. Modeling

The spaCy API provides a powerful model for the named entity recognition task called TransitionBasedParser. As per the authors of spaCy, transition-based parsing is an approach to structured prediction where the task of predicting the structure is mapped to a series of state transitions¹. The authors claim that transition-based parsing is currently superior to and quicker than Stanford's CoreNLP [18]. One can see more detail by visiting spaCy's blog².
In this experiment, after using one of the four encoders mentioned in Section IV-B to extract features from the real estate documents, we push this informative feature into the TransitionBasedParser model to recognize the entity of each word in the text. One can observe our end-to-end pipeline in detail in Figure 4; a configuration sketch follows below.
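To make the pipeline concrete, the following is a hedged sketch of how such a model can be assembled with spaCy v3. The architecture names (CharacterEmbed, Tok2Vec, MishWindowEncoder, TransitionBasedParser) are spaCy's registered functions; the hyperparameter values are illustrative assumptions, not our exact training settings:

```python
import spacy

config = {
    "model": {
        "@architectures": "spacy.TransitionBasedParser.v2",
        "state_type": "ner",
        "extra_state_tokens": False,
        "hidden_width": 64,
        "maxout_pieces": 2,
        "use_upper": True,
        "tok2vec": {
            "@architectures": "spacy.Tok2Vec.v2",
            "embed": {
                # Character-based word embedding (nC bytes per word).
                "@architectures": "spacy.CharacterEmbed.v2",
                "width": 64, "rows": 2000, "nM": 64, "nC": 8,
                "include_static_vectors": False,
            },
            "encode": {
                # Swap in MaxoutWindowEncoder.v2 or a BiLSTM encoder here.
                "@architectures": "spacy.MishWindowEncoder.v2",
                "width": 64, "window_size": 1, "depth": 4,
            },
        },
    },
}

nlp = spacy.blank("vi")  # the Vietnamese class uses pyvi for tokenization
ner = nlp.add_pipe("ner", config=config)
```

In practice, these settings would typically live in a spaCy training config file and the model would be trained with the `spacy train` CLI.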
V. EXPERIMENT

We run all experiments on a computer with an Intel(R) Core(TM) i7 with 2 CPUs running at 2.4GHz, 8GB of RAM, and an Nvidia GeForce RTX 2080Ti GPU with 11GB of VRAM. In the data processing step of this study, we use different Python packages, including NLTK³ and Regex⁴, as tools to clean the data. Additionally, the Scikit-learn⁵ package is applied as a tool to split our dataset. Finally, we use spaCy [11] as the toolkit for the named entity recognition problem.
A. Experiment Settings

We formulate the real estate information NER task for Vietnamese with the BIO labeling scheme (short for beginning, inside, outside), which was presented by Ramshaw and Marcus in 1995 [19]. In our experiment, we used the four different encoders with two widths, W = 64 and W = 300, which spaCy defines as the input width of a sentence; one can find more information in their documentation⁶. From that, we have eight combinations for measuring the performance of NER for the real estate sale advertisement problem. Table IV displays the settings of our pipeline.
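As an illustration of the BIO scheme (a hypothetical fragment, not drawn from the dataset), a phrase like "bán nhà Quận 7" ("sell house District 7") would be tagged:

bán  → B-transaction_type
nhà  → B-property_type
Quận → B-district_name
7    → I-district_name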
TABLE IV
THE HYPER-PARAMETERS OF OUR MODELING PIPELINE

Hyper-parameters | Values
Learning rate | 0.001
Optimizer | Adam with beta1 = 0.9, beta2 = 0.99
1 https://spacy.io/api/architectures
2 https://explosion.ai/blog/parsing-english-in-python
3 https://www.nltk.org/
4 https://regexr.com/
5 https://scikit-learn.org/stable/
6 https://spacy.io/api/architectures
B. Performance Metrics

In this experiment, we choose Precision, Recall, and F1-score as the critical metrics for measuring the performance of our proposed models on each entity:

$$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}, \quad F1 = 2 \times \frac{P \times R}{P + R}, \qquad (1)$$

where P stands for Precision, R is the Recall, and F1 is the F1-score; TP denotes true positives, TN indicates true negatives, and FP and FN are false positives and false negatives. After that, we average each metric over all entities to calculate the general performance of our proposed approach.
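For completeness, Eq. (1) transcribes directly into code; the zero-division guards below are an addition for robustness, not part of the original formula:

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Entity-level Precision, Recall, and F1 from Eq. (1)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```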
C. Results

We compare the performance of four different backbones, including MaxoutWindowEncoder, MishWindowEncoder, LSTM, and BiLSTM, each combined with a width (defining the input and output width), for which spaCy recommends width = 64 or width = 300. One can find our experimental results in more detail in Table V.

In general, the four feature extractors that we apply in our experiment achieve good initial results in terms of Precision, Recall, and F1-score; even the lowest results in each measure are 0.8486, 0.8331, and 0.8450, respectively.
Next, we compare the four feature extractors using width = 64 on our dataset. The experimental results show that the WindowEncoders outperform LSTM and BiLSTM in all three measures. In other words, the Precision and Recall of the WindowEncoders are higher than those of the LSTM variants, especially in terms of F1-score, which is a critical metric in machine learning: the results of MaxoutWindowEncoder and MishWindowEncoder are 0.8775 and 0.8673, respectively, whereas the F1-scores of LSTM and BiLSTM are 0.8556 and 0.8450, correspondingly. Interestingly, when using the four feature extractors with width = 300, the WindowEncoder methods once again surpass LSTM and BiLSTM in Precision, Recall, and F1-score. It is thus worth noting that the two WindowEncoder types always perform better than LSTM and BiLSTM. One possible reason is that the skip connection in the WindowEncoders helps the model stabilize gradient updates, keeping information from being lost by connecting previous layers to following layers and skipping some intermediate layers. Furthermore, the normalization layer allows faster training and stabilizes deep neural networks by stabilizing the distribution of layer inputs during training; as a result, the model is easier to converge. Additionally, using Mish [13] as the activation function instead of Maxout can help the model increase performance. This is because the Mish activation function is bounded below, which yields regularization effects and reduces overfitting. Moreover, the best of our eight experiments is the model that uses MishWindowEncoder with width = 300 as the feature extractor; this approach achieves a Precision, Recall, and F1-score of 0.8914, 0.9237, and 0.9072, respectively.

Fig. 4. Our proposed data pipeline for the named entity recognition problem.
Finally, one can see detailed experimental results for each entity in terms of F1-score, Precision, and Recall in Tables VI, VII, and VIII. The performance on almost every entity is pretty stable. However, the entity ward_name is a challenge for our model: our best model (MishWindowEncoder with width = 300) obtains an F1-score of 0.7741 and a Recall of 0.6738 on it. To put it differently, the ratio of correctly predicted ward_name entities to the total number of ward_name entities is just 0.6738. We aim to solve this issue in the future.
TABLE V
THE AVERAGE RESULTS OF DIFFERENT METHODS

Methods | Precision | Recall | F1-score
MaxoutWindowEncoder W64 | 0.8623 | 0.8933 | 0.8775
MishWindowEncoder W64 | 0.8677 | 0.8669 | 0.8673
MaxoutWindowEncoder W300 | 0.8739 | 0.8871 | 0.8805
MishWindowEncoder W300 | 0.8914 | 0.9237 | 0.9072
BiLSTM W300 | 0.8524 | 0.8549 | 0.8535
VI. CONCLUSION

In this paper, we contribute a new dataset of 3152 advertisements for the real estate information named entity recognition task in Vietnamese, covering 16 entity types, and evaluate eight methods to measure the initial performance in terms of Recall, Precision, and F1-score. We find that using MishWindowEncoder yields experimental results that outperform all other techniques in all metrics. In the future, we aim to extend our results to different datasets and apply new approaches to improve the proposed algorithms' performance.
ACKNOWLEDGMENTS

We want to thank the University of Science, Vietnam National University in Ho Chi Minh City, Hung Thinh Corp., and AISIA Research Lab in Vietnam for supporting us throughout this paper. This research is funded by Hung Thinh Corp. under grant number HTHT2021-18-01.
REFERENCES

[1] E. F. Tjong Kim Sang and F. De Meulder, "Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition," in Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, 2003, pp. 142–147. [Online]. Available: https://aclanthology.org/W03-0419
[2] H. Nguyen, Q. Ngo, L. Vu, V. Tran, and H. Nguyen, "VLSP shared task: Named entity recognition," Journal of Computer Science and Cybernetics, vol. 34, pp. 283–294, 01 2019.
[3] T. H. Truong, M. H. Dao, and D. Q. Nguyen, "COVID-19 named entity recognition for Vietnamese," in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021.
[4] P. T. X. Thao, T. Q. Tri, D. Dien, and N. Collier, "Named entity recognition in Vietnamese using classifier voting," ACM Transactions on Asian Language Information Processing, vol. 6, no. 4, Dec. 2008. [Online]. Available: https://doi.org/10.1145/1316457.1316460
[5] H.-Q. Le, M.-V. Tran, N.-N. Bui, N.-C. Phan, and Q.-T. Ha, "An integrated approach using conditional random fields for named entity recognition and person property extraction in Vietnamese text," in 2011 International Conference on Asian Language Processing, 2011, pp. 115–118.
[6] V. Nguyen Hong, H. Nguyen, and V. Snasel, "Text normalization for named entity recognition in Vietnamese tweets," Computational Social Networks, vol. 3, 12 2016.
[7] L. Viet-Thang and L. K. Pham, "ZA-NER: Vietnamese named entity recognition at VLSP 2018 evaluation campaign," Proceedings of Vietnamese Speech and Language Processing (VLSP), 2018.
[8] Q. H. Pham, M.-L. Nguyen, B. T. Nguyen, and N. V. Cuong, "Semi-supervised learning for Vietnamese named entity recognition using online conditional random fields," in Proceedings of the Fifth Named Entity Workshop. Beijing, China: Association for Computational Linguistics, Jul. 2015, pp. 50–55. [Online]. Available: https://aclanthology.org/W15-3907
[9] L. V. Pham and S. B. Pham, "Information extraction for Vietnamese real estate advertisements," in 2012 Fourth International Conference on Knowledge and Systems Engineering, 2012, pp. 181–186.
[10] H. Nakayama, T. Kubo, J. Kamura, Y. Taniguchi, and X. Liang, "doccano: Text annotation tool for human," 2018. [Online]. Available: https://github.com/doccano/doccano
[11] M. Honnibal, I. Montani, S. Van Landeghem, and A. Boyd, "spaCy: Industrial-strength natural language processing in Python," 2020. [Online]. Available: https://doi.org/10.5281/zenodo.1212303
[12] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
[13] D. Misra, "Mish: A self regularized non-monotonic activation function," in BMVC, 2020.
[14] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, Nov. 1997. [Online]. Available: https://doi.org/10.1162/neco.1997.9.8.1735
[15] D. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, pp. 533–536, 1986.
[16] A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM networks," in Proceedings 2005 IEEE International Joint Conference on Neural Networks, vol. 4, 2005, pp. 2047–2052.
[17] M. Schuster and K. Paliwal, "Bidirectional recurrent neural networks," IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673–2681, 1997.
[18] C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, and D. McClosky, "The Stanford CoreNLP natural language processing toolkit," in ACL (System Demonstrations). The Association for Computer Linguistics, 2014, pp. 55–60.
[19] L. Ramshaw and M. Marcus, "Text chunking using transformation-based learning," in Proceedings of the Third ACL Workshop on Very Large Corpora, 1995.
TABLE VI
THE F1-SCORE RESULTS OF DIFFERENT METHODS

Entity | MaxoutWindowEncoder W64 | LSTM W64 | MishWindowEncoder W64 | BiLSTM W64 | MaxoutWindowEncoder W300 | LSTM W300 | MishWindowEncoder W300 | BiLSTM W300
district_name | 0.8846 | 0.8267 | 0.8564 | 0.825 | 0.8853 | 0.8786 | 0.9256 | 0.8092
transaction_type | 0.9104 | 0.8272 | 0.6539 | 0.8198 | 0.845 | 0.8171 | 0.8816 | 0.9291
property_certificate | 0.7204 | 0.8239 | 0.815 | 0.9101 | 0.9311 | 0.9284 | 0.9367 | 0.6782
property_type | 0.8362 | 0.9205 | 0.9285 | 0.6609 | 0.714 | 0.6595 | 0.7676 | 0.8256
number_street_name | 0.8601 | 0.9814 | 0.9858 | 0.8109 | 0.988 | 0.9846 | 0.9923 | 0.8372
province_city | 0.8128 | 0.9101 | 0.9371 | 0.7195 | 0.8772 | 0.8012 | 0.8619 | 0.9042
TABLE VII
THE PRECISION RESULTS OF DIFFERENT METHODS

Entity | MaxoutWindowEncoder W64 | LSTM W64 | MishWindowEncoder W64 | BiLSTM W64 | MaxoutWindowEncoder W300 | LSTM W300 | MishWindowEncoder W300 | BiLSTM W300
district_name | 0.8831 | 0.7925 | 0.8507 | 0.8543 | 0.8779 | 0.8712 | 0.9419 | 0.8419
transaction_type | 0.8831 | 0.8568 | 0.7366 | 0.8525 | 0.8718 | 0.8279 | 0.8787 | 0.9283
property_certificate | 0.7128 | 0.7714 | 0.8204 | 0.9243 | 0.8954 | 0.8973 | 0.9073 | 0.6738
property_type | 0.7758 | 0.8922 | 0.9353 | 0.6567 | 0.6963 | 0.6538 | 0.7634 | 0.8154
number_street_name | 0.8745 | 0.9803 | 0.9868 | 0.8137 | 0.9869 | 0.989 | 0.9934 | 0.8823
TABLE VIII
THE RECALL RESULTS OF DIFFERENT METHODS

Entity | MaxoutWindowEncoder W64 | LSTM W64 | MishWindowEncoder W64 | BiLSTM W64 | MaxoutWindowEncoder W300 | LSTM W300 | MishWindowEncoder W300 | BiLSTM W300
district_name | 0.8861 | 0.8639 | 0.8622 | 0.7976 | 0.8929 | 0.8861 | 0.9099 | 0.7789
transaction_type | 0.9394 | 0.7996 | 0.588 | 0.7895 | 0.8199 | 0.8066 | 0.8846 | 0.9298
property_certificate | 0.7283 | 0.8842 | 0.8096 | 0.8963 | 0.9697 | 0.9617 | 0.9681 | 0.6826
property_type | 0.9067 | 0.9506 | 0.9219 | 0.6652 | 0.7326 | 0.6652 | 0.7717 | 0.836
number_street_name | 0.8462 | 0.9825 | 0.9847 | 0.8082 | 0.9891 | 0.9803 | 0.9912 | 0.7966
province_city | 0.8736 | 0.9162 | 0.9347 | 0.7299 | 0.8621 | 0.7874 | 0.8966 | 0.9054