Towards a framework for building an Annotated Named Entities Corpus
Hoang Huu Son
Faculty of Information Technology
University of Engineering and Technology
Vietnam National University, Hanoi
Supervised by
Doctor Pham Bao Son
A thesis submitted in fulfillment of the requirements for the degree of
Master of Information Technology
June, 2010
Table of Contents
1 Introduction
1.1 Overview Named Entity Recognition (NER)
1.2 NER Approach
1.2.1 Rule based approach
1.2.2 Machine learning approach
1.2.3 Comparing
1.3 Thesis contribution
1.4 Thesis structure
2 Related Work
2.1 Overview our problem
2.2 Building NER corpus research
2.3 Researches about building corpus process
2.4 Overview annotate tools
2.5 Summary
3 Corpus building process
3.1 Corpus building process
3.1.1 Objective
3.1.2 Built annotation guideline
3.1.3 Annotate documents
3.1.4 Quality control
3.2 Building Vietnamese NER corpus by off-line tools
3.2.1 Built annotation guideline
3.2.2 Annotate documents
3.2.3 Quality control
3.3 Discuss about Vietnamese NER corpus building process
3.4 Conclusion
4 Online Annotation Framework
4.1 Introduction
4.2 Training section
4.3 Annotation documents
4.3.1 Online annotation interface
4.3.2 Automate file distribution for annotator
4.3.3 Automate save and manage files
5 Evaluation
5.2.1 Inter annotator agreements
5.2.2 Offline corpus evaluation
5.3 Time costing
5.3.1 Overview
5.3.3 Online framework
5.4.2 Gazetteer
5.4.3 Transducer
6 Conclusion And Future work
6.2 Future work
6.2.1 Create corpus bigger and more quality
6.2.2 Improve online annotation framework
6.2.3 Building NER system base statistical
A Named Entity guideline
A.1 Basic concepts
A.1.1 Entity and Entity Name
A.1.2 Instance of entity
A.1.3 List of Entities
A.1.4 Entities recognize rules
A.2 Entity classification
A.2.1 Person
A.2.2 Organization
A.2.3 Location
A.2.4 Facility
A.2.5 Religion
Toward a Framework for building Named Entity Corpus
Hoang Huu Son
University of Engineering and Technology
Vietnam National University, Hanoi
144, Xuan Thuy, Cau Giay, Hanoi, Vietnam
Abstract
Named entity recognition (NER) is one of the most interesting problems in the natural language processing domain. However, a main barrier to NER research is the difficulty of building an NER corpus, and there is not yet any published NER corpus. In this thesis, we therefore present a corpus building process and a framework for building NER corpora, in particular a Vietnamese named entity corpus.
1 Introduction
Please note the following points:
- The context of the research and its role/importance
- Related studies and their methods/solutions/approaches
- The remaining problems and the objective of this study/thesis
- Your proposal: what will be carried out?
2
- You can arrange one or more sections after the Introduction
- You can use subsections
- Show how the problems are formulated. You may give some foundations if necessary
- Show different aspects of the problems, for example: feature selection, learning algorithms, etc.
- Show your proposal. It is good if you can present the differences between your proposal and previous studies. It is also important to show/analyze the solution in a reasonable way
- Show how features are selected/built, and the algorithms/methods you will use
3 Experiments
You should give the information as follows:
Kravalová, Jana and Žabokrtský, Zdeněk have built the Czech Named Entity Corpus, which is presented in paper [7]. This recently released corpus consists of Czech sentences with manually annotated named entities, for which a rich two-level classification scheme was used.
- How are the models designed? You can design different models/parameters, so please describe them in detail
- How are the data prepared?
- The results should be presented in tables and graphs
- It is important to give a discussion after obtaining experimental results
4 Conclusions
- With regard to the objective of this study as you showed in the introduction, what has been done?
- The contribution of your work and the meaning of the obtained results
- Present future work if needed
Publications
- Give here your publications during this master course
- You can also give here your submissions and their status (i.e., submitted, revising, in press)
References
[1] I.M. Author. Some related article I wrote. Some Fine Journal, 99(7):1-100, January 1999.
[2] A.N. Expert. A Book He Wrote. His Publisher, Erewhon, NC, 1999.