
Thesis: Towards a Framework for Building an Annotated Named Entities Corpus


DOCUMENT INFORMATION

Basic information

Title: Towards a Framework for Building an Annotated Named Entities Corpus
Author: Hoang Huu Son
Supervisor: Doctor Pham Bao Son
Institution: University of Engineering and Technology, Vietnam National University, Hanoi
Type: Thesis
Year of publication: 2010
City: Hanoi

Towards a framework for building an Annotated Named Entities Corpus


Hoang Huu Son

Faculty of Information Technology, University of Engineering and Technology, Vietnam National University, Hanoi

Supervised by

Doctor Pham Bao Son

A thesis submitted in fulfillment of the requirements for the degree of

Master of Information Technology

June, 2010


Table of Contents

1 Introduction
  1.1 Overview of Named Entity Recognition (NER)
  1.2 NER Approach
    1.2.1 Rule-based approach
    1.2.2 Machine learning approach
    1.2.3 Comparison
  1.3 Thesis contribution
  1.4 Thesis structure

2 Related Work
  2.1 Overview of our problem
  2.2 Building NER corpus research
  2.3 Research on the corpus building process
  2.4 Overview of annotation tools
  2.5 Summary

3 Corpus building process
  3.1 Corpus building process
    3.1.1 Objective
    3.1.2 Build annotation guideline
    3.1.3 Annotate documents
    3.1.4 Quality control
  3.2 Building Vietnamese NER corpus with offline tools
    3.2.1 Build annotation guideline
    3.2.2 Annotate documents
    3.2.3 Quality control
  3.3 Discussion of the Vietnamese NER corpus building process
  3.4 Conclusion


4 Online Annotation Framework
  4.1 Introduction
  4.2 Training section
  4.3 Annotation documents
    4.3.1 Online annotation interface
    4.3.2 Automated file distribution for annotators
    4.3.3 Automated saving and management of files

5 Evaluation
    5.2.1 Inter-annotator agreement
    5.2.2 Offline corpus evaluation
  5.3 Time cost
    5.3.1 Overview
    5.3.3 Online framework
    5.4.2 Gazetteer
    5.4.3 Transducer

6 Conclusion and Future Work
  6.2 Future work
    6.2.1 Create a bigger and higher-quality corpus
    6.2.2 Improve the online annotation framework
    6.2.3 Build a statistical NER system


A Named Entity guideline
  A.1 Basic concepts
    A.1.1 Entity and Entity Name
    A.1.2 Instance of entity
    A.1.3 List of Entities
    A.1.4 Entity recognition rules
  A.2 Entity classification
    A.2.1 Person
    A.2.2 Organization
    A.2.3 Location
    A.2.4 Facility
    A.2.5 Religion


Toward a Framework for Building a Named Entity Corpus

Hoang Huu Son, University of Engineering and Technology, Vietnam National University, Hanoi

144 Xuan Thuy, Cau Giay, Hanoi, Vietnam

Abstract

Named entity recognition (NER) is one of the most interesting problems in natural language processing. However, a main barrier to NER research is the difficulty of building an NER corpus, and no NER corpus has been published. In this thesis, we therefore present a corpus building process and frameworks for building NER corpora, in particular a Vietnamese named entity corpus.

1 Introduction

Please note the following points:

- The context of the research and its role/importance.

- Related studies and their methods/solutions/approaches.

- The remaining problems and the objective of this study/thesis.

- Your proposal. What will be carried out?

2

- You can arrange one or more sections after the Introduction.

- You can use subsections.

- Show how the problems are formulated. You may give some foundations if necessary.

- Show different aspects of the problems, for example: feature selection, learning algorithms, etc.

- Show your proposal. It is good if you can present the differences between your proposal and previous studies. It is also important to show/analyze the solution in a reasonable way.

- Show how features are selected/built, and the algorithms/methods you will use.

3 Experiments

You should give the information as follows:

Jana Kravalová and Zdeněk Žabokrtský built the Czech Named Entity Corpus, which is presented in paper [7]. It is a recently released corpus of Czech sentences with manually annotated named entities, in which a rich two-level classification scheme was used.

- How are the models designed? You can design different models/parameters, so please describe them in detail.

- How is the data prepared?

- The results should be presented in tables and graphs.

- It is important to give a discussion after obtaining experimental results.

4 Conclusions

- With regard to the objective of this study as you showed in the introduction, what has been done?

- The contribution of your work and the meaning of the obtained results.

- Present future work if needed.

Publications

- Give here your publications during this master course.

- You can also give here your submissions and their status (i.e., submitted, revising, in press, ...).

