DSpace at VNU: Lexicalized statistical parsing for Vietnamese

Lexicalized statistical parsing for Vietnamese

Phạm Thị Minh Thu

University of Engineering and Technology (Trường Đại học Công nghệ)

Master's thesis, major: Information Technology

Supervisor: Dr. Lê Anh Cường

Year of defense: 2010

Keywords: Statistical parsing; Lexicon; Syntax; Vietnamese; Informatics

Table of Contents

1.1 What is syntactic parsing?

1.2 Current Studies in Parsing

1.3 Vietnamese syntactic parsing

1.4 Objective of the Thesis

1.5 Thesis structure

2 Parsing approaches

2.1 Context Free Grammar (CFG)

2.2 Parsing Algorithms

2.2.1 Top-down parsing

2.2.2 Bottom-up parsing

2.2.3 Comparison between top-down parsing and bottom-up parsing

2.2.4 CYK algorithm (Cocke-Younger-Kasami)

2.2.5 Earley algorithm

2.3 Probabilistic context-free grammars (PCFGs)

2.3.1 The concept of PCFG

2.3.2 Disadvantages of PCFGs

2.4 Lexical Probabilistic Context Free Grammar (LPCFGs)

2.4.1 Head structure

2.4.2 The concept of Lexical Probabilistic Context Free Grammar (LPCFGs)

2.4.3 Three models of Collins

3 Vietnamese parsing and our approach

3.1 Vietnamese characteristics

PennTreebank

3.1.1 POS tagging

3.1.2 Bracketing

3.2 Viet Treebank

3.2.1 Objectives

3.2.2 The POS tagset and Syntax tagset for Vietnamese

3.3 Our approach in building a Vietnamese parser

3.3.1 Adapting Bikel's parser for Vietnamese

3.3.2 Analyzing errors and proposing heuristic rules

4 Experiments and Discussion

3.4 Data

3.5 Bikel's parsing tool

3.6 Adapting Bikel's tool to Vietnamese

3.6.1 Investigating different configurations

3.6.2 Training

3.6.3 Parsing

3.6.4 Evaluation of the parser

3.6.5 Results

3.7 Experimental results on using heuristic rules

5 Conclusions and Future Work

3.8 Summary

3.9 Contribution

3.10 Future work

References:

1. Agirre, E., & Baldwin, T. (2008). Improving parsing and PP attachment performance with sense information. Proceedings of ACL-08: HLT (pp. 317-325). Columbus, Ohio: Association for Computational Linguistics.

2. Anh-Cuong, L., Phuong-Thai, N., Hoai-Thu, V., Minh-Thu, P., & Tu-Bao, H. (2009). Experimental study on lexicalized statistical parsing for Vietnamese. KSE 2009: International Conference on Knowledge and Systems Engineering (pp. 162-167). Hanoi, Vietnam.

3. Bikel, D. M. (2004). On the parameter space of generative lexicalized statistical parsing models. Doctoral dissertation, Philadelphia, PA, USA. Supervisor: Marcus, Mitchell P.

4. Candito, M., & Crabbé, B. (2009). Improving generative statistical parsing with semi-supervised word clustering. IWPT '09: Proceedings of the 11th International Conference on Parsing Technologies (pp. 138-141). Morristown, NJ, USA: Association for Computational Linguistics.

5. Carreras, X., Collins, M., & Koo, T. (2008). TAG, dynamic programming, and the perceptron for efficient, feature-rich parsing. CoNLL '08: Proceedings of the Twelfth Conference on Computational Natural Language Learning (pp. 9-16). Morristown, NJ, USA: Association for Computational Linguistics.

6. Collins, M. (1997). Three generative, lexicalised models for statistical parsing. ACL-35: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics (pp. 16-23). Morristown, NJ, USA: Association for Computational Linguistics.

7. Collins, M. (1999). Head-driven statistical models for natural language parsing. Doctoral dissertation, University of Pennsylvania.

8. Collins, M. (2003). Head-driven statistical models for natural language parsing. Computational Linguistics, 29, 589-637.

9. Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA: MIT Press.

10. Phuong-Thai, N., & Xuan-Luong, V. (2009). Building a large syntactically-annotated corpus of Vietnamese. Proceedings of the Third Linguistic Annotation Workshop (pp. 182-185). Suntec, Singapore: Association for Computational Linguistics.

11. Quoc-The, N., & Thanh-Huong, L. (2008). Vietnamese syntactic parsing using the lexicalized probabilistic context-free grammar. Proceedings of FAIR conference 2007 (pp. 9-10). Nha Trang, Vietnam.

12. Rafferty, A. N., & Manning, C. D. (2008). Parsing three German treebanks: lexicalized and unlexicalized baselines. PaGe '08: Proceedings of the Workshop on Parsing German (pp. 40-46). Morristown, NJ, USA: Association for Computational Linguistics.

13. Watson, R., Briscoe, T., & Carroll, J. (2007). Semi-supervised training of a statistical parser from unlabeled partially-bracketed data. IWPT '07: Proceedings of the 10th International Conference on Parsing Technologies (pp. 23-32). Morristown, NJ, USA: Association for Computational Linguistics.

14. Xiong, D., Li, S., Liu, Q., Lin, S., & Qian, Y. (2005). Parsing the Penn Chinese Treebank with semantic knowledge. In Proceedings of IJCNLP 2005 (pp. 70-81).

Posted: 18/12/2017, 03:03
