
DSpace at VNU: Transductive support vector machines for cross-lingual sentiment classification


Transductive support vector machines for cross-lingual sentiment classification

Nguyễn Thị Thùy Linh

University of Engineering and Technology (Trường Đại học Công nghệ). Master's thesis in Computer Science; code: 60 48 01


Supervisor: Assoc. Prof. Dr. Hà Quang Thụy

Year of defense: 2009

Abstract. Sentiment classification has received much attention and has many useful applications in business and intelligence. This thesis investigates the sentiment classification problem using machine learning techniques. Vietnamese sentiment corpora are limited, while many English sentiment corpora are available on the Web. We therefore combine English corpora as training data with a number of unlabeled Vietnamese documents in a semi-supervised model. Machine learning eliminates the language gap between the training set and the test set in our model. Moreover, we examine several types of features to obtain the best performance. The results show that the semi-supervised classifier is quite good at leveraging the cross-lingual corpus, compared with a classifier trained without it. In terms of features, we find that using only the unigram model gives the best performance.
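The idea in the abstract (labeled training reviews plus unlabeled target reviews, unigram features, a linear SVM) can be illustrated with a small sketch. This is not the thesis's implementation. It swaps in simple self-training around a hinge-loss linear classifier as a stand-in for a true transductive SVM, assumes all reviews have already been brought into one language, and uses invented toy data. Every function name here is hypothetical.

```python
import random
from collections import Counter

def unigrams(text):
    """Bag-of-words (unigram) counts from lowercased whitespace tokens."""
    return Counter(text.lower().split())

def score(w, b, x):
    """Linear decision value w.x + b for a sparse feature dict x."""
    return sum(w.get(f, 0.0) * v for f, v in x.items()) + b

def train_linear_svm(data, epochs=200, lam=0.01, lr=0.1, seed=0):
    """Hinge-loss linear classifier fitted by plain subgradient descent."""
    rng = random.Random(seed)
    data = list(data)
    w, b = {}, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            violated = y * score(w, b, x) < 1   # margin check before update
            for f in w:                          # L2 regularisation shrink
                w[f] *= 1 - lr * lam
            if violated:                         # hinge-loss subgradient step
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + lr * y * v
                b += lr * y
    return w, b

def self_train(labeled, unlabeled, rounds=3):
    """Assumed stand-in for TSVM: label the unlabeled documents with the
    current model, then retrain on labeled plus pseudo-labeled data."""
    w, b = train_linear_svm(labeled)
    for _ in range(rounds):
        pseudo = [(x, 1 if score(w, b, x) >= 0 else -1) for x in unlabeled]
        w, b = train_linear_svm(labeled + pseudo)
    return w, b

# Toy data (invented): labeled source-side reviews, unlabeled target-side ones.
labeled = [(unigrams(t), y) for t, y in [
    ("good great film", 1), ("great acting good story", 1),
    ("bad boring film", -1), ("boring plot bad acting", -1)]]
unlabeled = [unigrams(t) for t in
             ["great story", "boring bad plot", "good acting", "bad story"]]

w, b = self_train(labeled, unlabeled)

def predict(text):
    """Return +1 (positive) or -1 (negative) for a review string."""
    return 1 if score(w, b, unigrams(text)) >= 0 else -1

print(predict("great good"), predict("boring bad"))
```

Unlike this loop, a transductive SVM optimises the labels of the unlabeled documents jointly with the separating hyperplane. The sketch only shows how unlabeled target-side reviews can influence a model trained on source-side labels.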

Keywords: Computer science; Information technology; Data; Language

Content

Table of Contents

1.1 Introduction
1.2 What might be involved?
1.3 Our approach
1.4 Related works
1.4.1 Sentiment classification
1.4.1.1 Sentiment classification tasks
1.4.1.2 Sentiment classification features
1.4.1.3 Sentiment classification techniques
1.4.1.4 Sentiment classification domains
1.4.2 Cross-domain text classification
2 Background
2.1 Sentiment Analysis
2.1.1 Applications
2.2 Support Vector Machines
2.3 Semi-supervised techniques
2.3.1 Generate maximum-likelihood models
2.3.2 Co-training and bootstrapping
2.3.3 Transductive SVM
3 The semi-supervised model for cross-lingual approach
3.1 The semi-supervised model
3.2 Review Translation
3.3 Features
3.3.1 Words Segmentation
3.3.2 Part of Speech Tagging
3.3.3 N-gram model
4 Experiments
4.1 Experimental set up
4.2 Data sets
4.3 Evaluation metric
4.4 Features
4.5 Results
4.5.1 Effect of cross-lingual corpus
4.5.2 Effect of extraction features
4.5.2.1 Using stopword list
4.5.2.2 Segmentation and Part of speech tagging
4.5.2.3 Bigram
4.5.3 Effect of features size
5 Conclusion and Future Works
A
B


Date posted: 18/12/2017, 03:04