
Transductive support vector machines for cross lingual sentiment classification



Nguyen Thi Thuy Linh

Faculty of Information Technology
University of Engineering and Technology
Vietnam National University, Hanoi

Supervised by Professor Ha Quang Thuy

A thesis submitted in fulfillment of the requirements for the degree of

Master of Computer Science

December, 2009


Table of Contents

1.1 Introduction
1.2 What might be involved?
1.3 Our approach
1.4 Related works
1.4.1 Sentiment classification
1.4.1.1 Sentiment classification tasks
1.4.1.2 Sentiment classification features
1.4.1.3 Sentiment classification techniques
1.4.1.4 Sentiment classification domains
1.4.2 Cross-domain text classification

2 Background
2.1 Sentiment Analysis
2.1.1 Applications
2.2 Support Vector Machines
2.3 Semi-supervised techniques
2.3.1 Generate maximum-likelihood models
2.3.2 Co-training and bootstrapping
2.3.3 Transductive SVM

3 The semi-supervised model for cross-lingual approach
3.1 The semi-supervised model
3.2 Review Translation
3.3 Features
3.3.1 Words Segmentation
3.3.2 Part of Speech Tagging
3.3.3 N-gram model

4.1 Experimental set up
4.2 Data sets
4.3 Evaluation metric
4.4 Features
4.5 Results
4.5.1 Effect of cross-lingual corpus
4.5.2 Effect of extraction features
4.5.2.1 Using stopword list
4.5.2.2 Segmentation and Part of speech tagging
4.5.2.3 Bigram
4.5.3 Effect of features size

5 Conclusion and Future Works


Abstract

Sentiment classification has received much attention and has many useful applications in business and intelligence. This thesis investigates the sentiment classification problem using machine learning techniques. Vietnamese sentiment corpora are limited, while many English sentiment corpora are available on the Web. We therefore combine English corpora as training data with a number of unlabeled Vietnamese documents in a semi-supervised model. Machine learning eliminates the language gap between the training set and the test set in our model. We also examine several types of features to obtain the best performance.

The results show that the semi-supervised classifier makes good use of the cross-lingual corpus compared with a classifier trained without it. In terms of features, we find that using only the unigram model gives the best performance.
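To make the feature comparison concrete, the following is a minimal sketch of unigram and bigram bag-of-words extraction. It is not the thesis's feature pipeline: the function names `ngrams` and `bag_of_ngrams`, the sample review, and the plain whitespace tokenization are assumptions for illustration only (the thesis applies word segmentation to Vietnamese text, which this sketch does not attempt).

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, n=1):
    """Lowercase the text, split on whitespace, and count its n-grams.

    Whitespace splitting is an assumption made for this sketch; a real
    pipeline for Vietnamese would run a word segmenter first.
    """
    tokens = text.lower().split()
    return Counter(ngrams(tokens, n))


# Hypothetical review used only to show the two feature sets.
review = "the camera is not good"
unigrams = bag_of_ngrams(review, n=1)
bigrams = bag_of_ngrams(review, n=2)

print(sorted(unigrams))
print(sorted(bigrams))
```

A unigram model sees `good` on its own, while a bigram model also sees the pair `not good`. Which feature set works better is an empirical question; the abstract reports that unigrams alone performed best in these experiments.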

Date posted: 16/03/2021, 12:31
