
Thesis: A Feature-Based Opinion Mining Model on Product Reviews in Vietnamese


DOCUMENT INFORMATION

Basic information

Title: A Feature-Based Opinion Mining Model on Product Reviews in Vietnamese
Authors: Vu Tien Thanh, Ha Quang Thuy
Supervisor: Assoc. Prof. Ha Quang Thuy
Institution: Vietnam National University, Hanoi University of Engineering and Technology
Major: Information Technology
Type: Thesis
Year: 2012
City: Hanoi
Pages: 27
File size: 543.22 KB


Content


VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY

VU TIEN THANH

A FEATURE-BASED OPINION MINING

MODEL ON PRODUCT REVIEWS IN

VIETNAMESE

MASTER THESIS OF INFORMATION TECHNOLOGY

Hanoi, 2012


Major: Computer Science    Code: 60 48 01

Supervisor: Assoc. Prof. Ha Quang Thuy

2.1 Opinion Mining
2.1.2 The basic concepts in the opinion mining field
2.1.3 Opinion mining problems
3.1 Introduction
3.2.2 Token Segmenting and POS Tagging
3.3 Phase 2: Product Features and Opinion Words Extraction
Explicit Product Features Extraction
Opinion Word Extraction
Implicit Features Identification
4.4 The Whole System Evaluation

A FEATURE-BASED OPINION MINING MODEL ON PRODUCT

REVIEWS IN VIETNAMESE

Vu Tien Thanh
K16 Computer Science Master Course, Faculty of Information Technology
University of Engineering and Technology, Vietnam National University, Hanoi
tienthanh_dhcn@vnu.edu.vn

Ha Quang Thuy
Faculty of Information Technology
University of Engineering and Technology, Vietnam National University, Hanoi
thuyhq@vnu.edu.vn

Abstract

Feature-based opinion mining and summarizing (FOMS) of reviews is a very interesting and attractive issue in the opinion mining field. With the development of e-commerce in Vietnam, there are more and more commercial sites and technical forums where people can review or express their opinions on the products they have used. As a result, in recent years the number of reviews for a popular product has been increasing rapidly, to hundreds or even thousands. This makes it difficult not only for customers to read them before deciding whether to buy a product, but also for producers to process customers' opinions in order to improve their products. In this thesis, we describe a feature-based opinion mining and summarizing model on Vietnamese customer reviews. Experimental results on Vietnamese reviews in the mobile phone domain demonstrate the effectiveness of the model.

Keywords

feature-word; feature-based opinion mining system; opinion summarization; opinion-word; reviews; syntax rules; VietSentiWordNet dictionary

PUBLICATIONS

+ Huyen-Trang Pham, Tien-Thanh Vu, Mai-Vu Tran and Quang-Thuy Ha. A Solution for Grouping Vietnamese Synonym Feature Words in Product Reviews. In Proceedings of the 6th International Conference on Asia-Pacific Services Computing (APSCC 2011).

+ Quang-Thuy Ha, Tien-Thanh Vu, Huyen-Trang Pham and Cong-To Luu. An Upgrading Feature-Based Opinion Mining Model on Vietnamese Product Reviews. In Proceedings of the 7th International Conference on Active Media Technology (AMT 2011), pp. 173-185.

+ Tien-Thanh Vu, Huyen-Trang Pham, Cong-To Luu and Quang-Thuy Ha. A Feature-Based Opinion Mining Model on Product Reviews in Vietnamese. In Semantic Methods for Knowledge Management and Communication (SCI 381), pp. 23-33.


I. INTRODUCTION

Feature-based opinion mining and summarizing (FOMS) of product reviews is a very interesting and attractive issue in the opinion mining field [1][2][3][4]. Much research has been done to improve FOMS systems [5][3][2].

In this thesis, we propose a feature-based opinion mining and summarizing model on Vietnamese customer reviews that overcomes some drawbacks of recent FOMS systems. Given an input set of customer reviews of products, our task is performed in four steps: (1) pre-processing the input customer reviews by standardizing the reviews, segmenting tokens, and POS tagging; (2) extracting explicit product features and opinion words by using Vietnamese syntax rules, identifying implicit product features by using their relationships with opinion words, and automatically grouping synonym product features by combining the HAC clustering method and the semi-supervised SVM-KNN classification method; (3) identifying opinion sentences in each review and deciding whether each opinion sentence is positive, negative or neutral by using a VietSentiWordNet dictionary extended from an initial SentiWordNet 3.0; (4) summarizing the results.

The rest of this thesis is organized as follows. In the second chapter, we provide a literature review. In the next chapter, the FOMS model with four steps is described. Experimental results and remarks are described in the fourth chapter. Conclusions are presented in the last chapter.

II. RELATED WORKS

A positive opinionated document on a particular object does not mean that the author has positive opinions on all features of the object, and vice versa. In a typical opinionated text, the author writes about both positive and negative features of the object, although the general sentiment on the object may be positive or negative. Document-level and sentence-level classification do not provide such information. Thus, feature-based opinion mining is needed to determine positive, negative or neutral opinions at the feature level. Feature-based opinion mining focuses on two main tasks [6]:

+ Identifying object features (product features). For example, in the sentence "The touch screen of this mobile phone is great", the product feature is "touch screen".

+ Determining the orientation of opinions on features (positive, negative, or neutral). In the above sentence, the opinion on "touch screen" is positive.

A. Feature Extraction

The approach applied in early feature-based opinion mining systems to identify features is based on association mining [7]. The main idea of this approach is that although different customers usually write different reviews about product features, the words they use to express a given feature are consistent. Thus, the approach uses association mining to find nouns/noun phrases (N/NP) that occur frequently in reviews and considers those N/NP as product features. A disadvantage of the association-mining-based approach is that it does not identify implicit features.
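A minimal sketch of the association-mining idea, assuming each review sentence has already been reduced to its set of candidate nouns/noun phrases (one "transaction" per sentence). It keeps single items and pairs whose support (the fraction of sentences containing them) reaches a hypothetical `min_support`. This is a simplified two-level Apriori for illustration, not the exact procedure of [7]; the function name and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Frequent 1- and 2-itemsets of candidate nouns/noun phrases (two-level Apriori)."""
    n = len(transactions)
    # Level 1: support = fraction of sentences containing the item.
    counts1 = Counter(item for t in transactions for item in set(t))
    frequent1 = {item for item, c in counts1.items() if c / n >= min_support}
    # Level 2: Apriori pruning, so only pairs of frequent items are counted.
    counts2 = Counter()
    for t in transactions:
        for pair in combinations(sorted(frequent1 & set(t)), 2):
            counts2[frozenset(pair)] += 1
    frequent2 = {pair for pair, c in counts2.items() if c / n >= min_support}
    return {frozenset([item]) for item in frequent1} | frequent2


# Hypothetical transactions: candidate N/NP found in each review sentence.
sentences = [["màn hình", "pin"], ["màn hình"], ["màn hình", "pin"], ["camera"]]
for itemset in sorted(sorted(s) for s in frequent_itemsets(sentences, 0.5)):
    print(itemset)
```

Here "camera" appears in only one of four sentences, so it falls below the 0.5 support threshold and is not proposed as a feature.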

Other related works on feature extraction mainly use topic modeling and clustering to extract topics/features in customer reviews [8]. The main idea of these approaches is to cluster synonym features based on the context of the reviews.

B. Opinion Orientation Identification

Opinion Words Extraction: The first approach applied to extract opinion words is based on syntactic or co-occurrence patterns together with a seed list of opinion words to find other opinion words in a large corpus [9]. The approach starts with a list of seed opinion adjectives and uses them together with a set of linguistic constraints such as "AND", "OR", "BUT", etc. to identify additional adjective opinion words and their orientations (positive, negative, or neutral). For example, given the sentence "This car is beautiful and spacious.", if "beautiful" is known to be positive, it can be inferred that "spacious" is also positive.
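The conjunction constraint can be illustrated with a deliberately simplified rule-based sketch: conjoined adjective pairs are assumed to be already extracted from a corpus, "AND"/"OR" are assumed to keep the orientation and "BUT" to reverse it, and labels propagate until nothing changes. This is not the statistical method of [9]; the function name and data are hypothetical.

```python
def propagate_orientation(conjunctions, seeds):
    """Infer adjective orientations (+1/-1) from conjoined pairs.

    conjunctions: list of (adj1, conjunction, adj2) triples from a corpus.
    seeds: dict of seed adjectives with known orientation.
    """
    orientation = dict(seeds)
    changed = True
    while changed:  # repeat until no new adjective can be labeled
        changed = False
        for a, conj, b in conjunctions:
            flip = conj.lower() == "but"  # BUT links opposite orientations
            for known, unknown in ((a, b), (b, a)):
                if known in orientation and unknown not in orientation:
                    value = orientation[known]
                    orientation[unknown] = -value if flip else value
                    changed = True
    return orientation


pairs = [("beautiful", "and", "spacious"), ("spacious", "but", "noisy")]
print(propagate_orientation(pairs, {"beautiful": +1}))
```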

Other approaches are dictionary-based. One of the simple techniques in this approach is bootstrapping using a small set of seed opinion words and an online dictionary, e.g., WordNet [7][10]. The approach first manually collects a small set of opinion words with known orientations and then grows this set by searching WordNet for their synonyms and antonyms. The newly found words are added to the seed list, and the next iteration starts. The iterative process stops when no more new words are found.
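The bootstrapping loop can be sketched as a breadth-first expansion. To keep the example self-contained, plain dictionaries of synonyms and antonyms stand in for WordNet lookups; these dictionaries and the function name are hypothetical.

```python
def bootstrap_lexicon(seeds, synonyms, antonyms):
    """Grow a seed opinion lexicon through synonym/antonym links.

    seeds: dict word -> +1/-1; synonyms/antonyms: dict word -> list of words
    (a toy stand-in for WordNet lookups).
    """
    lexicon = dict(seeds)
    frontier = list(seeds)
    while frontier:  # stop when an iteration finds no new words
        new_words = []
        for word in frontier:
            for syn in synonyms.get(word, []):
                if syn not in lexicon:
                    lexicon[syn] = lexicon[word]    # synonym: same orientation
                    new_words.append(syn)
            for ant in antonyms.get(word, []):
                if ant not in lexicon:
                    lexicon[ant] = -lexicon[word]   # antonym: opposite orientation
                    new_words.append(ant)
        frontier = new_words
    return lexicon


synonyms = {"good": ["fine"], "fine": ["nice"], "bad": ["poor"]}
antonyms = {"good": ["bad"]}
print(bootstrap_lexicon({"good": +1}, synonyms, antonyms))
```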

Aggregating opinions: This step applies an opinion aggregation function to the resulting opinion scores to determine the final orientation of the opinion on each object feature in the sentence. Let the sentence be $s$, which contains a set of object features $f_1, \dots, f_m$ and a set of opinion words or phrases $op_1, \dots, op_n$ with their opinion scores obtained in the previous steps. The opinion orientation on each feature $f_i$ in $s$ is determined by an opinion aggregation function (different systems use different functions). [6] defines the function as follows:

$$\mathrm{score}(f_i, s) = \sum_{op_j \in s} \frac{op_j.so}{d(op_j, f_i)}$$

where $op_j$ is an opinion word in $s$, $d(op_j, f_i)$ is the distance between feature $f_i$ and opinion word $op_j$ in $s$, and $op_j.so$ is the orientation or opinion score of $op_j$.
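The aggregation function can be computed directly. In this sketch the distance $d(op_j, f_i)$ is assumed to be the absolute token distance in the sentence; the function name and example scores are hypothetical.

```python
def aggregate_score(feature_pos, opinions):
    """score(f_i, s) = sum over opinion words op_j of op_j.so / d(op_j, f_i).

    feature_pos: token index of feature f_i in sentence s.
    opinions: list of (token_index, opinion_score) for the opinion words in s.
    Distance d is taken as the absolute token distance (an assumption).
    """
    return sum(score / abs(pos - feature_pos)
               for pos, score in opinions if pos != feature_pos)


# "The touch screen is great but the battery is weak": feature "screen" at index 2,
# "great" (+1) at index 4, "weak" (-1) at index 9.
print(aggregate_score(2, [(4, 1.0), (9, -1.0)]))
```

Because each score is divided by its distance, the nearby "great" outweighs the distant "weak", so the opinion on "screen" comes out positive.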


III. OUR FEATURE-BASED OPINION MINING MODEL

A. Introduction

Figure 1 describes the proposed model for feature-based opinion mining and summarizing on Vietnamese product reviews. The system performs the following four phases: (1) pre-processing; (2) extracting explicit/implicit product features and opinion words, and grouping synonym product features; (3) identifying the orientation of opinions; (4) summarizing the results. Each phase is implemented by several modules.

[Figure 1. Labels: Vietnamese customer reviews; VietSentiWordNet; Phase 4: Results Summarization]

1) Data Standardizing: Customers often use a combination of standard spelling, apparently accidental mistakes, slang, sentence fragments, "typographic slang" and interjections in their reviews [11]. Therefore, we adopted a Vietnamese accent restoration system combining an N-gram statistical model and a Hidden Markov Model (HMM) to convert a sentence without accents into a Vietnamese accented sentence. For example, "Chiec camera nay that tien loi" is converted into "Chiếc camera này thật tiện lợi" (This camera is convenient).
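A minimal sketch of how an N-gram model and an HMM can be combined for accent restoration: hidden states are accented syllables, each emitting its unaccented form with probability 1, transitions are bigram log-probabilities, and Viterbi decoding picks the best accented sentence. The candidate lists and probabilities below are hypothetical toy values, not the thesis's trained model.

```python
def restore_accents(tokens, candidates, bigram_logp, unseen_logp=-10.0):
    """Viterbi decoding for accent restoration.

    Hidden states are accented syllables; each emits its unaccented form with
    probability 1, so only candidates listed for a token are considered.
    Transitions are bigram log-probabilities; unseen bigrams get unseen_logp.
    """
    # best maps the last accented syllable to (log score, best path ending there)
    best = {"<s>": (0.0, [])}
    for tok in tokens:
        new_best = {}
        for cand in candidates.get(tok, [tok]):  # unknown tokens are kept as-is
            new_best[cand] = max(
                (score + bigram_logp.get((prev, cand), unseen_logp), path + [cand])
                for prev, (score, path) in best.items()
            )
        best = new_best
    return max(best.values())[1]


candidates = {"chiec": ["chiếc"], "nay": ["này", "nay", "nãy"], "that": ["thật", "thất"],
              "tien": ["tiện", "tiên", "tiền"], "loi": ["lợi", "lời", "lỗi"]}
bigram_logp = {("<s>", "chiếc"): -1.0, ("chiếc", "camera"): -1.0, ("camera", "này"): -1.0,
               ("này", "thật"): -1.0, ("thật", "tiện"): -1.0, ("tiện", "lợi"): -0.5}
print(" ".join(restore_accents("chiec camera nay that tien loi".split(),
                               candidates, bigram_logp)))
```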

2) Token Segmenting and POS Tagging: Because product features are often nouns or noun phrases composed of several words, the reviews need to be segmented and tagged. To achieve this, we use a Vietnamese word segmentation tool [12]. For example, for a review sentence meaning "Features are generally good", token segmenting and POS tagging group multi-syllable words such as "tính năng" (features) and "nói chung" (generally) into single tokens and attach a POS tag to each token. All the segmented and tagged sentences are then stored in the database along with the POS tag information.

C. Phase 2: Product Features and Opinion Words Extraction

This phase extracts product features and opinion words from Vietnamese customer reviews. In this phase, we consider product features to be nouns or noun phrases, and opinion words to be not only adjectives, as in [7], but also verbs, because apart from adjectives, Vietnamese verbs sometimes also express opinions. For example, in the sentence "Tôi thích màu sắc chiếc điện thoại này" (I love the color of this phone), "màu sắc" (color, a noun phrase) is a product feature and "thích" (love, a verb) is an opinion word.

Therefore, we combine Vietnamese syntax rules with the feature extraction method proposed by [2] to obtain Vietnamese product features. In addition, we resolve some drawbacks of FOMS systems: identifying co-references in subsection III-C2, extracting implicit features from opinion words in subsection III-C3, and grouping synonym product features in subsection III-C4.

1) Explicit Product Features Extraction: Explicit product features are expressed directly in the sentences of customer reviews. For example, in "Màn hình cảm ứng của chiếc Iphone 4 này rất tuyệt" (The touch screen of this iPhone 4 is great), "touch screen" is an explicit product feature. This module extracts product features based on three syntax rules: the part-whole relation, "No" patterns, and the double propagation rule.
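The double propagation idea (known opinion words reveal features, and known features reveal new opinion words) can be sketched as a fixed-point loop. The original double propagation method relies on syntactic relations; this sketch replaces them with a hypothetical proximity window, uses a hypothetical tagset ("N" noun, "A" adjective, "V" verb), and does not implement the part-whole or "No" pattern rules.

```python
def double_propagation(sentences, seed_opinions, window=2):
    """Simplified double propagation over POS-tagged sentences.

    sentences: list of sentences, each a list of (word, tag) pairs, with a
    hypothetical tagset: "N" noun, "A" adjective, "V" verb, others ignored.
    A word within `window` tokens stands in for a syntactic relation.
    """
    opinions, features = set(seed_opinions), set()
    changed = True
    while changed:  # alternate until neither set grows
        changed = False
        for sent in sentences:
            for i, (word, tag) in enumerate(sent):
                near = {w for w, _ in sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]}
                if tag == "N" and word not in features and near & opinions:
                    features.add(word)        # opinion word -> new feature
                    changed = True
                elif tag in ("A", "V") and word not in opinions and near & features:
                    opinions.add(word)        # feature -> new opinion word
                    changed = True
    return features, opinions


sents = [
    [("màn hình", "N"), ("rất", "R"), ("đẹp", "A")],   # "the screen is very beautiful"
    [("màn hình", "N"), ("sáng", "A")],                # "the screen is bright"
    [("camera", "N"), ("sáng", "A")],                  # "the camera is bright"
]
print(double_propagation(sents, {"đẹp"}))
```

Starting from the single seed "đẹp", the loop finds "màn hình", then the new opinion word "sáng", and through it the feature "camera".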

2) Opinion Word Extraction: This module extracts not only the adjectives and verbs nearest to each identified product feature, but also sentiment strength words (gradable words) such as "rất" (very) and negation words such as "không" (not) in the sentence. If adjectives are connected to each other by commas, semicolons or conjunctions, we extract all of these adjectives and consider them as opinion words.

3) Implicit Features Identification: Implicit features are product features that do not appear directly in the sentence but are indicated via opinion words in the sentence. For example, in "Điện thoại này đắt quá" (This phone is too expensive), the opinion word "đắt" (expensive) refers to the product price, which is not expressed directly in the sentence. For the "mobile phone" domain, we construct a mapping dictionary that identifies implicit features by mapping them to the corresponding opinion words.
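The two modules above can be sketched together: for each adjective or verb in a tagged sentence, collect nearby gradable and negation words, attach the opinion word to the nearest explicit feature, and fall back to the implicit-feature dictionary when the sentence names no feature. The word lists, the mapping "đắt" to "giá" (price), the window size and the tagset are hypothetical illustrations, not the thesis's actual resources.

```python
# Hypothetical word lists for illustration; the thesis's real lists are not given.
GRADABLE = {"rất", "quá", "khá"}            # sentiment strength words
NEGATION = {"không", "chẳng"}               # negation words
IMPLICIT = {"đắt": "giá", "rẻ": "giá"}      # opinion word -> implicit feature


def extract_opinions(tagged, features, window=2):
    """Return (feature, opinion_word, modifiers) triples for one tagged sentence.

    tagged: list of (word, tag) with a hypothetical tagset ("A" adjective, "V" verb).
    features: explicit product features already identified.
    """
    results = []
    for i, (word, tag) in enumerate(tagged):
        if tag not in ("A", "V"):
            continue
        # gradable and negation words within `window` tokens of the opinion word
        context = tagged[max(0, i - window):i] + tagged[i + 1:i + 1 + window]
        modifiers = [w for w, _ in context if w in GRADABLE or w in NEGATION]
        # nearest explicit feature in the sentence, else the implicit mapping
        explicit = [(abs(k - i), w) for k, (w, _) in enumerate(tagged) if w in features]
        feature = min(explicit)[1] if explicit else IMPLICIT.get(word)
        if feature is not None:
            results.append((feature, word, modifiers))
    return results


# "Điện thoại này đắt quá" (This phone is too expensive): no explicit feature.
print(extract_opinions([("điện thoại", "N"), ("này", "P"), ("đắt", "A"), ("quá", "R")],
                       {"màn hình", "pin"}))
```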

4) Grouping Synonym Features: We use two concepts from [1]. Firstly, a feature expression is a word or phrase that actually appears in a review to indicate the feature. Secondly, a feature group (or feature for short) is the name of a feature (given by the user). For example, a feature group could be named "Chất lượng ảnh" (picture quality), but there are many possible expressions indicating the feature, e.g., "ảnh" (picture), "hình ảnh" (image), and even "Chất lượng ảnh" (picture quality) itself. All the feature expressions in a feature group signify the same feature.

Because customers can refer to the same product feature with many different words and phrases (for example, both "mẫu mã" (design) and "kiểu dáng" (style) belong to the "hình thức" (appearance) group), these words or phrases, which express the same feature, need to be grouped into synonym feature groups to make the summarization phase more useful [1]. Our grouping method is based on SVM-KNN semi-supervised learning [13][1][14], together with the HAC clustering method, which generates the training set for SVM-KNN. Therefore, the method is unsupervised and fully automatic.
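A minimal sketch of the HAC step: average-link agglomerative clustering of feature expressions using cosine similarity over context-word counts, stopping when the most similar pair of clusters falls below a threshold. In the thesis the resulting clusters serve as the training set for SVM-KNN; that classification step is not shown here. The context vectors, threshold and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def hac(vectors, threshold):
    """Average-link agglomerative clustering of feature expressions.

    vectors: dict expression -> context-word counts; merging stops when the
    most similar pair of clusters falls below `threshold`.
    """
    clusters = [[name] for name in vectors]

    def link(a, b):  # average pairwise similarity between two clusters
        return sum(cosine(vectors[x], vectors[y]) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        sim, i, j = max((link(clusters[i], clusters[j]), i, j)
                        for i in range(len(clusters))
                        for j in range(i + 1, len(clusters)))
        if sim < threshold:
            break
        clusters[i] += clusters[j]   # merge cluster j into cluster i (i < j)
        del clusters[j]
    return clusters


# Hypothetical context-word counts for three feature expressions.
vectors = {"mẫu mã": {"đẹp": 2, "sang": 1},
           "kiểu dáng": {"đẹp": 1, "sang": 2},
           "pin": {"trâu": 3, "yếu": 1}}
print(hac(vectors, threshold=0.5))
```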

5) Frequent Features Identification: This step determines the frequent features in reviews and removes redundant features. To do this, we compute the frequency of each feature appearing in customer reviews. If the frequency is greater than a given threshold, the feature is a frequent feature; otherwise, it is a redundant feature and is eliminated.
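The threshold test can be sketched as follows. The text does not specify whether frequency counts occurrences or reviews; this sketch assumes the number of reviews mentioning a feature. The function name, threshold and data are hypothetical.

```python
from collections import Counter


def frequent_features(review_features, threshold):
    """Split features into frequent and redundant ones.

    review_features: one list of extracted features per review. Frequency is
    counted as the number of reviews mentioning a feature (an assumption).
    """
    counts = Counter(f for feats in review_features for f in set(feats))
    frequent = {f for f, c in counts.items() if c > threshold}
    redundant = set(counts) - frequent
    return frequent, redundant


reviews = [["pin", "màn hình"], ["pin"], ["pin", "loa"], ["màn hình"]]
print(frequent_features(reviews, threshold=1))
```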

D. Phase 3: Determining the Opinion Orientation

The opinion orientation of each customer on each product feature is determined in this phase via the following two steps. Firstly, the opinion weight of the customer on each feature on which the customer expresses an opinion is determined. Secondly, the opinion orientation of the feature is determined by classifying it into one of three classes: positive, negative or neutral.

+ In the first step, an initial VietSentiWordNet, a Vietnamese sentiment dictionary, has been constructed by extending SentiWordNet 3.0, and customers' opinion weights on product features are calculated from it. The initial VietSentiWordNet, with 977 sentiment synsets and 1179 sentiment words, has been extended by using a semi-supervised learning method [15][16]. After normalizing all of the opinion words, the extended VietSentiWordNet has 9333 synsets and 9533 words.

Let $ts$ denote the opinion weight of the feature in a customer's review, $ts_i$ the weight of the $i$-th opinion word on the feature in the review (denoted by $word_i$), and $w_i$ the opinion weight of $word_i$ obtained from the VietSentiWordNet dictionary by subtracting the negative score of $word_i$ from its positive score. Then $ts$ is determined as

$$ts = \sum_{i=1}^{n} ts_i$$

where $n$ is the number of opinion words of the feature in the review. If there is a negation word such as "không" (not), the value of $ts_i$ is reversed, that is, $ts_i = -1 \times ts_i$. In other cases, $ts_i = w_i$ if there is no gradable word such as "rất" (very), and $ts_i = h \times w_i$ if there is a gradable word with weight $h$.

+ In the second step, the opinion orientation of the feature is classified based on the value of $ts$:

  - if $ts > +0.2$, the opinion is positive;
  - if $-0.2 < ts < +0.2$, the opinion is neutral;
  - if $ts < -0.2$, the opinion is negative.
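The two steps can be sketched in a few functions. The lexicon entries and gradable weights below are hypothetical stand-ins for VietSentiWordNet. The text does not say how a gradable word and a negation word combine, so this sketch applies the gradable weight first and then reverses the sign; it also treats $ts = \pm 0.2$ exactly as neutral, since the stated inequalities leave those boundary values unassigned.

```python
# Hypothetical VietSentiWordNet-style entries: word -> (positive score, negative score).
LEXICON = {"đẹp": (0.625, 0.0), "tốt": (0.75, 0.0), "kém": (0.0, 0.625)}
GRADABLE_WEIGHT = {"rất": 1.5, "quá": 1.5}   # hypothetical weights h
NEGATION = {"không", "chẳng"}


def word_weight(word):
    """w_i = positive score - negative score (0 for unknown words)."""
    pos, neg = LEXICON.get(word, (0.0, 0.0))
    return pos - neg


def feature_weight(opinions):
    """ts = sum of ts_i over the opinion words of one feature in one review.

    opinions: list of (opinion_word, modifier_words) pairs.
    """
    ts = 0.0
    for word, modifiers in opinions:
        ts_i = word_weight(word)
        for m in modifiers:
            ts_i *= GRADABLE_WEIGHT.get(m, 1.0)    # ts_i = h * w_i
        if any(m in NEGATION for m in modifiers):
            ts_i = -ts_i                           # negation reverses ts_i
        ts += ts_i
    return ts


def orientation(ts, bound=0.2):
    """Three-way classification; ts = +/-0.2 exactly is treated as neutral here."""
    if ts > bound:
        return "positive"
    if ts < -bound:
        return "negative"
    return "neutral"


print(orientation(feature_weight([("đẹp", ["rất"])])))   # "rất đẹp" (very beautiful)
print(orientation(feature_weight([("tốt", ["không"])])))  # "không tốt" (not good)
```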

E. Phase 4: Summarization

The summary is produced by counting all customers' opinion orientations on all product features, and the result is shown as a table diagram like Figure 2.
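Counting orientations per feature is a simple group-by. This sketch assumes Phase 3 yields one (feature, orientation) pair per customer review; the function name and records are hypothetical.

```python
from collections import Counter, defaultdict


def summarize(records):
    """Count opinion orientations per product feature.

    records: iterable of (feature, orientation) pairs, e.g. the output of Phase 3.
    """
    summary = defaultdict(Counter)
    for feature, label in records:
        summary[feature][label] += 1
    return summary


# Hypothetical Phase 3 output for one product.
records = [("pin", "positive"), ("pin", "negative"), ("pin", "positive"),
           ("màn hình", "neutral")]
summary = summarize(records)
for feature, c in sorted(summary.items()):
    print(f"{feature}: {c['positive']} positive, {c['negative']} negative, {c['neutral']} neutral")
```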

[Figure 2. Labels: Positive; Negative]

Vietnamese FOMS system on "mobile phone" product reviews. In this chapter, we describe our evaluation results from two experiments: product feature extraction and whole-system evaluation. After the two experiments, we perform the summarization task and show the summarization results in column charts.

A. Environment and Experimental Data


+ Programming Tool: Java Eclipse SDK

2) Experimental Data: We crawl 743 customer reviews of ten popular "mobile phone" products from the website http://www.thegioididong.com. Table I shows the number of crawled and standardized reviews for each product.

Table I. TOTAL OF CRAWLED REVIEWS

Product names Number of comments

Subsequently, we evaluate the results of the feature extraction phase using Vietnamese syntax rules. Table II illustrates the effectiveness of the feature extraction. For each product, we read all of its reviews and list all the product features in them. Then we count the correct features returned by the system. Precision, recall and F1 are shown in columns 2, 3 and 4, respectively. It can be seen that the results of the frequent feature extraction step are good, with all F1 values above 80%.
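The evaluation measures follow from the counting procedure above: precision is the share of returned features that are correct, recall is the share of manually listed features that the system returns, and F1 is their harmonic mean. A small sketch with hypothetical feature sets:

```python
def precision_recall_f1(returned, gold):
    """Precision, recall and F1 of returned features against a manual gold list."""
    returned, gold = set(returned), set(gold)
    correct = len(returned & gold)
    p = correct / len(returned) if returned else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Hypothetical example: 4 features returned, 3 of them among the 5 listed manually.
p, r, f1 = precision_recall_f1({"pin", "màn hình", "camera", "giá cả"},
                               {"pin", "màn hình", "camera", "loa", "bộ nhớ"})
print(f"P={p:.2%} R={r:.2%} F1={f1:.2%}")
```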

PRECISION, RECALL AND F1 OF FEATURE-BASED OPINION MINING MODEL ON VIETNAMESE MOBILE PHONE REVIEWS

Product names | Precision (%) | Recall (%) | F1 (%)

LG GS290 Cookie Fresh | 7.12 Tĩ.T8 Tras

C. The Whole System Evaluation

For each feature extracted in the previous experiment, the system first extracts opinion words from the reviews mentioning this feature among the 743 crawled reviews. Secondly, the system calculates the opinion weights of the opinion words. Finally, we obtain positive, negative and neutral comments for all features of each product. According to Table III, the precision and recall of our system are quite satisfactory, with both values at approximately 69%. For the summarization task, Figure 3 shows a summary of the customer reviews on each feature of the LG Wink Touch T300.

V. CONCLUSION

In this thesis, we presented, in Chapter III, an approach to building an opinion mining system for customer reviews according to product features, based on Vietnamese syntax rules and the VietSentiWordNet dictionary, with three main contributions as follows:

+ Firstly, in phase 1, we built a Vietnamese accent restoration system combining an N-gram statistical model and a Hidden Markov Model (HMM) to convert a sentence without accents into a Vietnamese accented sentence.

+ Secondly, in phase 2, we proposed a method that uses SVM-KNN semi-supervised learning, together with the HAC clustering method for generating the SVM-KNN training set, to group synonym features; after that, co-reference was resolved by using some Vietnamese rules.


References

[1] Z. Zhai, B. Liu, H. Xu, and P. Jia, "Grouping product features using semi-supervised learning with soft-constraints," in Proceedings of the 23rd International Conference on Computational Linguistics, ser. COLING '10. Stroudsburg, PA, USA: Association for Computational Linguistics, 2010, pp. 1272-1280.
[6] B. Liu, "Sentiment analysis and subjectivity," in Handbook of Natural Language Processing, Second Edition, N. Indurkhya and F. J. Damerau, Eds. Boca Raton, FL: CRC Press, Taylor and Francis Group, 2010, ISBN 978-1420085921.
[7] M. Hu and B. Liu, "Mining and summarizing customer reviews," in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD '04. New York, NY, USA: ACM, 2004, pp. 168-177.
[8] V. Stoyanov and C. Cardie, "Topic identification for fine-grained opinion analysis," in Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1, ser. COLING '08. Stroudsburg, PA, USA: Association for Computational Linguistics, 2008, pp. 817-824.
[9] V. Hatzivassiloglou and K. R. McKeown, "Predicting the semantic orientation of adjectives," in Proceedings of the Eighth Conference on European Chapter of the Association for Computational Linguistics, ser. EACL '97. Stroudsburg, PA, USA: Association for Computational Linguistics, 1997, pp. 174-181.
[12] D. D. Pham, G. B. Tran, and S. B. Pham, "A hybrid approach to Vietnamese word segmentation using part of speech tags," in International Conference on Knowledge and Systems Engineering, 2009, pp. 154-161.
[13] K. Li, X. Luo, and M. Jin, "Semi-supervised learning for SVM-KNN," Journal of Computers, vol. 5, no. 5, pp. 671-679, 2010.
[14] H. Zhang, A. C. Berg, M. Maire, and J. Malik, "SVM-KNN: Discriminative nearest neighbor classification for visual category recognition," in CVPR (2), 2006, pp. 2126-2136.
[15] A. Esuli and F. Sebastiani, "SentiWordNet: A publicly available lexical resource for opinion mining," in Proceedings of the 5th Conference on Language Resources and Evaluation (LREC'06), 2006, pp. 417-422.
[16] A. Esuli, "Automatic generation of lexical resources for opinion mining: models, algorithms and applications," SIGIR Forum, vol. 42, pp. 105-106, November 2008.
[16] S.-M. Kim and E. Hovy, "Automatic identification of pro and con reasons in online reviews," in Proceedings of the COLING/ACL on Main Conference Poster Sessions, ser. COLING-ACL '06. Stroudsburg, PA, USA: Association for Computational Linguistics, 2006, pp. 483-490.
