A Feature-based Opinion Mining Model for Product Reviews in Vietnamese
Vũ Tiến Thành
University of Engineering and Technology
Major: Computer Science; Code: 60 48 01
Supervisor: Assoc. Prof. Dr. Hà Quang Thụy
Year of defense: 2012
Abstract: In this thesis, we present an approach to building a feature-based opinion mining system for customer reviews, based on Vietnamese syntax rules and the VietSentiWordNet dictionary. The system works in four phases: (1) pre-processing; (2) extracting explicit and implicit product features and opinion words, and grouping synonymous product features; (3) identifying the orientation of opinions; and (4) summarizing the results. The thesis makes three main contributions. First, in phase 1, we build a Vietnamese accent restoration system that combines an N-gram statistical model with a Hidden Markov Model (HMM) to convert an unaccented sentence into an accented Vietnamese sentence. Second, in phase 2, we construct a mapping dictionary that identifies implicit features by mapping them to their corresponding opinion words, and we propose a method for grouping synonymous features that uses SVM-kNN semi-supervised learning, with the HAC clustering method generating the training set for SVM-kNN; after that, co-reference is resolved using a set of Vietnamese rules.
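To make the accent restoration step of phase 1 more concrete, the following is a minimal sketch and not the system built in the thesis. It assumes a bigram model with add-one smoothing, estimated from a tiny toy corpus, and decodes with the Viterbi algorithm: the hidden states are accented words and the observations are their unaccented forms. The function names (strip_accents, train, restore_accents) and the corpus are hypothetical and chosen only for illustration.

```python
import math
import unicodedata
from collections import defaultdict


def strip_accents(word):
    """Remove Vietnamese diacritics, e.g. 'điện' -> 'dien'."""
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 'đ' has no Unicode decomposition, so it is mapped explicitly.
    return base.replace("đ", "d").replace("Đ", "D")


def train(sentences):
    """Collect accented candidates and bigram counts from accented text."""
    candidates = defaultdict(set)   # unaccented form -> accented forms seen
    contexts = defaultdict(int)     # how often a word occurs as left context
    bigrams = defaultdict(int)      # (previous word, current word) -> count
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        for prev, cur in zip(words, words[1:]):
            candidates[strip_accents(cur)].add(cur)
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return candidates, contexts, bigrams


def restore_accents(text, model):
    """Viterbi search for the most probable accented word sequence."""
    candidates, contexts, bigrams = model
    vocab_size = sum(len(forms) for forms in candidates.values())

    def log_prob(prev, cur):
        # Add-one smoothed bigram probability (an assumption of this sketch).
        count = bigrams.get((prev, cur), 0)
        return math.log((count + 1) / (contexts.get(prev, 0) + vocab_size))

    # best maps each candidate state to (log score, best path ending there).
    best = {"<s>": (0.0, [])}
    for token in text.lower().split():
        # Unknown tokens are passed through unchanged.
        forms = candidates.get(token, {token})
        step = {}
        for cur in forms:
            score, path = max(
                ((s + log_prob(prev, cur), p) for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            step[cur] = (score, path + [cur])
        best = step
    return " ".join(max(best.values(), key=lambda item: item[0])[1])


# Toy accented corpus (hypothetical); a real system needs a large corpus.
corpus = [
    "tôi mua điện thoại",
    "mùa hè nóng",
    "điện thoại chụp ảnh đẹp",
]
model = train(corpus)
print(restore_accents("toi mua dien thoai", model))
print(restore_accents("mua he nong", model))
```

In this toy setting the ambiguous token "mua" is resolved by its context: after "tôi" the model prefers "mua" (to buy), while at the start of a sentence followed by "he" it prefers "mùa" (season). With realistic data, the candidate sets and the smoothing scheme would need far more care than this sketch gives them.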
Table of Contents
2.1 Opinion Mining
2.1.1 The demand of opinion mining
2.1.2 The basic concepts in the opinion mining field
2.1.3 Opinion mining problems
2.2 Feature-based Opinion Mining
2.2.1 Problem Definition
2.2.2 Features Extraction
2.2.3 Opinion Orientation Identification
2.2.4 Feature-based Opinion Mining System on Vietnamese Product Reviews
3 Our Feature-based Opinion Mining Model
3.1 Introduction
3.2 Phase 1: Pre-processing
3.2.1 Data Standardizing
3.2.2 Token Segmenting and POS Tagging
3.3 Phase 2: Product Features and Opinion Words Extraction
3.3.1 Explicit Product Features Extraction
3.3.2 Opinion Word Extraction
3.3.3 Implicit Features Identification
3.3.4 Grouping Synonym Features
3.3.5 Frequent Features Identification
3.4 Phase 3: Determining the Opinion Orientation
3.5 Phase 4: Summarization
4.1 Environment and Experimental Data
4.1.1 Environment
4.1.2 Experimental Data
4.2 Product Features Extraction Evaluation
4.3 Opinion Words Extraction Evaluation
4.4 The Whole System Evaluation