
Aspect based sentiment analysis for text documents





FACULTY OF COMPUTER SCIENCE AND ENGINEERING

GRADUATE THESIS

ASPECT BASED SENTIMENT ANALYSIS

FOR TEXT DOCUMENTS

Major: Computer Science

Council: KHMT 5
Supervisor: Dr. Le Thanh Van
Examiner: Dr. Nguyen Quang Hung
Students: Tran Cong Toan Tri (1713657), Nguyen Phu Thien

Ho Chi Minh, December 2021


GRADUATE THESIS ASSIGNMENT

FACULTY: Computer Science & Engineering
DEPARTMENT: Systems & Networks
MAJOR: COMPUTER SCIENCE. CLASS: MTKH03

Note: students must attach this sheet to the first page of the report.

1. Thesis title:

Phân tích cảm xúc theo khía cạnh từ dữ liệu văn bản (Aspect based sentiment analysis for text documents)

2. Tasks (requirements on content and initial data):

- Study the characteristics of the aspect-based sentiment analysis problem on text data

- Review related work

- Research and propose a model that can recognize aspects, and the sentiment toward each aspect, from text data

- Collect data for training and testing the model

- Implement the proposed model; experiment, compare, and evaluate

3. Thesis assignment date: 30/08/2021

4. Completion date: 31/12/2021

5. Supervisor (supervision share: 100%):

1) Dr. Lê Thanh Vân. The content and requirements of the thesis have been approved by the Department.

Day … Month … Year …

(Signature and full name) (Signature and full name)

Lê Thanh Vân

FOR THE FACULTY AND DEPARTMENT:

Reviewer (preliminary grading):


December 27, 2021

THESIS DEFENSE EVALUATION SHEET

(For the supervisor/examiner)

1. Student names: Trần Công Toàn Trí, Nguyễn Phú Thiện

2. Topic: Aspect-based sentiment analysis from text data

3. Supervisor/examiner: Dr. Lê Thanh Vân

4. Overview of the report:

6. Main strengths of the thesis:

The thesis aims to propose a model for aspect-based sentiment analysis of text data. To achieve this goal, the students carried out the following well:

- Studied the characteristics of sentiment analysis in general, and aspect-based sentiment analysis in particular, for text data

- Reviewed prominent related research from recent years

- Proactively contacted the VLSP and UIT research groups to obtain sample datasets for building the training and test sets; built a crawler to collect data from booking.com so that real-world data was available for evaluating the proposed model

- Studied and analyzed well the advantages of natural language processing models such as BERT and PhoBert, models that integrate hidden layers, hierarchical classification by entity, aspect, and sentiment, and Bert-based auxiliary-sentence construction

- Applied the NLI-B auxiliary-sentence model, based on PhoBert, to Vietnamese

- Proposed the HSUM-HC model, which combines and exploits the strengths of PhoBert and its hidden layers to improve semantic understanding, integrated with a hierarchical classifier

- Experimentally evaluated NLI-B, HSUM-HC with 4 hidden layers, and HSUM-HC with 8 hidden layers on 3 datasets (VLSP, UIT, and Booking.com) over 2 domains, restaurant and hotel. The experiments gave better results in most cases compared with Linear SVM, Multilayer Perceptron, CNN, BiLSTM+CNN, PhoBert, and viBert. In addition, the model scores highly when data is represented at the document level, because it captures the semantic links between sentences well

Moreover, once the model showed good evaluation results, in the final phase the thesis was extended with a simple search application that lets users enter an arbitrary hotel request, rather than the predefined criteria of current booking sites. The application analyzes the request, identifies the requirements, and returns the matching results.


The paper "…Bert-based hidden aggregation to hierarchical classifier for Vietnamese aspect-based sentiment analysis" was accepted and presented at the IEEE 2021 8th NAFOSTED Conference on Information and Computer Science (NICS) on 21/12/2021 in Hanoi, Vietnam.

7. Main shortcomings of the thesis:

Some sentences in the thesis report are too long; they should be split up to be clearer and easier to read.

9. Three questions the students must answer before the Committee:

a

b

c

Signature (full name)

Lê Thanh Vân


Student ID: 1713304. Major (specialization): Computer Science

2. Topic: Aspect-based sentiment analysis from text data (Aspect based sentiment analysis for text documents)

3. Supervisor/examiner: Dr. Nguyễn Quang Hùng

7. Main shortcomings of the thesis:

- The PhoBert model already exists as a research result from VinAI. The thesis stops at building additional layers on top of PhoBert to create the new HSUM-HC model suited to the thesis topic (the ABSA problem). This level of work is entirely acceptable for an undergraduate thesis, although the thesis leans somewhat toward a research project rather than a graduation project.

- When the HSUM-HC model is applied to the practical TravelLink application (review data collected from Booking.com), many limitations remain. The TravelLink web interface is quite rudimentary and lacks an analysis of user requirements in the system design. Software testing is not yet adequate; for example, the students only collected feedback and ratings from a number of their fellow students (hence the rather good survey results). The ranking of search results is not yet truly sound, and the sample query patterns are limited and not yet realistic.

8. Recommendation: Approved for defense / Needs additions before defense / Not approved for defense

9. Three questions the students must answer before the Committee:

a. Explain why the work is based on PhoBert; if PhoBert did not exist, how would the thesis results be affected?

b. How do the comment-based search results of TravelLink differ from a keyword-based search that also considers distance? For example: "quán ăn ngon gần trường ĐHBK và có chỗ đậu xe" (a good restaurant near HCMUT with parking).

10. Overall assessment (in words: excellent, good, average): Excellent. Score: 9.5/10

Signature (full name)


We, Tran Cong Toan Tri and Nguyen Phu Thien, declare that this thesis titled "Sentiment Analysis of User Comments" and the work presented in it are our own and that, to the best of our knowledge and belief, it contains no material previously published or written by another person (except where explicitly defined in the acknowledgments), nor material which to a substantial extent has been submitted for the award of any other degree or diploma of a university or other institution of higher learning.


It gives us great pleasure and satisfaction to present our thesis on "Aspect based sentiment analysis for text documents". This is our last project as bachelor students, and it reflects what we have learned and the skills we acquired during our years at the University of Technology.

For that reason, we would like to express our deepest gratitude to our advisor, Dr. Le Thanh Van, for allowing us to work on this project, for her continuous support throughout our study and research, and for the patience, motivation, and knowledge she has given us. We could not have imagined a better advisor and mentor for our thesis.

Besides our advisor, we thank the teachers and staff of the Faculty of Computer Science and Engineering for all the knowledge, experience, and support they have given us during our time at university; their teaching gave us the foundational knowledge for this thesis.

Lastly, we thank our friends and family, who have motivated us every step of the way. We could not have finished this work without them, and we are grateful for everything we have been given to this day.


In today's world, customers and their feedback are vital to any business's survival. Competition is harsh in every market, and the competitor who understands and pleases its customers the most will be more successful. For that to happen, businesses need to gather information about their customers' opinions on a large scale. ABSA is a method to achieve this. It has been studied rigorously in the past, and since the creation of Bert, ABSA methods have become more and more advanced, showing better and better results in recent years. However, for Vietnamese, ABSA is still not as developed, due to limited resources and the nuances of the language.

In our work, we want to improve the capabilities of Vietnamese ABSA. We use the Vietnamese SOTA pre-trained model PhoBert and build two models from it. One has a custom classifier, combining previous methods that proved effective; the other utilizes Bert's sequence-pair feature, constructing auxiliary sentences and turning ABSA into a question-answering problem. With this work, we hope to set new baseline results for the Vietnamese ABSA datasets and to provide useful knowledge for researchers who want to improve on them further. Our implementation achieved SOTA scores on both public Vietnamese ABSA datasets, considerably higher than previous work.

To demonstrate that our model works not only on filtered data but also on actual user reviews, we also collected reviews from a booking site and ran our model on them. From that data, we built a profile for each hotel, identifying its pros and cons, and then built a search engine that helps users book accommodation by providing immediate access to the necessary information. In this work, we describe the acquisition of this data, its evaluation results, and our process of designing the search engine.


1 Introduction 1

1.1 Why we chose this project 1

1.2 Project goal 2

1.3 Project scope 3

1.4 Project structure 3

2 Aspect Based Sentiment Analysis 5

2.1 What is ABSA 5

2.2 ABSA research overview 6

2.3 Vietnamese ABSA shared task 7

2.4 Related work 10

3 Our proposed models 27

3.1 Bert sequence-pair with auxiliary sentences 27

3.2 HSUM-HC 32

4 Experimental results and discussion 35

4.1 Exploratory data analysis 35

4.2 Training process 39

4.3 Training cost 40

4.4 Experimental Results 42

4.5 Analysis and Discussion 45

4.6 Evaluation on real-life data 49

4.7 Survey results 51

5 Model application for a recommender system 53

5.1 Inspiration 53

5.2 Overview 54

5.3 Technology 54

5.4 Design 56


6 Conclusion 65

6.1 Our contribution 65

6.2 Research Paper 66

6.3 Limitations 66

6.4 Future work 67


2.1 Possible entity-attribute pairs for Hotel domain 9

2.2 Possible entity-attribute pairs for Restaurant domain 9

3.1 Translation for Hotel domain entities 31

3.2 Translation for Hotel domain attributes 31

3.3 Translation for Restaurant domain entities 31

3.4 Translation for Restaurant domain attributes 31

3.5 Translation for Sentiments 32

4.1 Dataset details for VLSP 2018 ABSA 35

4.2 Dataset details for UIT ABSA 36

4.3 Training parameters for HSUM-HC and NLI_B 41

4.4 Training costs 41

4.5 Results on the test set of VLSP 2018 Dataset, Hotel domain 42

4.6 Results on the test set of UIT ABSA Dataset, Hotel domain 43

4.7 Results on the test set of VLSP 2018 Dataset, Restaurant domain 43

4.8 Results on the test set of UIT ABSA Dataset, Restaurant domain 44

4.9 Real-life dataset details 50


2.1 Paperswithcode’s summary of SemEval-2014 ABSA research 7

2.2 Example of a review and expected labels 8

2.3 Multitask BiLSTM-CNN model for ABSA 12

2.4 The architecture of intra attention 14

2.5 The architecture of global attention 15

2.6 BERT input representation [1] 16

2.7 The architecture of Bert Encoder layer 17

2.8 Example of word segmentation 18

2.9 Thin et al Bert implementation 20

2.10 Hierarchical Hidden level aggregation for Bert 24

2.11 Hierarchical approach for a Bert-based ABSA task 26

3.1 QA_M auxiliary sentence format 28

3.2 NLI_M auxiliary sentence format 28

3.3 QA_B auxiliary sentence format 29

3.4 NLI_B auxiliary sentence format 29

3.5 HSUM-HC model for the ABSA task 33

4.1 Aspect distribution of the VLSP-2018 ABSA dataset, hotel domain 37

4.2 Aspect distribution of the VLSP-2018 ABSA dataset, restaurant domain 37

4.3 Aspect distribution of the UIT ABSA dataset, hotel domain 38

4.4 Aspect distribution of the UIT ABSA dataset, restaurant domain 38

4.5 Sentiment distribution for the VLSP-2018 dataset hotel domain (left) and restaurant domain (right) 39

4.6 Sentiment distribution for the UIT ABSA dataset hotel domain (left) and restaurant domain (right) 39

4.7 The loss curves on the validation and test sets for VLSP 2018 (left) and UIT ABSA dataset (right), Hotel domain 46

4.8 The Phase B validation curves on the validation and test sets for VLSP 2018 (left) and UIT ABSA dataset (right), Hotel domain 46


4.10 The Phase B validation curves on the validation and test sets for VLSP 2018 (left) and UIT ABSA dataset (right), Restaurant domain 47

4.11 HSUM-HC and NLI_B F1 score differences for VLSP-2018 hotel domain 48

4.12 HSUM-HC and NLI_B F1 score differences for UIT ABSA hotel domain 49

4.13 Survey results 52

5.1 Score calculation for each hotel 57

5.2 The search bar 59

5.3 A hotel in the results 59

5.4 The comment window when opened 60

5.5 Travel Link result page 60

5.6 Travel Link result page with comment window 61

5.7 Top 4 recommended hotels for the query chỗ ở gần trung tâm, nhân viên thân thiện, phòng rộng 62

5.8 Comments of the top recommended hotel 63

5.9 Comments of the second recommended hotel 63


ABSA Aspect Based Sentiment Analysis 1–14, 19–27, 30, 32, 33, 35–41, 43–47, 49, 65–67

NLP Natural Language Processing 3, 6, 10, 11, 15, 18, 66, 67

SOTA State Of The Art 8, 11, 22, 25, 44, 65

UIT University of Information Technology 3, 7, 9, 11–13, 20, 36, 38–41, 43–47, 49, 65

VLSP Association for Vietnamese Language and Speech Processing 3, 7, 10–13, 19, 20, 25, 35–37, 39–48, 51, 65


Chapter 1

Introduction

In this chapter, we will give a brief introduction to the project: the reason we chose it, and its goal and scope in real-life usage.

Nowadays, with the development of the Internet and eCommerce, shopping is no longer as simple as picking what you want in a store. Shopping can now be done at home, through phones and computers. With online shopping, customers cannot try out a product before they buy it, nor can they feel its material or quality; especially in today's situation, with Covid-19 plaguing many countries and forcing people to stay indoors, the only way to judge a product before buying is through past customers' experience.

Almost every eCommerce application has a function that lets customers leave their opinion on the service they received. Moreover, with the growing popularity of review websites for every conceivable domain (Foody.vn for restaurant reviews; agoda.com, booking.com, and mytour.vn for hotel reviews; tinhte.vn for tech reviews; and many more), the majority of customers are likely to look up reviews for any product or service they plan to purchase, even when they plan to shop in person. Any customer's opinion can be read by everyone very quickly, and a business can lose a large portion of its customers to a bad review on the internet without even knowing about it. Therefore, learning about customers' opinions is a top priority for any business: to succeed, a company must ensure it is always aware of the general opinion. Doing so not only gives it a better overview of its market growth, but also shows what it can improve. A system that helps analyze customer opinions in as much detail as possible is Aspect Based Sentiment Analysis (ABSA): an opinion written in text is classified into labels, so we learn not only what the opinion is about but also its sentiment (positive, neutral, or negative).


Combined with the abundance of reviews and opinions online, this can be priceless to any business, helping it get the most accurate view of its customer base.

With every decision a business makes, it always has to track its customers' responses and adjust accordingly; this can prevent a marketing catastrophe. On occasion, user opinions circulate quickly on the internet, showing up on front pages and seemingly shared by many people. However, whether a company should make changes according to such an opinion is another problem, because it can come from a "loud minority": listening to this loud crowd may actually dissatisfy the majority of the customer base. This kind of decision can only be made with sufficient information and enough coverage of customer opinions. This practice is called "Brand Monitoring" and is used by every big brand name. It is the process of tracking different channels to identify where a brand is mentioned and understand how people perceive it, letting a company keep an eye on potential crises and respond to questions or criticism before they get out of control.

Not only the service business but any field that needs the general public's endorsement to succeed can benefit from learning about its audience. Politics is a prime example: almost every government in recent years employs a system to gauge general opinion, especially during presidential elections, when they need to measure public opinion and act accordingly. Nowadays most of these opinions are online and in quantities far too large for any human to sort through, so the application of ABSA in this field is absolutely necessary; it can help a government gain significant advantages over its opponents simply by knowing what the public wants and making the right statements.

With all these potential fields of application, ABSA is very useful for anyone who can apply it. English ABSA systems have been extensively developed and applied in real-life use. However, for Vietnamese, a low-resource language, research and development are still required to bring ABSA to the point of effective commercial use. That is why in our work we focus on improving past work and developing a more effective system for Vietnamese ABSA tasks.


…with applying Transformer models for ABSA, and maximizing the potential of PhoBert on a monolingual dataset. We also experiment with utilizing PhoBert's sequence-pair input, building auxiliary sentences for each review and treating ABSA like a question-answering task, in the hope of capturing better aspect-sentiment relationships in each sequence.

In our project, we study past work, learning its methodology and advantages; from that knowledge we develop our own method, improving on previous models. Our method is a combination of components made specifically to improve the performance of Bert. Not only did we test our work on public datasets, we also evaluated it on real-life data crawled from review sites, to see how well our model handles unfiltered data.

For our thesis, we also demonstrate our model's potential by using it in a recommendation system for the hotel domain: we use the model to build hotel profiles from past customer reviews, and given a query from a user, we analyze the aspects and sentiments of that query and suggest suitable hotels. Our system also makes it convenient for users to view past reviews by sorting them by relevance to the query, making sure users always see relevant information first.

In our work, we will build a system to solve the ABSA task of classifying user reviews into aspects and sentiments. The datasets we use for training and evaluation are public datasets from the Association for Vietnamese Language and Speech Processing (VLSP) and the University of Information Technology Natural Language Processing group (UIT NLP group).

The data we use for training and evaluation is in Vietnamese. Our model is expected to classify Vietnamese text with proper accent marks and clearly expressed ideas: any text that a human could interpret without prior knowledge of slang or abbreviations.

Our thesis includes six chapters, including this one. The content of each is as follows:

1. Chapter 1: Introduction. In this chapter we give an introduction to our project, providing more insight into the reason we chose it, its goal and scope, and our general direction with it.


2. Chapter 2: Aspect based sentiment analysis. In this chapter we go into detail about our task, ABSA. We explain in detail what ABSA is and what its goal is. We also present past work on this task along with its methodology and results. The datasets we use for training and evaluation are introduced in this chapter as well.

3. Chapter 3: Model Architecture. In this chapter we introduce our model, its inspiration, and its architecture; we explain in more detail how it functions and what each component of the system does.

4. Chapter 4: Experimental results and discussion. In this chapter we present our experimental setups and results, comparing them with past work on the same datasets. We also present our real-life dataset crawled from review sites and evaluate our model on it.

5. Chapter 5: Model Application. In this chapter we apply our model to a real-life task, building an application that serves as a recommendation system for hotel booking. We explain our goals for this app and how it works in detail.

6. Chapter 6: Conclusion. In this chapter we summarize our results and our project's pros and cons. We also discuss possible future developments of this project.


A document or review can be represented by dozens or hundreds of words about multiple aspects, with a different sentiment toward each, and determining which sentiment words go with which aspect can be very difficult. With ABSA, reviews about a product can be analyzed in detail, showing the reviewer's opinion on each aspect of that product.

The main process of ABSA is as follows: given a customer review about a domain (e.g., hotel or restaurant), the goal is to identify the set of (Aspect, Polarity) pairs that fit the opinions mentioned in the review. Each aspect is a pair of an entity and an attribute, and polarity is one of negative, neutral, and positive. For each domain, all possible combinations of entities and attributes are predefined. The ABSA task is divided into two phases: (i) identify pairs of entities and attributes; (ii) analyze the sentiment polarity of each corresponding aspect (entity#attribute) identified in the previous phase. For example, for the review "This place has an amazing view, the food is great too but the service is bad", after phase (i) the entity-attribute pairs will be {Hotel#Design&Features, Food&Drinks#Quality, Service#General}, and after phase (ii) the output will be Hotel#Design&Features: Positive, Food&Drinks#Quality: Positive, Service#General: Negative.


2.2 ABSA research overview

The ABSA task for English has been researched and developed extensively, especially since the SemEval-2014 [3] and SemEval-2016 [4] campaigns; almost every ABSA paper released since then has experimented on these two datasets. Methods range from traditional machine learning in the earlier years, through various deep learning models, to Transformers in recent years following the creation of the self-attention mechanism [5]. Figure 2.1 contains Paperswithcode [6]'s summary of numerous methods used on the SemEval-2014 ABSA dataset. At the start, most top studies used LSTMs or neural networks with an attention mechanism; however, once Bert proved itself capable of handling NLP tasks effectively, researchers started to experiment with it and obtained higher scores on the task.

For Vietnamese, a low-resource language, there has been less research on the ABSA task compared to English; several factors contribute to this:

• Firstly, Vietnamese is a more complicated language, with accent marks. For almost any word in Vietnamese, changing the accent marks can give it a completely different meaning. Words are therefore not just plain characters put next to each other: there are character-level accent marks (a, ă, â) and word-level accent marks (ngủ, ngũ, ngụ), which alone add a whole layer of complexity to the language. Besides accent marks, there are also compound words: groups of words that need to appear together to have a meaning, or that carry a whole new meaning when next to other words. For this reason, a model that works well for Vietnamese requires a large amount of data and an effective way to handle the complexity of the language.

• Secondly, the demand for ABSA is lower than in English-speaking countries. The popularity of social media and review websites is still quite recent, as infrastructure and internet development have been slower. Also, Twitter, a website mainly based on sharing users' written opinions with a large audience, is not as prominent in Viet Nam. On Twitter, opinions are shared widely and quickly, people's opinions can swing one way or another at any time, and capturing the right opportunity is vital; that is why the popularity of Twitter contributes substantially to the demand for ABSA.

In recent years, with developments in Vietnamese NLP techniques and the growth of social media, people have started to realize that their voice can be heard by the whole community, not just their friends or family. With this comes the need for businesses to listen to and understand the voices of their customers, making ABSA more relevant than ever, with many datasets and studies released. These datasets include VLSP-2018 ABSA and UIT-ABSA in 2021, which will be discussed in more detail in the following sections, as well as UIT-VSFC [7] in 2018 and UIT-VSMEC [8] in 2019. These datasets have been made available to all researchers, and many studies have been submitted, each improving the potential and capabilities of Vietnamese ABSA a little more.

Figure 2.1: Paperswithcode’s summary of SemEval-2014 ABSA research

Its main limitation with respect to real-life use is that it only labels each review with one polarity, while a real user review can talk about many different things with different sentiments.

VLSP's second campaign in 2018 addressed this problem and released a shared task on Aspect Based Sentiment Analysis (ABSA) [9], comprising two datasets in the hotel and restaurant domains. These datasets contain document-level reviews crawled from lozi.vn and booking.vn. Given a customer review, the goal is to output the aspects and sentiments of that review. The first step is to identify what the reviewer is writing about by identifying which aspects appear in the review; an aspect is a pair of an entity and an attribute, formatted as entity#attribute.


Entities can be understood as the object being talked about, such as a room or a hotel, and attributes go into specifics, such as the comfort of a room or the price of a hotel; together they form an aspect. After identifying whether an aspect exists in a review, the next step is to identify the polarity, i.e., what the user feels about that aspect. Polarity can be "positive", "negative", or "neutral"; neutral is used when an aspect is mentioned but nothing good or bad is said about it. Figure 2.2 shows two example reviews and expected outputs in the hotel and restaurant domains.

Figure 2.2: Example of a review and expected labels

The annotation for this dataset was done manually. All possible combinations of entity and attribute are shown in Tables 2.1 and 2.2 for the two domains; there are a total of 34 aspects for the hotel domain and 12 aspects for the restaurant domain. Each dataset is split into training, validation, and testing sets: the training and validation sets are used in the training process, and the testing set is used for the final evaluation.

Many studies have been conducted on these datasets, even after the campaign ended, and new methods have been discovered and applied, pushing the highest score a little higher each time. Some of these works are discussed in detail in the following sections.


Table 2.1: Possible entity-attribute pairs for Hotel domain

Table 2.2: Possible entity-attribute pairs for Restaurant domain


…and foody (https://www.foody.vn). Reviews are split into sentences and manually annotated; ambiguous review labels are discussed and decided under the supervision of an expert to ensure accuracy. Since the original document-level reviews are split into sentence-level ones, some sentences make less sense without their context, for example: "Cũng rất nhiệt tình và vui vẻ." (Also very enthusiastic and funny); these are labelled according to the experience of the annotator. The aspects and sentiments of each review are similar to the VLSP dataset.

• One study, by Tuan et al. [12], used a Multi-layer Perceptron (MLP) and treated the problem as multi-label classification. With 3 sentiments and 12 aspects for the restaurant domain and 34 aspects for the hotel domain, they have 36 and 102 classes for the restaurant and hotel domains, respectively. Their features are constructed using N-grams and TF-IDF; feeding those through a multi-label MLP gives them the predictions.

• A Convolutional Neural Network (CNN) implementation by Thin et al. [13] achieved a relatively high F1 score on Phase A of both the restaurant and hotel datasets. Their work first preprocessed the text data, removing redundant parts and using regexes to correct typos and abbreviations; the text was then converted to an embedding matrix as input for the CNN.

These studies applied fundamental methods to the ABSA task, and their results are used as a baseline for future research. They still relied heavily on feature engineering, which would be alleviated by later improvements in NLP methods.
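As a rough illustration of the multi-label MLP baseline described above, the sketch below wires TF-IDF n-gram features into a multi-label MLP with scikit-learn. The toy reviews, number of label columns, layer size, and iteration count are placeholders, not the settings used by Tuan et al.:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Toy reviews and multi-label targets: one binary column per
# aspect#sentiment class (36 for restaurant, 102 for hotel);
# truncated to 3 columns here for readability.
X_train = ["Phòng sạch sẽ, nhân viên thân thiện", "Đồ ăn tệ, giá quá cao"]
Y_train = [[1, 1, 0], [0, 0, 1]]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),            # n-gram + TF-IDF features
    MLPClassifier(hidden_layer_sizes=(256,), max_iter=100),
)
model.fit(X_train, Y_train)
print(model.predict(["Nhân viên thân thiện"]))      # multi-label prediction
```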



2.4.2 Multitask Learning for ABSA

One of the first big jumps in the VLSP 2018 SOTA score was Thin et al. [14]'s work implementing an end-to-end model for the ABSA task; their model is depicted in Figure 2.3. In this work, they created an architecture that makes use of two popular deep learning techniques for NLP:

• Bi-directional LSTM: the LSTM was one of the most popular methods in NLP; with its multiple-gate design and the ability to store information in each cell, it can alleviate the vanishing and exploding gradient problems of an RNN. The Bi-directional LSTM came as an upgrade: it comprises two LSTMs, one running forward, reading the sentence left to right, and one backward, reading right to left. This way, not only is past information preserved, but also information from the future. In the last step, the outputs of the two LSTMs are concatenated to form a single hidden state. With a Bi-directional LSTM, a fuller context for the words in a sentence can be obtained, which is why it is popular and widely used in NLP research.

• The output of the Bi-directional LSTM is then used as input to a CNN for feature extraction. This CNN is designed with a one-dimensional convolution layer with various kernels, which outputs a feature map; a non-linear ReLU function is applied to it, and to capture the most important features, global max-pooling is used together with global average-pooling. The results of the two pooling methods are concatenated as the output of the CNN, which then goes into a fully connected layer with softmax for the final output.

They tested their implementation on the VLSP-2018 dataset and achieved SOTA scores for both domains, with a significant F1 increase over the previous high score.
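A minimal PyTorch sketch of this pipeline follows: embeddings flow through a BiLSTM, a 1-D convolution, dual global pooling, and a fully connected layer. The dimensions and the single kernel size are illustrative assumptions; the paper uses various kernels and its own hyperparameters:

```python
import torch
import torch.nn as nn

class BiLSTMCNN(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=300, hidden=128,
                 n_filters=128, n_classes=102):       # 102 hotel classes, illustrative
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.conv = nn.Conv1d(2 * hidden, n_filters, kernel_size=3, padding=1)
        self.fc = nn.Linear(2 * n_filters, n_classes)

    def forward(self, token_ids):                     # (batch, seq_len)
        h, _ = self.bilstm(self.emb(token_ids))       # (batch, seq_len, 2*hidden)
        f = torch.relu(self.conv(h.transpose(1, 2)))  # (batch, n_filters, seq_len)
        # global max-pooling concatenated with global average-pooling
        pooled = torch.cat([f.max(dim=2).values, f.mean(dim=2)], dim=1)
        return self.fc(pooled)                        # softmax applied in the loss

logits = BiLSTMCNN()(torch.randint(0, 30000, (4, 50)))   # -> (4, 102)
```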


Figure 2.3: Multitask BiLSTM-CNN model for ABSA

2.4.3 Attention-based Sentiment Reasoner for ABSA

Liu et al. [15] proposed a method to capture exact sentiment expressions in ABSA. In their work, they want to capture an accurate representation of the relationship between any word in a sentence and the aspect, by gradually assigning increasingly precise weights to different words according to that aspect. Besides that, their goal is to make a model capable of learning sentiment words that are not present in a sentence; this is done by modeling the relationship between any two words in the context.

They proposed a multi-layered neural network architecture, named Attention-based Sentiment Reasoner, to deal with the problem of how to make the machine think like a human in ABSA. First, an aspect (entity#attribute) representation is generated as the sum of the entity embeddings $u_n$ and the attribute embeddings $z_n$:

$$e_{aspect} = \sum_{n=1}^{l_1} u_n + \sum_{n=1}^{l_2} z_n, \qquad u_n \in \mathbb{R}^d,\ z_n \in \mathbb{R}^d$$

where $l_1$ is the length of the entity, $l_2$ is the length of the attribute, and $d$ is the dimension of the word embedding. This aspect representation is then concatenated with each word embedding within the sentence to get the input $\hat{x}$, which is fed into each layer of the model. A layer includes three components:

• The Encoder: first, the input sequence is encoded into a sequence of vector representations by the encoder. This encoder is an LSTM, which can efficiently capture long-term dependencies and the word-order information contained in a sentence.

• The Aspect-dependent representation memory: this component works based on how a human reads text: we do not remember the whole sentence but only the bits and parts that play a major role, and the sentence semantics in our brain are continually updated by repeated reading. For the model to imitate that, they designed this component to act as extra memory storage, providing extra information for the reasoner. This memory consists of the reasoner's output from each layer; it is only updated after receiving output from the reasoner of the last layer, and the memory state of each layer does not change in the multi-layered architecture.

• The Reasoner: the most important part is the reasoner module, which comprises an intra-attention and a global attention mechanism. The output vectors of the attention mechanisms are viewed as the output representations of the reasoner; they are then used to update the aspect-dependent representation memory.

– Intra attention focuses on the relationship between any two words in a sentence, depicted in Figure 2.4. By combining the sentence embedding, the aspect embedding, and the memory from the last layer, and using pooling, they can obtain the word embedding of the most relevant word according to the current word. The final sentence representation is obtained as a weighted sum between the output of the softmax and each word's encoder hidden state $h_i$.

– Global attention looks at the whole sentence and finds sentiment expressions towards a specific aspect, depicted in Figure 2.5. The calculation method is similar to intra attention but without the max-pooling layer. After computing the global attention weights, the sentence representation is obtained as a weighted sum of the encoder hidden states with those weights.

– After computing the sentence representations by intra attention and global attention, the two are summed to get a compact sentence representation. This representation is then sent to the next layer as the aspect-dependent representation memory.

In the last layer of the model, the combined output of intra attention and global attention is put through a softmax layer to obtain the final predictions for ABSA. In their work, they also tested their model with the two kinds of attention separately, one without the other, and found that overall the combined model performs better.
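The input construction at the start of this section is simple to state in code. The sketch below computes the aspect vector $e_{aspect}$ and tiles it onto every word embedding; the tensor shapes and function name are assumptions for illustration:

```python
import torch

def build_reasoner_input(word_emb, entity_emb, attr_emb):
    # word_emb: (seq_len, d); entity_emb: (l1, d); attr_emb: (l2, d)
    e_aspect = entity_emb.sum(dim=0) + attr_emb.sum(dim=0)      # (d,)
    # concatenate the aspect vector onto each word embedding -> (seq_len, 2d)
    return torch.cat([word_emb, e_aspect.expand_as(word_emb)], dim=-1)

x_hat = build_reasoner_input(torch.randn(12, 100),
                             torch.randn(1, 100), torch.randn(2, 100))
```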

Figure 2.4: The architecture of intra attention


Figure 2.5: The architecture of global attention

2.4.4 Bidirectional Encoder Representations from Transformers

Bidirectional Encoder Representations from Transformers (Bert) is a language representation model introduced by Devlin et al. [1] in 2018. It has since become the state-of-the-art model for various NLP tasks such as Question Answering (SQuAD v1.1), Natural Language Inference (MNLI), and others.

Bert's model architecture is a multi-layer bidirectional Transformer encoder [1] based on the original implementation described in Vaswani et al. [16].

The authors pre-trained Bert using a "masked language model" (MLM) pre-training objective, inspired by the Cloze task [17]. The masked language model randomly masks some of the tokens in the input, and the model's objective is to predict the original vocabulary id of each masked word based only on its context. Unlike left-to-right language-model pre-training, the MLM objective enables the representation to fuse the left and right context; simple bidirectional conditioning without masking would allow each word to indirectly "see itself", letting the model trivially predict the target word in a multi-layered context [1].

Another task used to pre-train Bert was Next Sentence Prediction (NSP). In this task, the model receives pairs of sentences as input and learns to predict whether the second sentence in the pair is the subsequent sentence in the original document. During training, when choosing a sentence pair A and B, 50% of the time B is the actual sentence that follows A, and 50% of the time it is a random sentence from the corpus. This helps the model understand sentence relationships, on which many important downstream tasks such as Question Answering (QA) and Natural Language Inference (NLI) are based.

The input representation is constructed by summing the corresponding token, segment, and position embeddings [1]. A visualization of this construction can be seen in Figure 2.6:

Figure 2.6: BERT input representation [1]

• A [CLS] token is inserted at the beginning of the first sentence, and a [SEP] token is inserted at the end of each sentence.

• A sentence embedding indicating Sentence A or Sentence B is added to each token. Sentence embeddings are similar in concept to token embeddings, with a vocabulary of 2.

• A positional embedding is added to each token to indicate its position in the sequence.
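This construction can be seen directly in a Bert tokenizer's output. A small sketch using the Hugging Face transformers library, with the English bert-base-uncased checkpoint chosen purely for illustration:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = tok("the room was clean", "the staff was friendly")

print(tok.convert_ids_to_tokens(enc["input_ids"]))
# ['[CLS]', 'the', 'room', 'was', 'clean', '[SEP]',
#  'the', 'staff', 'was', 'friendly', '[SEP]']
print(enc["token_type_ids"])   # 0 for sentence A tokens, 1 for sentence B
```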

The Encoder layer (shown in Figure 2.7):

• BERT takes a sequence of words as input, which keeps flowing up the stack. Each layer applies self-attention, passes its result through a feed-forward network, and then hands it off to the next encoder.

• An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is computed as a weighted sum of the values, where the weight assigned to each value is computed by a compatibility function of the query with the corresponding key.


Figure 2.7: The architecture of Bert Encoder layer

• In a self-attention layer, all of the keys, values, and queries come from the same place, in this case the output of the previous layer in the encoder. Each position in the encoder can attend to all positions in the previous layer of the encoder.

The output is a sequence of vectors of size H, in which each vector corresponds to the input token with the same index. That vector can then be used as the input for a classifier.
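The attention computation described in these bullets fits in a few lines. A generic sketch of scaled dot-product attention, with the scaling by the square root of the dimension following Vaswani et al. [16]:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (seq_len, d); every position attends to every position
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)   # query-key compatibility weights
    return weights @ v                        # weighted sum of the values

out = scaled_dot_product_attention(torch.randn(5, 64),
                                   torch.randn(5, 64), torch.randn(5, 64))
```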


2.4.5 PhoBert

Vietnamese NLP had yet to have a pre-trained model made specifically for Vietnamese. Dat et al. [18] noticed two main concerns in the other pre-trained BERT models that feature Vietnamese:

• Lack of training data: up to that point, all Vietnamese BERT models were pre-trained on the Wikipedia corpus. However, this corpus is relatively small, only about 1GB, while pre-trained models improve significantly with more data.

• No compound-word handling: a major difference between Vietnamese and English grammar is compound words. In English, every word is independent; all words have a meaning even when they stand alone. In Vietnamese, however, some words mean nothing unless they stand next to another word, forming compound words, for example khách sạn (hotel) and đồ ăn (food).

To solve these problems, Dat et al. applied two solutions:

• For the lack of data, they used a large news corpus (https://github.com/binhvq/news-corpus); all similar and duplicate articles were removed, and it was then combined with the Wikipedia corpus to make up 20GB of training data.

• For the second problem, Vietnamese word segmentation, they used Vu et al.'s VnCoreNLP [19], which takes the input sequence and segments all compound words in that sequence; an example can be seen in Figure 2.8.

They used fastBPE [20] to segment sentences into subword units; this helps them account for rare words and compresses the vocabulary to a reasonable size. In the end, their dataset comprised 140M word-segmented sentences and a vocabulary of 64K subwords.
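As an illustration of this preprocessing step, the sketch below runs VnCoreNLP's word segmenter through the py_vncorenlp wrapper; the package name and call signatures follow VinAI's published usage guide, so treat them as assumptions about the installed version:

```python
import py_vncorenlp

# download VnCoreNLP once, then load only the word segmenter
py_vncorenlp.download_model(save_dir="./vncorenlp")
segmenter = py_vncorenlp.VnCoreNLP(annotators=["wseg"], save_dir="./vncorenlp")

print(segmenter.word_segment("Khách sạn này có đồ ăn rất ngon"))
# compound words come back joined by underscores,
# e.g. ['Khách_sạn này có đồ_ăn rất ngon']
```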

Figure 2.8: Example of word segmentation

Having built the dataset, PhoBert was pre-trained with up to 540K training steps, taking up to 5 weeks for the large model. Its pre-training approach is based on RoBERTa. PhoBert's performance was then tested on downstream tasks such as Named Entity Recognition (NER), POS tagging, and dependency parsing; the results clearly show that PhoBert outperforms other pre-trained models on Vietnamese NLP tasks. Even



compared to XLM-R, which was pre-trained on 137GB of Vietnamese text (7 times that of PhoBert), it still performs better thanks to the use of a segmenter; this shows that a language-specific model can outperform a multilingual one.

2.4.6 Comparing monolingual and multilingual pre-trained models

Thin et al. [21] wanted to demonstrate the effectiveness of PhoBert on a monolingual task, so they tested various pre-trained Bert models on the VLSP ABSA dataset. They implemented a simple Bert structure with sigmoid classifiers, as depicted in Figure 2.9; the goal was to fine-tune each pre-trained model and compare their performance on the Aspect Detection task (Phase A of ABSA). First, the text reviews were pre-processed following the steps in their previous work [14]; with each model, they followed the recommended preprocessing steps, for example performing word segmentation before tokenizing with PhoBert. Besides PhoBert, the Vietnamese pre-trained models compared include viBert4news (https://github.com/bino282/bert4news/), viBert_FPT, and vELECTRA_FPT, which are trained on large corpora of Vietnamese text; however, only PhoBert employs a segmenter in the preprocessing step. The multilingual models include mBERT, XLM-R, and mDistilBert. All models were trained with the same training parameters.

The results show that on the VLSP task, PhoBert outperforms all other models by a significant margin in both the hotel and restaurant domains. This proves that PhoBert is effective for a downstream task requiring heavy semantic understanding like ABSA.
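A minimal sketch of the fine-tuning setup in Figure 2.9, assuming the public vinai/phobert-base checkpoint: the [CLS] representation feeds one sigmoid output per aspect, and the input text must already be word-segmented for PhoBert. The class count and layer shape are illustrative:

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class PhoBertAspectDetector(nn.Module):
    def __init__(self, n_aspects=34):                 # 34 hotel aspects
        super().__init__()
        self.bert = AutoModel.from_pretrained("vinai/phobert-base")
        self.head = nn.Linear(self.bert.config.hidden_size, n_aspects)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]             # [CLS] position
        return torch.sigmoid(self.head(cls))          # one probability per aspect

tok = AutoTokenizer.from_pretrained("vinai/phobert-base")
batch = tok(["khách_sạn rất đẹp"], return_tensors="pt")   # pre-segmented input
probs = PhoBertAspectDetector()(batch["input_ids"], batch["attention_mask"])
```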

With these studies showing that PhoBert is the perfect candidate for the Vietnamese ABSA task, we decided to use PhoBert as our base model; in our work we use PhoBert-large (https://github.com/VinAIResearch/PhoBERT). We describe further implementations that improve on this base model in the following sections.


Figure 2.9: Thin et al.'s Bert implementation

In Thin et al. [10]'s work, besides releasing a sentence-level ABSA corpus, they also applied various methods to their dataset to obtain baseline results; many of these methods had previously been applied to VLSP's document-level dataset. Some notable implementations include:

• Multiple SVM: this approach uses SVM classifiers with various handcrafted features: n-grams, words, and part-of-speech (POS) information for aspect detection; and n-grams, words, elongated words, aspect category, hashtag counts, POS feature counts, and punctuation marks for sentiment polarity.

• BiLSTM + Attention: a basic model based on a BiLSTM and the attention mechanism. To obtain the context, the inputs are fed into a bidirectional LSTM; the input representation is then created by combining the vector representations from the attention layer, max pooling, and mean pooling before feeding them into the fully connected layer.

• LSTM + Attention: similar to the BiLSTM with Attention architecture, but the BiLSTM is replaced with an LSTM to remember the context of the input.

• BiLSTM-CNN: a common model combining CNN and RNN architectures. A BiLSTM is used to extract sentence context information, which is then passed into a CNN layer to extract features. In addition, they applied global max pooling concatenated with global average pooling to represent the input.

• PhoBert: for a monolingual corpus such as this, a language-specific pre-trained model like PhoBert was their go-to choice. They extracted the hidden state of the [CLS] token as a representation of the entire sentence and put it into a linear layer with softmax activation.

After experimenting, they concluded that, compared with supervised learning methods using handcrafted features, most of the neural network architectures achieved impressive results on both domains. However, PhoBert's score was significantly higher than all other implementations, demonstrating its effectiveness on the ABSA problem and proving that it can be used for both document- and sentence-level datasets.

2.4.8 Leveraging Embedding comparison for training Bert

Natest et al. [22] investigated Bert embeddings further. Their focus is to improve Bert's effectiveness in the polarity classification step of ABSA. Since Bert encodes not only the words in a document but also the context around them, they reason that contextual embeddings like Bert's can capture semantics related to the polarity associated with the aspect. To derive the sentiment from the embedding, they compare contextual Bert embeddings with embeddings of the aspect word from non-contextual models like GloVe and word2vec. They perform this comparison on the last 5 layers of Bert as follows:

• Given a sentence and a list of aspects, they obtain the embeddings $\{c_{m1}, c_{m2}, c_{m3}, \dots, c_{mn}\}$ for each aspect from the $m$-th layer of BERT, while also obtaining the embeddings $\{g_1, g_2, g_3, \dots, g_n\}$ from a pre-trained GloVe model. Multi-head self-attention is then used to combine the embeddings obtained from Bert; the operation in a single self-attention head can be represented by the formula:

$$c'_{mn} = \sum_{k=1}^{5} \alpha_{m,k} \left( W_{value}\, c_{kn} \right)$$


• The results from the 8 heads are then concatenated, and residual connections are added to preserve the original features from the Bert layer:

$$c''_{mn} = \mathrm{ReLU}\left( c'_{mn} + W c_{mn} \right)$$

• The process above is done for each embedding from the Bert layers; the 5 refined embeddings are then combined and projected to 512 dimensions, as are the GloVe embeddings $g_n$. The two are then concatenated and put through a classification layer.
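Putting the three bullets together, a compact PyTorch sketch of the comparison module follows. The dimensions, head count, and output shapes come from the description above; everything else (class names, batch handling) is an illustrative assumption:

```python
import torch
import torch.nn as nn

class EmbeddingComparison(nn.Module):
    def __init__(self, d_bert=768, d_glove=300, n_polarities=3):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_bert, num_heads=8)
        self.proj_bert = nn.Linear(5 * d_bert, 512)
        self.proj_glove = nn.Linear(d_glove, 512)
        self.classifier = nn.Linear(2 * 512, n_polarities)

    def forward(self, bert_layers, glove_vec):
        # bert_layers: (5, d_bert) aspect embedding from Bert's last 5 layers
        x = bert_layers.unsqueeze(1)                     # (5, batch=1, d_bert)
        refined, _ = self.attn(x, x, x)                  # attention across layers
        refined = torch.relu(bert_layers + refined.squeeze(1))   # residual
        combined = torch.cat([self.proj_bert(refined.flatten()),
                              self.proj_glove(glove_vec)])
        return self.classifier(combined)                 # polarity logits

logits = EmbeddingComparison()(torch.randn(5, 768), torch.randn(300))
```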


2.4.9 Hidden Layer aggregation for BERT

What information do Bert's hidden layers contain?

In Ganesh et al. [23]'s work, they studied the information that Bert contains in its hidden layers. The goal is to find out which kinds of features are retained in which layers. They built a series of tasks specialized to test each feature; these tasks include:

• Surface tasks test for surface features such as sentence length and the presence of words in the sentence (WC).

• Syntactic tasks test for sensitivity to word order, the depth of the syntactic tree, and the sequence of top-level constituents in the syntax tree.

• Semantic tasks check for the tense, the subject (resp. direct object) number in the main clause, sensitivity to random replacement of a noun/verb, and random swapping of coordinated clausal conjuncts.


For the ABSA task, we mainly need semantic information, so knowing which layers contain it is of utmost importance. All hidden layers of Bert were extracted and used to perform the tasks above. The results show that semantic information is mostly stored in the top-level layers, specifically the last 4 layers of a 12-layer Bert.

This research is extremely useful, since most Bert implementations use only the last hidden layer for classification, while the information in the previous layers can significantly improve results. Knowing this, we need a way to make use of all the necessary layers in the classification process.

Hidden layer aggregation

Karimi et al. [24] wanted to make use of all the useful semantic information in Bert's top-level hidden layers, so they came up with a mechanism called hierarchical hidden layer aggregation (HSUM), depicted in Figure 2.10. This mechanism consists of three main steps:

• Step 1: They extracted all hidden layers from Bert and trained them separately on the ASC task to evaluate which hidden layers hold information suitable for the task; the results show that the top 4 layers achieve better results than the rest. This means the findings of [23] also apply in this case, with the 4 top-level hidden layers retaining the most semantic information. So, after tokenizing and feeding the input into Bert, they extract the top 4 hidden layers for the classification process.

• Step 2: Each hidden layer extracted in Step 1 is put through a Bert layer, which is untrained and has the same configuration as a Bert layer in the pre-trained model; these layers' outputs are aggregated with each other hierarchically. The output of the Bert layer above is summed element-wise with the hidden layer one level below, and that sum is used as input for the lower-level Bert layer.

• Step 3: The outputs of the Bert layers are then fed in parallel into linear layers with softmax activation to get the predictions. The loss is calculated for each layer's predictions and summed to get a global loss, which is optimized during training. The loss is the cross-entropy loss, with $C$ being the total number of classes, $y_c$ the ground-truth indicator, and $\hat{y}_c$ the predicted probability for class $c$:

$$L_i = -\sum_{c=1}^{C} y_c \log \hat{y}_c$$

$$\text{global\_loss} = L_1 + L_2 + L_3 + L_4 \qquad (2.2)$$
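A condensed PyTorch sketch of the three steps, assuming a Hugging Face Bert checkpoint and the BertLayer class exposed by transformers; the checkpoint name and classifier shape are placeholders:

```python
import torch
import torch.nn as nn
from transformers import BertModel
from transformers.models.bert.modeling_bert import BertLayer

class HSUM(nn.Module):
    def __init__(self, n_classes=3):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased",
                                              output_hidden_states=True)
        cfg = self.bert.config
        self.agg = nn.ModuleList(BertLayer(cfg) for _ in range(4))      # Step 2
        self.heads = nn.ModuleList(nn.Linear(cfg.hidden_size, n_classes)
                                   for _ in range(4))                   # Step 3

    def forward(self, input_ids, attention_mask, labels):
        hidden = self.bert(input_ids,
                           attention_mask=attention_mask).hidden_states
        loss_fn, global_loss, state = nn.CrossEntropyLoss(), 0.0, 0.0
        for i in range(4):
            state = hidden[-1 - i] + state         # sum with the layer below
            state = self.agg[i](state)[0]          # untrained BertLayer
            logits = self.heads[i](state[:, 0])    # classify at [CLS]
            global_loss = global_loss + loss_fn(logits, labels)   # eq. (2.2)
        return global_loss
```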


Figure 2.10: Hierarchical Hidden level aggregation for Bert

They evaluated their model's performance on the SemEval 2014 [3] Aspect Sentiment Classification (ASC) task, where the goal is to identify the polarities for a given set of aspects in a sentence; this is a multi-class classification problem whose labels are the sentiments, and identifying aspects is not required. Their HSUM model achieved the highest score on the ASC task, outperforming other Bert-based models and showing that their method of utilizing Bert's hidden layers can significantly improve performance.

2.4.10 Hierarchical classifier for ABSA

As mentioned in Section 2.3.1, an ABSA label is made up of three components: Entity, Attribute, and Polarity. If a human were given the labeling task, it would naturally split into three clear steps, identifying each component in order. For example, to label the sentence "phòng tôi rất thoải mái" (My room is very comfortable) in the hotel domain, a human annotator would take the following steps:

• Step 1: Identify the subject (Entity). In this example, the entity being mentioned is "phòng tôi" (my room), so out of the 7 available entities, ROOMS is the most suitable.

• Step 2: Identify the attribute of the subject, i.e., what specifically is mentioned about the room. "Thoải mái" (comfortable) was said about the room, so the correct attribute is COMFORT. Having identified the Entity and Attribute, we get an Aspect (Entity#Attribute), in this case ROOMS#COMFORT.

• Step 3: Identify the sentiment for the aspect. Looking at the review, we see that the reviewer said their room is very comfortable, so the sentiment for ROOMS#COMFORT is positive.


Having identified all three components of the label, we get ROOMS#COMFORT, positive. The steps above are repeated for each aspect, and only aspects present in the review appear in the final labels.

Inspired by this natural labeling process, Oanh et al. [25] proposed a Bert-based architecture specialized in dealing with ABSA labels; their classifier follows the three steps mentioned above, as depicted in Figure 2.11. Their goal is to perform ABSA on the VLSP 2018 dataset. They used mBert, a multilingual Bert model, as the pre-trained model and further trained it with 20GB of Vietnamese text, naming the result viBert. With viBert as the base model, the main innovative part of their architecture is the hierarchical classifier. This classifier has three layers that perform the classification steps described above, each a linear layer with sigmoid activation. The first layer (Entity layer) is simply a linear layer that takes the [CLS] token from viBert's output and classifies it into entities (6 entities for restaurant, 7 for hotel); the output of this layer is then concatenated with the [CLS] token and used as input for the second layer, and the same operation is performed between the second and third layers.
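A minimal sketch of the hierarchical classifier just described; the hidden size matches a base Bert model, the entity count matches the hotel domain, and the attribute count in particular is an illustrative assumption:

```python
import torch
import torch.nn as nn

class HierarchicalClassifier(nn.Module):
    # Entity -> Attribute -> Polarity: each level sees the [CLS] vector
    # concatenated with the previous level's sigmoid output.
    def __init__(self, d=768, n_entities=7, n_attributes=8, n_polarities=3):
        super().__init__()
        self.entity = nn.Linear(d, n_entities)
        self.attribute = nn.Linear(d + n_entities, n_attributes)
        self.polarity = nn.Linear(d + n_attributes, n_polarities)

    def forward(self, cls_vec):                        # cls_vec: (batch, d)
        e = torch.sigmoid(self.entity(cls_vec))
        a = torch.sigmoid(self.attribute(torch.cat([cls_vec, e], dim=-1)))
        p = torch.sigmoid(self.polarity(torch.cat([cls_vec, a], dim=-1)))
        return e, a, p

e, a, p = HierarchicalClassifier()(torch.randn(2, 768))
```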

They compared their viBert model's performance with the VLSP 2018 submissions; their results were significantly higher, setting a new SOTA score for the task and showing that a hierarchical classifier for ABSA is effective.
