
Sentiment analysis and opinions summarization on social media




DOCUMENT INFORMATION

Basic information

Title: Sentiment Analysis And Opinions Summarization On Social Media
Author: Nguyen Tien Huy
Supervisor: Associate Professor Nguyen Le Minh
Institution: Japan Advanced Institute of Science and Technology
Major: Information Science
Document type: Doctoral Dissertation
Year of publication: 2019
Location: Japan
Format
Number of pages: 97
File size: 2.04 MB


Structure

  • Abstract

  • Acknowledgments

  • Introduction

    • Background

    • Research Objective and Contribution

    • Dissertation Outline

  • Sentiment Analysis

    • Introduction

    • Research Objective and Contribution

    • Related work

    • Sentiment representation learning

      • LSTM for sentiment feature engineering - LSTM feature

      • CNN for sentiment feature engineering - CNN feature

      • Classifying with sentiment vectors

    • Ensemble with clustering support

    • Datasets and Experiment setups

      • Datasets

      • Experimental setups

    • Results and Discussion

      • Freezing vs Unfreezing

      • Evaluation on combining features

      • Evaluation on clustering support

      • Quality analysis

    • Conclusion

  • Subject Toward Sentiment Analysis on Social Media

    • Introduction

    • Motivation and contribution of the work

    • Related work

      • Sentiment analysis in English

      • Sentiment analysis in multi-lingual setting

    • Convolutional N-gram BiLSTM word embedding

      • Bidirectional Long Short Term Memory (BiLSTM)

      • Convolutional N-gram BiLSTM word embedding

    • SenTube dataset & Task description

      • Task description

      • SenTube dataset

    • Experiment & discussion

      • Model configuration

      • In-domain experiment

      • Cross-domain experiment

      • Performance on each class

      • Quality analysis

    • Conclusion

  • Semantic Textual Similarity

    • Introduction

    • Research Objective and Contribution

    • Related work

    • Model description

      • Multi-aspect word embedding

      • Sentence modeling

      • Multi-level comparison

    • Tasks & Datasets

    • Experimental setting

      • Pre-trained word embeddings

      • Model configuration

      • Training Setting

    • Experiments and Discussion

      • Overall evaluation

      • Evaluation of exploiting multiple pre-trained word embeddings

      • Quality analysis

    • Conclusion

  • Aspect Similarity Recognition

    • Introduction

    • Related work

      • Aspect Category Classification

      • Sentence Relationship Measurement

    • Methodology

      • Sentence modeling

      • Sentence comparison

      • Aspect similarity transferring

    • Aspect Similarity Recognition Dataset

    • Experiment

      • Experimental Setting

      • Results & discussion

      • Error analysis

    • Conclusion

  • Extractive Opinions Summarization

    • Introduction

    • Related work

    • Problem Formulation

    • Opinion Summarization

    • Experiments & Results

    • Conclusion

  • Conclusions and Future Work

    • Conclusions

    • Future Work

  • Publications and Awards

Contents

Background

Opinions summarization involves gathering common viewpoints expressed on social media, blogs, and forums, enabling consumers to efficiently process numerous comments and reviews prior to making decisions. This practice also assists producers in monitoring customer perceptions of their products. For example, Table 1.1 illustrates a summarization of feedback on Honda Accord performance.

The rapid expansion of online data has led to increased interest in automatic opinion summarization, particularly in extractive summarization, which identifies key text units to create summaries. Typically, ranking candidates for generic summarization relies on various handcrafted features, including sentence position, length, and word frequency, or employs neural networks to learn salience scores.

Table 1.1: A sample of summarization on Honda Accord performance

1) I have owned this car for just a week, and I am pleasantly surprised by its impressive performance and solid build quality. The styling and performance are exceptional, making it a great choice overall; however, the comfort level leaves something to be desired.

2) I just put it on the highway this weekend and its performance was bad!

3) Gas mileage is disappointing for a vehicle with this type of performance.

4) Great performance and handling make this a real Winner!

5) Overall performance is good but comfort level is poor.

6) The car is great, both with styling and performance.

Overall sentiment on performance: 71% positive, 28% negative.

Opinions summarization involves analyzing sentiments and aspects of text to create concise and informative summaries. This process typically consists of several key subtasks: first, sentiment analysis, which determines the polarity (positive or negative) of sentiments related to specific subjects or topics; and second, semantic textual similarity, which assesses how closely related different pieces of text are in meaning.

The semantic similarity between two sentences is crucial for identifying both informative and redundant content. Additionally, aspect discovery focuses on extracting key properties of entities, such as battery life, design, and customer service. Finally, summary generation leverages these insights to select the most significant opinions while eliminating redundant information, ultimately creating a concise summary.

Sentiment analysis has increasingly been approached as a classification problem, effectively utilizing supervised learning techniques. Nonetheless, analyzing sentiments on social media presents several challenges, including the prevalence of ungrammatical text, the diverse range of topics such as technology and education, and the presence of irrelevant comments or spam.

Aspect discovery can be approached through two primary techniques: supervised and unsupervised learning. Supervised learning treats aspect extraction as a sequence labeling task, but it faces challenges such as domain adaptation due to its reliance on predefined aspect lists and annotated data. In contrast, unsupervised learning leverages large amounts of unlabeled data to identify aspects using statistical topic modeling like LDA or aspect-based autoencoders. However, these methods also encounter limitations, including the need to determine an appropriate number of aspects for each domain and the cold-start problem, where insufficient reviews hinder effective analysis in certain domains.

The semantic textual similarity task faces significant challenges due to the diversity of linguistic expressions, where sentences may convey similar meanings despite differing lexicons. Additionally, measuring similarity across various levels (word, phrase, and sentence) complicates the process. These factors pose difficulties for traditional methods that rely on hand-crafted features.

This thesis explores deep learning methods to tackle the challenges posed by social media text across various tasks. We introduce an innovative subtask that eliminates redundant information without the need to predefine a list of aspects, moving beyond traditional aspect discovery. The following section outlines our research questions and contributions in detail.

Research Objective and Contribution

This research aims to develop an effective method for identifying and summarizing opinions on social media by leveraging deep learning architectures to address the associated challenges. The rise of deep learning models has enabled the creation of continuous representation vectors for text, such as word2vec, fastText, and GloVe, significantly enhancing the performance of natural language processing (NLP) tasks, including machine translation, summarization, and text classification. The research question is explored through five subtasks, as illustrated in Figure 1.1.

Sentiment analysis determines the sentiment polarity of comments or reviews, categorizing them as positive, negative, or neutral. Effective techniques for this analysis include Long Short Term Memory (LSTM) networks and Convolutional Neural Networks (CNN). CNNs utilize filters to effectively capture local dependencies within the data.

The overall framework for opinions summarization also incorporates LSTM, which is designed to retain long-distance information. However, merging these advantages into a single model presents challenges, particularly due to the risk of overfitting during the training process.

To address sentiment analysis challenges, we introduce a freezing technique that effectively learns sentiment-specific vectors using CNN and LSTM models. This method harnesses the strengths of diverse deep learning architectures, and our findings indicate that semantically clustering documents enhances the performance of ensemble methods. Experimental results demonstrate that our approach yields competitive outcomes across four prominent datasets: Pang & Lee movie reviews and Stanford Sentiment Treebank for sentence-level analysis, and IMDB large movie reviews and SenTube for document-level evaluations.

In the realm of sentiment analysis, determining the subject toward which a comment expresses sentiment, or whether it is spam, is crucial. We introduce a novel convolutional N-gram BiLSTM (CoNBiLSTM) word embedding that captures both semantic and contextual information over varying distances. This embedding is utilized to classify comments by type, determine their polarity (positive, neutral, or negative), and ascertain whether the sentiment targets a product or video. We evaluated our model using the SenTube dataset, which includes comments from the automobile and tablet sectors in English and Italian. The results indicate that CoNBiLSTM significantly outperforms the traditional SVM approach based on shallow syntactic structures, establishing itself as a superior method for sentiment analysis on the SenTube dataset.

CoNBiLSTM also shows greater robustness across domains than STRUCT (e.g., a 7.47% difference in performance between the two domains for our model vs. 18.8% for STRUCT).

Semantic textual similarity (STS) evaluates the similarity between two sentences, playing a crucial role in identifying informative and redundant sentences in summarization. Recent advancements in natural language processing have leveraged pretrained word embeddings for enhanced performance. Different word embedding models, depending on their objective functions, capture various linguistic properties, which are essential for STS tasks. This research proposes a novel approach that encodes diverse characteristics from multiple word embeddings into a single representation, enabling the learning of sentence similarity and relationships. Using a MaxLSTM-CNN encoder to create unique sentence embeddings from multiple word embeddings, we employ a Multi-level comparison method to assess similarity. Our M-MaxLSTM-CNN model demonstrates robust performance across various tasks, including measuring textual similarity, identifying paraphrases, and recognizing textual entailment, without relying on hand-crafted features or requiring uniform dimensions for pretrained word embeddings.

Aspect similarity recognition (ASR) estimates the likelihood that two sentences share at least one common aspect, making it essential for effective review summarization that captures all relevant points while minimizing redundancy. We introduce an attention-cell LSTM model that incorporates attention signals into the LSTM gates. Experimental results demonstrate that the attention-cell LSTM effectively learns latent aspects between sentences in both in-domain and cross-domain scenarios.

We developed the ASRCorpus dataset, which includes two domains, LAPTOP and RESTAURANT, to enable the application of supervised learning models. Our approach to Opinions Summarization leverages ASR signals to rank sentences, enabling the generation of concise and informative product summaries by selecting the most salient sentences from reviews. Unlike traditional methods that rely on predefined lists of aspects, our technique rates expressions using ASR, eliminating the need for prior aspect definitions and facilitating domain adaptation. The extractive summarization method we propose, utilizing ASR, demonstrates significant improvements compared to baseline models on the Opinosis corpus.

Dissertation Outline

The remainder of this thesis is organized as follows:

Chapter 2 addresses the formulation of the Sentiment Analysis problem through a literature review that identifies gaps in existing methods. It details the proposed freezing technique for feature learning and clustering support for ensemble approaches, evaluating these methods across four established datasets against robust baselines. Additionally, the chapter discusses the results and analyzes common error cases to draw meaningful conclusions.

Chapter 3 describes the problem formulation of Subject Toward Sentiment Analysis. We conduct a literature review to analyze the gaps in current methods. The proposed convolutional N-gram BiLSTM word embedding is explained in detail and evaluated over the SenTube dataset, which contains two domains, AUTOMOBILE and TABLET, against strong baselines. We also discuss the results and analyze some typical error cases to draw conclusions.

Chapter 4 addresses the formulation of the Semantic Textual Similarity problem, conducting a literature review to identify gaps in existing methodologies. It introduces the M-MaxLSTM-CNN model, which utilizes multiple sets of word embeddings to assess sentence similarity and relations. The model is thoroughly explained and evaluated using benchmark datasets across various tasks, compared against robust baselines. Additionally, the chapter discusses the results and analyzes common error cases to draw meaningful conclusions.

Chapter 5 introduces the novel Aspect Similarity Recognition task, beginning with a survey of recent methods for aspect category classification and examining research on the relationship between sentences. It details the creation of the ASRCorpus annotated dataset, which encompasses two domains: LAPTOP and RESTAURANT. The chapter thoroughly explains the convolutional attention-cell LSTM model and evaluates its performance against strong baselines, alongside an analysis of typical error cases.

Chapter 6 presents the formulation of the Opinions Summarization problem, highlighting gaps in existing methods through a literature review. It elaborates on a novel aspect-based summarization approach utilizing Aspect Similarity Recognition, evaluated on the Opinosis dataset against robust baselines, and discusses the results and common errors to draw conclusions.

Chapter 7 concludes our research and discusses future directions based on our work.

Introduction

The rise of Web 2.0 has led to an explosion of user-generated content, significantly increasing data volume. This data allows for the analysis of various types of knowledge, particularly sentiment information, which reflects users' evaluations, attitudes, emotions, and opinions about individuals, products, or organizations. Recent years have seen a surge in research focused on sentiment analysis from text, highlighting its importance in understanding public perception (Pang and Lee, 2008; Liu, 2012).

Wang and Manning (2012) approached sentiment analysis as a classification problem, utilizing a Support Vector Machine variant with Bag of bi-gram and Naive Bayes features, known as NBSVM, which demonstrated strong performance in experiments on both long and short reviews. However, the Bag of Words model has notable limitations, including sparse vectors and a lack of consideration for word semantics and order. In response to these challenges, Mikolov et al. (2013) introduced a word embedding technique that encodes words into continuous representations, significantly advancing the field of natural language processing.

In their 2014 study, Le and Mikolov introduced Paragraph Vector, a technique that employs neural networks for word embedding representation, effectively modeling documents as vectors. This approach demonstrated superior performance compared to the traditional Bag of Words model in sentiment analysis and information retrieval. Li et al. (2016) further improved Paragraph Vector by enabling the prediction of n-gram features alongside words. Meanwhile, Kim (2014) showcased the effectiveness of convolutional neural networks (CNN) for semantic composition, utilizing convolutional filters to capture local context dependencies, although these filters struggle with long-distance dependencies. To address this limitation, Long Short-Term Memory (LSTM) networks were introduced to retain information over extended periods. Our goal is to create a hybrid model that leverages the strengths of these advanced techniques.

This chapter is structured as follows: Section 2.2 outlines the research objectives and contributions, while Section 2.3 provides a review of previous studies on opinion mining. In Section 2.4, we present the proposed architecture for encoding and utilizing sentiment feature vectors. Section 2.5 details the ensemble model that incorporates clustering support, and Section 2.6 describes the datasets and experimental setup used in our study. The findings and discussions of the experimental results are presented in Section 2.7, culminating in the conclusion of our work in Section 2.8.

Research Objective and Contribution

The proposed framework for sentiment analysis involves two key components: first, it extracts sentiment scores using a three-layer neural network and NBSVM scoring methods. Second, it employs autoencoder models to cluster sentences, representing them through these sentiment scores to enhance sentiment prediction accuracy.

In our research, we aim to create an ensemble model that leverages the strengths of various models, such as generative and discriminative models, while minimizing the risk of overfitting. Our experiments reveal that combining two networks into a single model often leads to overfitting, as discussed in Section 2.7. To address this, we separately train CNN and LSTM to encode sentiment information into feature vectors, employing a freezing technique during training to further prevent overfitting. For sentiment classification, we integrate these sentiment-specific vectors with a semantic-specific DVngram vector in a 3-layer neural network. Notably, in sentiment analysis, even slight variations in sentences can yield opposite sentiments, yet generative models tend to produce similar vectors for similar sentences. To counter this, we developed an autoencoder model to generate representation vectors for sentences and documents, which are then used for clustering. Additionally, we enhance sentiment prediction for each cluster by incorporating the prediction score from the NBSVM method. The architecture of our framework is illustrated in Figure 2.1.

We compared our model with competitive methods on five well-known datasets: IMDB large movie reviews, Pang & Lee movie reviews, Stanford Sentiment Treebank, Stanford Twitter Sentiment, and SenTube. Experimental results demonstrate that our proposed method achieves competitive performance at both the sentence and document levels. Our key contributions are:

  • We generate sentiment vectors via CNN and LSTM under the freezing scheme. These vectors provide a simple and efficient way to integrate the strong abilities of deep learning models.

  • We introduce a method for clustering data into groups of semantically similar sentences or documents. Each sentence or document within these clusters is represented by prediction scores generated from the NBSVM method and our 3-layer neural network. To enhance accuracy, we propose an ensemble approach that utilizes these scores effectively.

Related work

Sentiment analysis studies how to extract people's opinions toward entities. Taboada et al. (2011) approached sentiment analysis by assigning labels to text through the extraction of sentiment words, although this method overlooked syntax and context. To address these shortcomings, Saif et al. (2016) proposed a lexicon-based approach that captures the semantic sentiment information of words based on their co-occurrence patterns, enabling effective sentiment detection at both the entity and tweet levels. Liu (2012) further advanced the field by framing sentiment analysis as a classification task, utilizing machine learning techniques. This approach emphasized the importance of developing effective features, including word n-grams (Wang and Manning, 2012), emoticons (Zhao et al., 2012), and sentiment words (Kiritchenko et al., 2014).

Research by Fersini et al. (2016) highlights the significance of signals like adjectives and expressive lengthening in sentiment analysis, revealing that adjectives are particularly impactful and discriminative. The context of a word plays a crucial role in determining its polarity, as seen in examples like "cheap design" (negative) versus "cheap price" (positive). To address this, Vechtomova (2017) utilizes reference corpora with sentiment-annotated documents to disambiguate sentiment polarity, proving effective at the word level but limited at the sentence level. However, these methods often require extensive resources and handcrafted features, which depend on a well-defined knowledge base that may struggle with nuanced concepts (Cambria, 2016). Our proposed model leverages deep learning techniques to automatically learn efficient features for sentiment analysis, overcoming these limitations.

The rise of deep learning models has revolutionized sentiment classification by enabling the efficient learning of continuous representation vectors. Mikolov et al. (2013) pioneered techniques for semantic word representation, utilizing neural networks in word prediction tasks to create word embedding vectors that encapsulate semantic meanings. These embedding vectors cluster together words with similar meanings. However, given that semantic information can convey opposing sentiments in varying contexts, research by Socher et al. (2011) and Tang et al. (2014) has focused on developing sentiment-specific word representations through the analysis of sentiment-laden text.

The paragraph vector model, introduced by Le and Mikolov (2014), treats paragraph IDs as words and utilizes their embeddings for representation; related work has enhanced sentiment analysis of English and Spanish tweets by incorporating emotional words. Contextual information is also captured through sentence and document modeling, with Yessenalina and Cardie (2011) using matrix multiplication to represent phrases. Le and Mikolov's method encodes paragraphs into continuous vectors, effectively capturing semantics. Deep recursive neural networks (DRNN) have also been employed for sentiment classification, while Ma et al. (2018) introduced the Sentic LSTM cell, which integrates commonsense knowledge into hierarchical attention models, outperforming existing methods in targeted aspect sentiment tasks. Additionally, convolutional neural networks (CNN) have shown success in natural language processing, with researchers like Kim (2014) and Zhang and Wallace (2017) utilizing convolutional filters to capture local context dependencies. Tang et al. (2015) combined CNN and LSTM for sentence and document representation, while Zhang et al. (2016) proposed Dependency Sensitive CNN for hierarchical text representations. Recent studies, such as those by Wang et al. (2016) and Vo et al. (2017), have integrated CNN and LSTM models for enhanced sentiment analysis, culminating in a hierarchical CNN-LSTM architecture by Gan et al. (2017) that encodes sentences into continuous representations.

Sentiment representation learning

LSTM for sentiment feature engineering - LSTM feature

LSTM, a type of recurrent neural network introduced by Hochreiter and Schmidhuber (1997), addresses the exploding and vanishing gradient problems by incorporating a memory cell that maintains its state over extended periods. Additionally, LSTM features non-linear gating units that control the flow of information into and out of the memory cell.

Sentences are transformed into continuous representation vectors through the recursive application of an LSTM unit, which processes each input word \( x_t \) along with the previous hidden state \( h_{t-1} \). At every time step \( t \), the LSTM unit, characterized by an \( l \)-dimensional memory, generates six vectors in \( \mathbb{R}^l \): the input gate \( i_t \), forget gate \( f_t \), output gate \( o_t \), tanh layer \( u_t \), memory cell \( c_t \), and hidden state \( h_t \).

The LSTM model for sentiment analysis, as illustrated in Figure 2.3, is trained with the parameters of the last neural network layer (highlighted in blue) kept constant. The components of the LSTM unit are computed as:

\( i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i) \)
\( f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f) \)
\( o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o) \)
\( u_t = \tanh(W_u x_t + U_u h_{t-1} + b_u) \)
\( c_t = f_t \odot c_{t-1} + i_t \odot u_t \)
\( h_t = o_t \odot \tanh(c_t) \)

where \( \sigma \) is the logistic sigmoid function and \( \odot \) denotes element-wise multiplication.

\( W_i \), \( U_i \), and \( b_i \) are respectively two weight matrices and a bias vector for the input gate \( i \); the notation is analogous for the other gates.

The forget gate in an LSTM model determines which previous information to discard, while the input gate regulates the new information to be added to the memory cell. The output gate then controls the amount of information revealed from the internal memory. These gating mechanisms enable the LSTM to retain crucial information across multiple time steps. The hidden state \( h_l \) from the last step encapsulates all relevant information and is processed by a neural network layer to generate the prediction output \( \hat{y} = \sigma(h_l W_{nn} + b_{nn}) \), where \( W_{nn} \) and \( b_{nn} \) represent the parameters of the neural network layer.
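To make the gate equations concrete, the following is a minimal NumPy sketch of a single LSTM step and the frozen prediction layer. The helper names, the parameter dictionary, and the weight shapes are illustrative assumptions, not the dissertation's actual implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step following the i, f, o, u, c, h equations above.
    p maps names to weights W_* (d x l), U_* (l x l) and biases b_* (l)."""
    i_t = sigmoid(x_t @ p["W_i"] + h_prev @ p["U_i"] + p["b_i"])   # input gate
    f_t = sigmoid(x_t @ p["W_f"] + h_prev @ p["U_f"] + p["b_f"])   # forget gate
    o_t = sigmoid(x_t @ p["W_o"] + h_prev @ p["U_o"] + p["b_o"])   # output gate
    u_t = np.tanh(x_t @ p["W_u"] + h_prev @ p["U_u"] + p["b_u"])   # tanh layer
    c_t = f_t * c_prev + i_t * u_t                                  # memory cell
    h_t = o_t * np.tanh(c_t)                                        # hidden state
    return h_t, c_t

def predict(word_embeddings, p, W_nn, b_nn):
    """Run the LSTM over a sentence and classify the last hidden state.
    Under the freezing scheme, W_nn and b_nn stay fixed during training."""
    l = p["b_i"].shape[0]
    h, c = np.zeros(l), np.zeros(l)
    for x_t in word_embeddings:            # sequence of d-dimensional vectors
        h, c = lstm_step(x_t, h, c, p)
    return sigmoid(h @ W_nn + b_nn)        # y_hat = sigma(h_l W_nn + b_nn)
```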

Figure 2.3 explains how to employ the LSTM architecture for memorizing sentiment information over sequential data. The model contains two parts: (i) building the sentiment feature, where the LSTM layer encodes the sentiment information of the input into a fixed-length vector; and (ii) the classifying layer, where this sentiment-specific representation vector is classified by the last neural network layer (the blue layer in Figure 2.3). Under the freezing scheme, this NN layer's parameters \( W_{nn} \) and \( b_{nn} \) are unchanged during the training process.
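A minimal sketch of the freezing scheme, assuming a Keras-style pipeline (layer sizes and optimizer are illustrative): the final classification layer is created with `trainable=False`, so only the embedding and LSTM parameters are updated while the layer playing the role of \( W_{nn} \) and \( b_{nn} \) stays fixed.

```python
import tensorflow as tf

def build_frozen_lstm_classifier(vocab_size, emb_dim=300, lstm_units=128):
    """LSTM sentiment model whose final NN layer is frozen (not trained)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, emb_dim),   # word embeddings, fine-tuned
        tf.keras.layers.LSTM(lstm_units),                  # encodes the sentiment feature vector
        tf.keras.layers.Dense(1, activation="sigmoid",
                              trainable=False),            # frozen classification layer
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```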

In Figure 2.4, a Convolutional Neural Network (CNN) is utilized for sentiment analysis, processing a sequence of four-dimensional word embeddings. The model employs four filters (two with a region size of 2 and two with a region size of 3) to create four distinct feature maps. Notably, during the training phase, the parameters of the final layer of the neural network (indicated in blue) remain frozen and are not trained.

CNN for sentiment feature engineering - CNN feature

We represent a sentence as a matrix S of dimensions s×d, where each row is the d-dimensional word embedding vector of a word. The convolutional neural network (CNN) applies convolution to this sentence matrix S using linear filters. Each filter is represented by a weight matrix W with length d and region size h, containing d×h parameters that need to be estimated.

In the convolution operation, a feature map vector \( O = [o_0, o_1, \ldots, o_{s-h}] \) is derived by applying a filter W to sub-matrices of S. Each element \( o_i \) in the feature map is computed as \( o_i = W \cdot S_{i:i+h-1} \), where i ranges from 0 to s−h and \( S_{i:i+h-1} \) denotes the sub-matrix of S spanning rows i to i+h−1. This process effectively captures localized features within the input matrix S.

Each feature map O is processed through a pooling layer to extract potential features, commonly utilizing the 1-max pooling strategy. This approach identifies the most significant feature v of the feature map by selecting its maximum value, \( v = \max(O) \).

We have described the operation of a single filter; Figure 2.4 illustrates the use of four filters with varying region sizes to extract multiple 1-max pooling values. These pooling values from the feature maps are concatenated into a CNN feature, denoted \( k_{cnn} \), which encapsulates sentiment information as a collection of maximum values. To connect these values effectively, a neural network (NN) layer synthesizes a high-level feature from the CNN output. This high-level feature is subsequently processed by another NN layer with sigmoid activation, producing a probability distribution over sentiment labels: \( x_1 = \sigma(k_{cnn} W_1 + b_1) \) and \( \hat{y} = \sigma(x_1 W_2 + b_2) \), where \( \hat{y} \) is the prediction output and \( W_1 \), \( W_2 \), \( b_1 \), and \( b_2 \) are the parameters of the NN layers.
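The convolution, 1-max pooling, and classification steps can be sketched in NumPy as follows; the function names and the filter list format are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cnn_sentiment_vector(S, filters):
    """Build the CNN feature k_cnn from a sentence matrix S (s x d, one row per word).
    filters: list of weight matrices W, each of shape (h, d) for region size h."""
    s, d = S.shape
    k_cnn = []
    for W in filters:
        h = W.shape[0]
        # feature map: o_i = W . S[i:i+h-1] for i = 0 .. s-h
        O = np.array([np.sum(W * S[i:i + h, :]) for i in range(s - h + 1)])
        k_cnn.append(O.max())                 # 1-max pooling keeps the strongest value
    return np.array(k_cnn)

def classify(k_cnn, W1, b1, W2, b2):
    """Two NN layers on top of k_cnn; W2 and b2 are kept frozen during training."""
    x1 = sigmoid(k_cnn @ W1 + b1)             # high-level feature
    return sigmoid(x1 @ W2 + b2)              # probability over sentiment labels
```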

During the training phase, we adopt a strategy akin to our LSTM model by keeping the parameters \( W_2 \) and \( b_2 \) of the final neural network layer untrained. This approach ensures that the sentiment vectors remain generalized and are not overly tailored to a specific classification layer.

Classifying with sentiment vectors

Figure 2.5 illustrates the synthesis of feature vectors from CNN and LSTM. Our analysis of the CNN sentiment vectors on the development set revealed some ambiguous cases, indicating that determining sentiment polarities based solely on CNN vectors can be challenging. To enhance the clarity of sentiment analysis, we incorporate additional information by concatenating CNN sentiment vectors with LSTM sentiment vectors or DVngram semantic vectors. As shown in Figures 2.5c and 2.5d, the classification boundary for the combined CNN-LSTM sentiment features is noticeably clearer compared to that of the CNN sentiment features alone.

CNN and LSTM sentiment vectors, derived from sentiment classification models, can be easily categorized using machine learning techniques. A multi-layer neural network sentiment classifier that utilizes both vectors achieves perfect classification on the training set within a few epochs. However, this results in inefficient optimization of the classifier's parameters, leading to no performance improvement on the testing set, which indicates overfitting when compared to using LSTM or CNN individually for classification.

To tackle this overfitting issue, we utilize a 3-layer neural network with Dropout regularization as proposed by Srivastava et al. (2014). This technique randomly drops hidden units with a specified probability during each training iteration, which helps prevent overfitting and allows efficient exploration of various neural network architectures. By implementing Dropout, our model effectively analyzes different combinations of feature vectors through a series of transformations, ultimately leading to the output prediction.
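A hedged Keras-style sketch of such a 3-layer network with Dropout over the concatenated CNN, LSTM, and DVngram vectors; the hidden size and input dimensions are placeholders (the dropout rates follow Table 2.2), and the function name is an assumption.

```python
import tensorflow as tf

def build_combined_classifier(cnn_dim, lstm_dim, dv_dim,
                              hidden_units=256, dropout1=0.9, dropout2=0.5):
    """3-layer NN over concatenated CNN, LSTM and DVngram feature vectors with Dropout."""
    cnn_vec = tf.keras.Input(shape=(cnn_dim,))
    lstm_vec = tf.keras.Input(shape=(lstm_dim,))
    dv_vec = tf.keras.Input(shape=(dv_dim,))
    x = tf.keras.layers.Concatenate()([cnn_vec, lstm_vec, dv_vec])  # combined features
    x = tf.keras.layers.Dropout(dropout1)(x)                        # Dropout on first layer input
    x = tf.keras.layers.Dense(hidden_units, activation="relu")(x)
    x = tf.keras.layers.Dropout(dropout2)(x)                        # Dropout on second layer
    y = tf.keras.layers.Dense(1, activation="sigmoid")(x)           # sentiment prediction
    model = tf.keras.Model([cnn_vec, lstm_vec, dv_vec], y)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```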

Ensemble with clustering support

Figure 2.5: The t-SNE projection of feature vectors on the IMDB dataset, where each feature vector combines the CNN sentiment vector with another vector such as the LSTM sentiment vector or the DVngram semantic vector: (a) CNN features in the train set; (b) CNN+LSTM features in the train set; (c) CNN features in the development set; (d) CNN+LSTM features in the development set.

In the 3-layer network, Dropout is applied to each hidden output \( y_i \) to produce \( \tilde{y}_i \), and the final prediction output is denoted \( \hat{y} \).

Generative models, trained on contextual information, prioritize capturing semantic meaning over sentiment, leading to the creation of similar vectors for semantically alike sentences even if they express opposing sentiments. This phenomenon complicates sentiment classification, as demonstrated by the sentiment distribution of similar document groups. To address this challenge, we propose clustering semantically similar sentences or documents and improving the classification accuracy of each cluster by incorporating additional features, achieved through encoding sentences into fixed-length vectors using an autoencoder.

Figure 2.6 illustrates the Dropout technique applied to the 3-layer neural network (NN) model, resulting in a thinned network. The autoencoder vectors are used for clustering sentences and documents, where each cluster's content is represented by prediction scores from both the method described in Section 2 and the efficient NBSVM model. Unlike our neural networks, NBSVM employs a Bag of Words representation rather than word embeddings. We anticipate that the scores generated by NBSVM will significantly enhance the performance of each cluster.
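As an illustration of the clustering step, the sketch below groups autoencoder document vectors with scikit-learn's K-Means and collects the two prediction scores per cluster; `doc_vectors`, `nn_scores`, and `nbsvm_scores` are assumed inputs, not names from the dissertation.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_documents(doc_vectors, n_clusters=4, seed=0):
    """Group semantically similar documents using their autoencoder vectors.
    doc_vectors: array of shape (n_docs, dim) from the (Bi)LSTM autoencoder."""
    kmeans = KMeans(n_clusters=n_clusters, random_state=seed)
    labels = kmeans.fit_predict(doc_vectors)
    return labels, kmeans

def cluster_scores(labels, nn_scores, nbsvm_scores):
    """For each cluster, stack the 3-layer NN and NBSVM prediction scores of its documents."""
    per_cluster = {}
    for c in np.unique(labels):
        idx = labels == c
        per_cluster[c] = np.stack([nn_scores[idx], nbsvm_scores[idx]], axis=1)
    return per_cluster
```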

In our analysis, we utilize two prediction scores derived from different methods: one from our proposed approach detailed in Section 2 and the other from the NBSVM model. To enhance accuracy, we implement a voting mechanism where each classifier \( f_i \) acts as a voter with a confidence ratio \( r_i \) that contributes to the final probability score over the class distribution:

\( p(c_i|x) = \frac{1}{N} \sum_{k=1}^{N} p_k(c_i|x)\, r_k \)  (2.17)

where \( c_i \) is the i-th sentiment class, N is the number of classifiers, and \( p_k(c_i|x) \) is the prediction score of classifier k on the i-th class for a sentence/document x.

To enhance classification performance, we train an ensemble model that assigns optimal confidence ratios to each classifier. Our approach utilizes a neural network to learn these values, employing a 2-layer feedforward neural network to implement the voting scheme, where the network's weights represent the confidence ratios. These weights are optimized using the Adamax algorithm (Kingma and Ba, 2015).
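A hedged Keras-style sketch of this learned voting scheme: a small 2-layer feedforward network over the classifiers' score distributions, trained with Adamax, whose weights play the role of the confidence ratios. The first layer size follows Table 2.2's "3 × input's dimension" setting; everything else is illustrative.

```python
import tensorflow as tf

def build_voting_ensemble(n_classifiers, n_classes=2):
    """Learn confidence ratios that weight each classifier's scores p_k(c_i|x)."""
    scores = tf.keras.Input(shape=(n_classifiers, n_classes))   # one score vector per classifier
    x = tf.keras.layers.Flatten()(scores)
    x = tf.keras.layers.Dense(3 * n_classifiers * n_classes,    # 3 x input's dimension
                              activation="relu")(x)
    out = tf.keras.layers.Dense(n_classes, activation="softmax")(x)
    model = tf.keras.Model(scores, out)
    model.compile(optimizer=tf.keras.optimizers.Adamax(),       # Adamax (Kingma and Ba, 2015)
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```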

Figure 2.7: The t-SNE projection of CNN+LSTM sentiment features for the IMDB development set. A BiLSTM autoencoder is used for clustering.

Table 2.1: Statistical summary of the datasets. "cv" denotes 10-fold cross-validation; |V|_avai is the proportion of vocabulary available in the Word2Vec embedding. Columns: Dataset, average length, train size, test size, vocabulary size, |V|_avai (%).

Datasets and Experiment setups

Datasets

For evaluation, we use five well-known datasets: MR-L, MR-S, SST, Tweet, and SenTube. Table 2.1 shows their statistical summary.

• MR-L [Maas et al., 2011a] contains 50,000 reviews from IMDB, where each movie has no more than 30 reviews.

• SenTube [Uryupina et al., 2014] comprises 38,000 comments covering two product categories: automobiles (AUTO) and tablets (TAB), featuring products such as the Apple iPad, Motorola Xoom, and Fiat 500. The authors gathered and annotated comments from commercials and review videos related to these products.

• MR-S [Pang and Lee, 2005] labels 5,331 sentences as positive and 5,331 sentences as negative. These sentences are selected from Internet movie reviews.

• Tweet [Go et al., 2009] contains 1.6 million tweets, which are automatically labeled via positive/negative emoticons; only the test set is human-annotated. Following previous research [dos Santos and Gatti, 2014], we randomly select 80,000 tweets for training and 16,000 tweets for validation. Although tweets are short, their |V|_avai is low; in other words, most of the words in tweets are unknown to the Word2Vec embedding. This is a major challenge to word-embedding-based approaches.

• SST [Socher et al., 2013] expands upon the MR-S dataset by incorporating review sentences alongside 215,154 labeled phrases. These phrases were annotated using Amazon Mechanical Turk and are used for training in our experiments.

Table 2.2: Optimal hyper-parameter values and grid search ranges.

  • Number of each CNN's region size: 100 (grid search range: [50, 100, ..., 500])
  • 3-layer NN's first layer: input's dimension
  • Dropout for 3-layer NN's first layer: 0.9 (range: [0, 0.1, ..., 0.9])
  • Dropout for 3-layer NN's second layer: 0.5 (range: [0, 0.1, ..., 0.9])
  • Ensemble's first NN layer: 3 × input's dimension (range: [1, 2, ..., 10])

Experimental setups

To optimize our models' hyper-parameters, we conduct a grid search using 30% of each dataset. This greedy method identifies the best value for each hyper-parameter within a specified range, focusing on the LSTM and CNN models individually before optimizing the 3-layer neural network. Table 2.2 presents the optimal configurations for all five datasets, along with the grid search ranges; notably, the optimal region size for MR-L is 100, not 300. We utilize pre-trained Word2Vec word vectors, which have a dimension of 300, and these vectors are further optimized during the training of our LSTM and CNN models.
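A sketch of the greedy grid search, assuming a hypothetical `train_and_evaluate(config)` helper that returns validation accuracy on the 30% tuning split; the candidate grids mirror Table 2.2's ranges.

```python
# Candidate values mirroring Table 2.2's grid search ranges.
GRID = {
    "n_filters_per_region": [50, 100, 150, 200, 250, 300, 350, 400, 450, 500],
    "dropout_layer1": [i / 10 for i in range(10)],
    "dropout_layer2": [i / 10 for i in range(10)],
}

def greedy_grid_search(train_and_evaluate, defaults):
    """Tune one hyper-parameter at a time (greedy, not exhaustive).
    train_and_evaluate(config) -> validation accuracy is a hypothetical helper."""
    best = dict(defaults)
    for name, candidates in GRID.items():
        scores = {}
        for value in candidates:
            config = dict(best, **{name: value})
            scores[value] = train_and_evaluate(config)
        best[name] = max(scores, key=scores.get)   # keep the best value, then move on
    return best
```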

Results and Discussion

Freezing vs Unfreezing

Our model employs a technique that freezes the parameters of the last neural network layer to mitigate overfitting. We assessed the effectiveness of this approach by comparing the performance of our frozen vector against a sentiment-specific vector derived from the conventional unfreezing method, using a 3-layer neural network model; the frozen feature vector outperformed the unfreezing method in most cases. Furthermore, we evaluated the combined performance of a sentiment-specific vector with a semantic-specific vector, DVngram. Overall, our freezing technique demonstrated superior performance compared to the unfreezing approach, more so for convolutional neural networks (CNN) than for long short-term memory networks (LSTM), especially when integrating both vector types.

Evaluation on combining features

In this section, we compare the performance of our approach of combining features from variant models against the Merging scheme, which horizontally merges variant models (details in Figure 2.9).

Our findings indicate that the feature vector combination method is more effective with Convolutional Neural Networks (CNN) than with Long Short-Term Memory (LSTM) networks. Specifically, the CNN feature vector demonstrates strong performance, whereas the LSTM feature vector yields inconsistent outcomes: it shows improved results when paired with the CNN feature vector but underperforms when combined with the DVngram vector, compared to the Merging scheme.

In most of the cases in the Merging scheme, a composition model (e.g., CNN-LSTM) tries to reproduce the result of its child models (e.g., CNN, LSTM) and does not provide a significant improvement.

Figure 2.9: The architecture of merging models.

Table 2.5: Accuracy results of the feature-combining scheme and the Merging scheme. Columns: Method, MR-S, SST, MR-L, AUTO, TAB, Tweet. * denotes results statistically significant at p < 0.05 via the pairwise t-test compared with the Merging scheme using the same features.

Evaluation on clustering support

This section explores the impact of clustering methods on classification performance, utilizing the K-Means and Birch algorithms with different K settings. K-Means is a partitioning clustering method, while Birch represents hierarchical clustering. For our experiments, we focus on the largest dataset, MR-L, to evaluate the effectiveness of these algorithms.

The analysis presented in Table 2.6 reveals a notable performance disparity between the K-Means and Birch clustering algorithms. K-Means is effective on large clusters but struggles with smaller ones, whereas Birch exhibits the opposite trend. For the small cluster (C3 = 592 samples), however, the proposed approach does not work well. Generally, K-Means is more efficient in our ensemble approach compared with Birch.

We assessed our model's performance with varying numbers of clusters; Figure 2.10 displays the results for K ranging from 0 to 10. Both K-Means and Birch exhibited similar patterns as K increased. Notably, when K exceeds 9, the effectiveness of clustering begins to decline, resulting in negative contributions.

Table 2.6 presents the accuracy results for the MR-L dataset, comparing the K-Means and Birch clustering methods. Here, K represents the number of clusters, C_i indicates cluster i, and Sample refers to the number of samples within each cluster. The figures in parentheses show the change in accuracy relative to the ensemble approach without clustering.

For cluster C3, for example, K-Means yields 7,528 samples with accuracy 93.26 (+0.02), while Birch yields 592 samples with accuracy 93.26 (−0.15). The more clusters we have, the smaller each cluster is. This makes the training process inefficient because of the small training set in each cluster.

Figure 2.10: The ensemble model’s performance on MR-L dataset.

Quality analysis

To evaluate the strengths and weaknesses of the proposed model, we manually examined several representative samples, as illustrated in Table 2.7. These examples demonstrate the model's performance in comparison to alternative approaches, particularly on simple sentences that maintain consistent sentiment polarity.

The proposed model effectively identifies sentiment polarities even in long sentences, where models using only CNN and LSTM struggle to interpret the overall sentiment. For instance, while these models attempt to capture sentiment words like "romantic," "fresh," "anguished," and "bitter," they often fail to classify the sentences accurately. In contrast, the integration of NBSVM in the proposed model allows for precise sentiment classification. This is consistent with the claim of Wang and Manning [2012] that NBSVM with bi-gram features and NB log-count ratios consistently performs well on long sentences/documents.

Table 2.7 presents typical samples, where CNN+LSTM refers to the outcomes of the three-layer neural network utilizing CNN and LSTM sentiment features. The column labeled "True" indicates the actual sentiment labels, with 0 representing negative and 1 representing positive sentiment.

1 a refreshingly realistic, affectation-free coming of age tale

3 apparently, romantic comedy with a fresh point of view just doesn’t figure in the present hollywood program

5 the last scenes of the film are anguished, bitter and truthful mr koshashvili is a director to watch

6 clayburgh and tambor are charming performers neither of them deserves eric schaeffer

8 You’ve seen them a million times.

9 A whole lot foul, freaky and funny.

Examples #8 and #9 highlight that false hits arise from insufficient contextual information, as certain words like "foul," "freaky," and "million times" can convey either positive or negative sentiments depending on their context. Without this context, accurately determining sentiment polarities becomes challenging. Thus, enhancing the model to better capture contextual information is a promising direction for future research.

Conclusion

In this study, we implement a freezing technique with CNN and LSTM to generate feature vectors, effectively leveraging the strengths of multiple models. To address the limitations of generative models, we introduce a strategy for clustering documents and sentences based on their semantic similarity. Additionally, we employ a neural voting ensemble combined with NBSVM to enhance the performance of each cluster, resulting in significant improvements in sentiment analysis outcomes.

Our research focused on simple models, but it would be interesting to apply our freezing scheme to combination models, such as multi-channel CNN-LSTM and hierarchical LSTM, for generating feature vectors. Additionally, our clustering method is grounded in semantic similarity, and exploring other types of similarity could yield valuable insights.

Subject Toward Sentiment Analysis on Social Media

Introduction

Social networking sites, particularly YouTube, serve as diverse platforms where users from various cultural and linguistic backgrounds share multimedia content and opinions. A study by Severyn et al. (2016) indicates that 60-80% of comments on YouTube reflect personal opinions, highlighting the significance of effective sentiment analysis in this context. Our research addresses the complexities of opinion mining on YouTube, which include the prevalence of ungrammatical comments, the need for a versatile approach to handle a wide range of topics, the ambiguity of sentiment-laden words related to both video content and advertised products, the presence of irrelevant or spam comments, and the multilingual nature of the platform. Thus, developing a sentiment analysis method that is resilient to grammatical variations is essential.

To tackle these challenges, we introduce the Convolutional N-gram BiLSTM word embedding model, which effectively captures both semantic and contextual information. This approach operates without the need for linguistic resources or handcrafted features, delivering strong performance in multilingual settings.

This chapter is structured into several key sections: Section 3.2 details the motivation and contributions of the research, while Section 3.3 provides a review of prior studies in opinion mining. In Section 3.4, we present the architecture of our proposed model, followed by Section 3.5, which describes the SenTube dataset and associated tasks. Section 3.6 reports and discusses the experimental results, and finally, Section 3.7 concludes our work.

Motivation and contribution of the work

Previous studies in sentiment analysis predominantly utilized the Bag of Words (BOW) representation. For example, Wang and Manning (2012) implemented a Support Vector Machine variant with Naive Bayes features (NBSVM). By representing documents or sentences using bi-gram features, NBSVM demonstrated consistent performance across various datasets, including both long and short reviews.


The winning system of the SemEval 2013 shared task, developed by Mohammad et al. (2013), utilized a Bag of Words (BOW) representation combined with a sentiment lexicon in a Support Vector Machine model. However, the BOW representation has significant limitations, as it disregards the order of words and overlooks their semantic meanings. For instance, the phrases "iPad 2 is better" and "the superior apps just destroy the xoom" demonstrate how BOW fails to capture nuances of sentiment that depend on word arrangement.

In the comment about the product Xoom, despite containing one negative word and two positive words, the overall sentiment remains negative. This phenomenon is frequently observed in YouTube comments, where users discuss videos and associated products. Traditional Bag of Words (BOW) representation fails to identify the specific subject of sentiment words, leading to ambiguity. To overcome this limitation, Severyn et al. (2016) proposed encoding comments into a shallow syntactic tree with enriched tags, known as STRUCT, which effectively captures sentiment words, key product concepts, and negation. However, this method relies on tools like POS-taggers and chunkers, as well as sentiment lexicons for various languages, limiting its use in multilingual contexts. Moreover, the sentiment polarity of a word is context-dependent, a nuance that the tree structure does not adequately address.

Bengio et al. (2003) proposed an unsupervised framework for learning continuous word vectors, where semantically similar words are represented by similar vectors, such as "strong" being close to "powerful." In contrast, the Bag of Words (BOW) model treats all word distances equally, failing to capture semantic nuances. This word embedding approach has significantly advanced deep learning techniques in natural language processing, particularly in sentiment analysis.

Words can serve various functions and meanings depending on their context. For instance, the word "has" appears as both a main verb and an auxiliary verb in different comments, while the adjective "cheap" conveys contrasting sentiments in separate instances. Despite these varying functions and meanings, word embedding models assign a single vector to each word, which loses the word's specific function and contextual significance.

By analyzing the neighboring words of a given term, one can discern its function, meaning, and sentiment, which is crucial for grasping the overall message of a sentence. This insight led to the development of convolutional filters designed to encode words along with their contextual information into a convolutional N-gram word embedding representation. However, these convolutional filters face two main challenges: they overlook long-distance contextual relationships due to their limited size (as when the word "Although" moderates the negative sentiment of "outdated" from a distant position), and they fail to account for the positional significance of words within a sentence, which can emphasize certain terms, such as adjectives at the beginning of a review. To overcome these limitations, a Bidirectional Long Short Term Memory (BiLSTM) network is combined with the convolutional N-gram word embedding representation, enabling the capture of long-distance contextual information and the importance of word positioning.


Table 3.1: Some YouTube comments from the SenTube dataset

1 as would I, jaguar always has a place in my heart a place that BMW cannot fill

2 I agree however now Jaguar has been bought out the reliability should increase.

3 Nobody wants it because it’s made of cheap materials

4 i couldnt believe it when my friend told me about this site and i can tell u , ive seen this car selling ridiculously cheap on this site.

While some may argue that the multitasking capabilities of iPads are outdated, I believe their approach is more practical The ability to pause one app and easily switch to another aligns well with how most users interact with tablets In my experience, having apps run in the background is rarely necessary.

We call the resulting representation the convolutional N-gram BiLSTM (CoNBiLSTM) word embedding. Figure 3.1 shows an overview of our proposed framework for YouTube sentiment analysis.

Figure 3.1: An overview of our sentiment analysis model

The contributions of our research are:

  • To improve traditional word embedding representations and better capture contextual information, we develop multiple convolutional filters of varying sizes. The convolutional N-gram vectors produced by these filters are then fed into a Bidirectional Long Short Term Memory (BiLSTM) network, which encodes long-distance contextual dependencies and the positional information of words.

  • Our model is applicable to any language, as it utilizes unsupervised word embedding representations without the need for linguistic preprocessors like POS-taggers or chunkers. In languages such as Japanese, Chinese, and Vietnamese, where words are not separated by spaces, word embeddings can be trained using syllables or characters.

Related work

Sentiment analysis in English

Taboada et al. (2011) developed a method for assigning sentiment labels to text by extracting sentiment-bearing words. Building on this, Liu (2012) framed sentiment analysis as a classification problem, prompting researchers to focus on creating effective features such as word n-grams (Wang and Manning, 2012), emoticons (Zhao et al., 2012), and sentiment words (Kiritchenko et al., 2014). Additionally, Saif et al. (2016) proposed a lexicon-based approach that utilizes co-occurrence patterns to capture semantic sentiment information, enabling both entity-level and tweet-level sentiment detection. Fersini et al. [2016] investigate the impact of several expressive signals (i.e., adjectives, pragmatic particles, and expressive lengthening). These signals have been employed to enrich the feature space of baseline and ensemble classifiers. According to the experimental results, the authors concluded that adjectives are more discriminative and impactful than pragmatic particles and expressive lengthening. The polarity of a single word is affected by its context (e.g., cheap price (positive) vs. cheap material (negative)). To disambiguate contextual sentiment polarity at the word level, Vechtomova [2017] introduced an information retrieval approach which uses reference corpora with sentiment-annotated documents. Although this approach was shown to be an effective alternative to machine learning approaches for disambiguating word-level contextual sentiment polarity, it has not shown an improvement over other methods in sentence-level sentiment analysis. Instead of using additional reference corpora for disambiguating sentiment polarity, we design the CoNBiLSTM model to learn contextual sentiment polarity for each word. The experimental results show the efficiency of this approach at the sentence level.

Metaheuristic-based methods have also been applied to opinion mining. Gupta et al. (2015) introduced a Particle Swarm Optimization approach for feature selection, which was then utilized with the Conditional Random Field method to classify sentiment, achieving impressive results with a significantly reduced feature set. In 2017, Pandey et al. addressed the random initialization issue in cuckoo search by employing K-means, optimizing the cluster-heads of the sentiment dataset and surpassing the performance of both the traditional and improved cuckoo search methods.

Designing handcrafted features for sentiment analysis demands significant effort and linguistic preprocessing tools, which can restrict its use in multilingual settings like YouTube. In contrast, deep learning techniques offer the key advantage of learning new features from a limited initial set, making them a promising solution for multilingual applications.

The recent rise of deep learning models has revolutionized sentiment classification by enabling the effective learning of continuous representation vectors. Pioneering techniques for semantic word representation were introduced by Bengio et al. in 2003 and further advanced by Mikolov et al. in 2013.

The authors utilized a neural network for word prediction tasks, generating semantic word embedding vectors in which similar meanings are represented by closely situated vectors. However, the same semantic information can convey opposing sentiments in different contexts. To address this, research has focused on sentiment-specific word representations learned from sentiment-laden text. Approaches for sentence- and document-level composition have also gained traction, with Yessenalina and Cardie modeling words as matrices and employing iterated matrix multiplication for phrase representation. Deep recursive neural networks (DRNN) have been applied over tree structures for sentiment classification, utilizing binary parse trees and recursive tensor neural networks with sentiment treebanks. Additionally, convolutional neural networks (CNN) have proven effective for semantic composition in recent studies.

These models use convolutional filters to capture local dependencies in terms of context windows and apply a pooling layer to extract global features. Le and Mikolov [2014] applied paragraph information to the word embedding technique to learn semantic document representations. Tang et al. [2015] used CNN or LSTM to learn sentence representations and encoded these semantic vectors into a document representation via a gated recurrent neural network. Zhang et al. [2016a] proposed Dependency Sensitive CNN to build hierarchical textual representations by processing pretrained word embeddings. Huy Tien and Minh Le [2017] propose a freezing scheme to learn sentiment features; this technique efficiently integrates the advantages of LSTM-CNN and avoids overfitting. Although contextual information might change the sentiment polarity of a word, this property is still not carefully considered in the prior work. To confirm the efficiency of the proposed CoNBiLSTM for encoding contextual sentiment polarity, we carried out experiments and quality analysis to compare the performances of CoNBiLSTM and BiLSTM.

Sentiment analysis in multi-lingual setting

Severyn et al (2016) introduced a shallow syntactic tree with enriched tags that effectively captures sentiment lexicon words, key product concepts, and negation terms They evaluated their approach using a YouTube corpus in both Italian and English, demonstrating improved performance in sentiment analysis for both languages Similarly, Vilares et al (2017) assessed the classification of multilingual polarity across different models, including a multilingual model trained on a diverse dataset and a dual monolingual model with and without language identification Their experiments on English and Spanish tweets highlighted the efficiency and robustness of the multilingual methodology.

Giatsoglou et al (2017) introduced a hybrid vectorization method that integrates emotional words with word embedding techniques to minimize the use of syntactic features Their experiments, conducted in both English and Greek, demonstrated that this hybrid approach, which combines word embedding with Bag-of-Words representations, surpassed the performance of other existing methods.

Our research improves word embedding representation by effectively capturing contextual information without relying on linguistic resources, which often fall short for low-resource languages By utilizing convolutional filters and BiLSTM, our contextual word embedding model demonstrates superior generalization across various domains, accommodating changes in word distribution and vocabulary, outperforming previous methods such as those proposed by Severyn et al (2016).

Convolutional N-gram BiLSTM word embedding

Bidirectional Long Short Term Memory (BiLSTM)

In the LSTM explained in Section 2.4.1, sentences of variable length are transformed into fixed-length vectors as follows:

i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)      (3.1)
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)      (3.2)
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)      (3.3)
u_t = \tanh(W_u x_t + U_u h_{t-1} + b_u)       (3.4)
c_t = f_t \odot c_{t-1} + i_t \odot u_t        (3.5)
h_t = o_t \odot \tanh(c_t)                     (3.6)

A limitation of the LSTM is that its hidden state h_t only carries information from the left context, leaving the right context unconsidered. To overcome this issue, the Bi-directional LSTM (BiLSTM) [Dyer et al., 2015] employs two distinct LSTM units to capture both left and right contexts: one operating in the forward direction and the other in the backward direction.

The two hidden states h_t^{forward} and h_t^{backward} from these LSTM units are concatenated into a final hidden state h_t^{bilstm}:

h_t^{bilstm} = h_t^{forward} \oplus h_t^{backward}      (3.7)

where \oplus is the concatenation operator.
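As a concrete illustration (not the exact implementation used in this work), the following minimal PyTorch sketch shows how a bidirectional LSTM produces, at every position, the concatenation of forward and backward hidden states as in Equation (3.7); all tensor sizes and names are hypothetical.

```python
import torch
import torch.nn as nn

# Hypothetical sizes: d-dimensional word vectors, sentence length s, LSTM units of dimension l.
d, s, l = 300, 20, 300

# bidirectional=True runs one forward and one backward LSTM and concatenates
# their hidden states at each time step; batch_first=True => input shape (batch, s, d).
bilstm = nn.LSTM(input_size=d, hidden_size=l, bidirectional=True, batch_first=True)

x = torch.randn(1, s, d)      # one sentence of s word embedding vectors
h_bilstm, _ = bilstm(x)       # shape (1, s, 2*l)

# h_bilstm[:, t, :l] is h_t^forward and h_bilstm[:, t, l:] is h_t^backward,
# so each time step already holds h_t^forward concatenated with h_t^backward.
print(h_bilstm.shape)         # torch.Size([1, 20, 600])
```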

Convolutional N-gram BiLSTM word embedding

We represent a document as a matrix S of dimensions d × s, where each column is the d-dimensional word embedding vector of one word. To process the document matrix S, we apply convolution with linear filters H_k. Each filter H_k is characterized by a weight vector W_{H_k} of length h_k (the region size) and a bias value b_{H_k}, so h_k + 1 parameters need to be estimated per filter. Before applying filter H_k, the input matrix S \in R^{d \times s} is transformed into

S_k \in R^{d \times (h_k/2 + s + h_k/2)}

by zero padding, which makes the result of the convolutional operator have the same dimensions as the matrix S. A convolutional N-gram word embedding matrix C_k \in R^{d \times s} is then obtained as follows:

C_k[i, j] = W_{H_k} \cdot S_k[i, j - h_k/2 : j + h_k/2] + b_{H_k}      (3.8)

where \cdot is the dot product operation and S_k[i, l:t] is the sub-matrix of S_k from column l to column t of row i.

To obtain the final convolutional N-gram word embedding matrix C, average pooling is applied over the convolutional N-gram word embedding matrices C_k as follows:

C[i, j] = \frac{1}{N} \sum_{k=1}^{N} C_k[i, j]      (3.9)

where N is the number of filters.
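A minimal sketch of Equations (3.8) and (3.9) follows, assuming (as the indexing above suggests) that each filter's weights are shared across embedding dimensions; the sizes, region sizes other than those reported later, and variable names are illustrative, and in the actual model the filter weights are learned jointly with the rest of the network.

```python
import torch
import torch.nn as nn

d, s = 300, 20                # hypothetical embedding dimension and sentence length
S = torch.randn(d, s)         # document matrix: one column per word

region_sizes = [1, 3]         # filter region sizes h_k (the values used for the English model)
filters = [nn.Conv1d(in_channels=1, out_channels=1, kernel_size=h, padding=h // 2)
           for h in region_sizes]   # each filter = a weight vector of length h_k plus a bias

# Equation (3.8): apply each filter row-wise with zero padding so every C_k keeps the d x s shape;
# each embedding dimension i is treated as its own 1-D signal sharing the filter weights.
C_ks = [f(S.unsqueeze(1)).squeeze(1)[:, :s] for f in filters]   # each C_k is d x s

# Equation (3.9): average pooling over the N filter outputs.
C = torch.stack(C_ks, dim=0).mean(dim=0)    # convolutional N-gram word embedding, d x s
print(C.shape)                              # torch.Size([300, 20])
```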

In the context of YouTube, comments can express sentiments about videos, products featured within them, or even unrelated products, complicating sentiment analysis To enhance the model's accuracy in identifying the subject of a comment, we incorporate the video title as an additional feature, as it typically describes the product showcased By analyzing both the comment and the corresponding video title, we generate two convolutional N-gram word embedding matrices: C_comment for the comment and C_title for the title.

To make our convolutional N-gram word embedding take into account the word’s position as well as capture long distance contextual information, we apply BiLSTM to this word embedding:

T = BiLSTM(M),  with  M = C_comment \oplus C_title      (3.10)

where \oplus is the concatenation operator as in Equation (3.7). Each column of T \in R^{2l \times s} is the 2l-dimensional CoNBiLSTM word embedding vector of one word, where l is the dimension of each LSTM unit.

In classification tasks utilizing CoNBiLSTM word embeddings, the initial and final columns of T, representing word embedding vectors that encapsulate the entire context in both forward and backward directions, are input into a neural network comprising two fully connected layers.
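To tie these pieces together, the following is a minimal PyTorch sketch of the classification head just described: the comment and title embedding matrices are concatenated along the word axis, passed through the BiLSTM, and the first and last columns of T are fed into two fully connected layers. The ReLU activation, the log-softmax output, and all names and sizes are assumptions made for illustration rather than details taken from the model specification.

```python
import torch
import torch.nn as nn

class CoNBiLSTMClassifierSketch(nn.Module):
    """Illustrative sketch of the CoNBiLSTM classification head (Section 3.4.2)."""

    def __init__(self, d=300, l=300, d1=600, n_labels=7, p_drop=0.5):
        super().__init__()
        self.bilstm = nn.LSTM(input_size=d, hidden_size=l,
                              bidirectional=True, batch_first=True)
        self.fc1 = nn.Linear(4 * l, d1)      # first and last columns of T, each 2l-dimensional
        self.drop = nn.Dropout(p_drop)
        self.fc2 = nn.Linear(d1, n_labels)   # softmax over the target labels

    def forward(self, C_comment, C_title):
        # C_comment: (batch, d, s_comment); C_title: (batch, d, s_title)
        M = torch.cat([C_comment, C_title], dim=2)     # concatenate along the word axis
        T, _ = self.bilstm(M.transpose(1, 2))          # (batch, s, 2l) CoNBiLSTM embeddings
        first, last = T[:, 0, :], T[:, -1, :]          # columns covering the whole context
        x = self.drop(torch.relu(self.fc1(torch.cat([first, last], dim=1))))
        return torch.log_softmax(self.fc2(x), dim=1)

# Usage sketch with random inputs:
model = CoNBiLSTMClassifierSketch()
scores = model(torch.randn(2, 300, 15), torch.randn(2, 300, 5))   # (2, 7) label scores
```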

SenTube dataset & Task description

Task description

In our experiment, we evaluated the proposed model on three tasks:

• Sentiment task: This task detects whether a comment expresses a positive, a negative, or a neutral sentiment. The sentiment can be general or about a specific topic (e.g., the product or the video).

• Type task: In the YouTube environment, comments often express sentiments towards the video itself or towards the product featured in it, so it is crucial to identify the target subject of each comment. This task classifies a comment as related to the video, related to the product, or uninformative (off-topic remarks or spam).

• Full task: In this task, we predict both the sentiment and the type of each comment simultaneously, framing it as a classification problem with seven labels: the Cartesian product of the type labels (product, video) and the sentiment labels (positive, neutral, negative), plus an uninformative class, as illustrated in the small sketch after this list.
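For concreteness, a tiny sketch of how the seven Full-task labels can be enumerated (the exact label strings are illustrative):

```python
# The seven Full-task labels: type x sentiment, plus "uninformative".
types = ["product", "video"]
sentiments = ["positive", "neutral", "negative"]
labels = [f"{t}-{s}" for t in types for s in sentiments] + ["uninformative"]
print(labels)   # 7 labels in total
```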

SenTube dataset

SenTube, as detailed by Uryupina et al. (2014), comprises approximately 38,000 English comments and 10,000 Italian comments. The authors selected products from two categories, automobiles (AUTO) and tablets (TABLET), including items such as the Apple iPad, Motorola Xoom, and Fiat 500. Comments were gathered and annotated from commercials or review videos related to these products; however, some products lack corresponding Italian commercials or reviews.

Experiment & discussion

Model configuration

To tune the hyper-parameters of our models, we perform a grid search on 30% of the dataset for the Full task (a small grid-search sketch follows this configuration list). The English configuration is:

• Word embedding layer: pre-trained Word2Vec vectors trained on 100 billion words of English Google News; the vectors have a dimensionality of 300 and are further optimized during training.

• Convolutional filters: two filters (h= 1,3) are employed to construct a convolutional N-gram word embedding.

• BiLSTM layer: the dimension for each direction is the same as the word embedding layer (l = 300).

• First fully-connected layer: the dimension is the same as the BiLSTM layer (d_1 = 600).

• Dropout layer: to avoid overfitting, we employ a dropout layer (p = 0.5) between the first and second fully-connected layers.

• Second fully-connected layer: the dimension equals the number of target labels. We use a softmax activation for this layer.

1 https://code.google.com/p/word2vec/
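The grid search mentioned above can be organized as in the following sketch; the candidate values in the grid are hypothetical (only the selected settings listed above come from the tuning described in this section), and the evaluation function is a placeholder standing in for an actual training run on the 30% tuning split.

```python
import random
from itertools import product

# Hypothetical candidate grid; the alternatives here are illustrative only.
grid = {
    "region_sizes": [(1, 3), (1, 5), (1, 7)],
    "lstm_dim":     [150, 300],
    "dropout":      [0.3, 0.5],
}

def evaluate(config):
    """Placeholder: train the CoNBiLSTM sketch with `config` on the tuning split
    and return validation accuracy. A random score stands in for the real run."""
    return random.random()

best_config, best_acc = None, -1.0
for values in product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    acc = evaluate(config)
    if acc > best_acc:
        best_config, best_acc = config, acc

print(best_config, best_acc)
```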

Table 3.4: Summary of Italian comments data used in the Sentiment, Type and Full tasks

                 Train       Validation   Test        Train       Validation   Test
Product-pos      253 (11%)   50 (7%)      176 (14%)   218 (10%)   172 (16%)    154 (10%)
Product-neg      216 (10%)   132 (18%)    190 (15%)   355 (16%)   145 (13%)    211 (13%)
Product-neu      283 (13%)   148 (20%)    272 (21%)   719 (33%)   451 (41%)    551 (35%)
Video-pos        271 (13%)   50 (7%)      146 (11%)   92 (4%)     22 (2%)      112 (7%)
Video-neg        127 (6%)    51 (7%)      36 (3%)     41 (2%)     11 (1%)      62 (4%)
Video-neu        351 (16%)   113 (15%)    171 (13%)   246 (11%)   64 (6%)      195 (12%)
Uninformative    661 (31%)   206 (15%)    294 (13%)   527 (24%)   243 (22%)    285 (18%)

For the Italian model, we likewise tune the hyper-parameters with a grid search on 30% of the dataset for the Full task:

• Word embedding layer: pre-trained word vectors from Bojanowski et al. [2017b], trained on Italian Wikipedia with a dimensionality of 300; the vectors are further optimized during training.

• Convolutional filters: two filters (h = 1,17) are employed to construct a convolu- tional N-gram word embedding.

• The other layers are the same as the English model.

In-domain experiment

We assessed the CoNBiLSTM word embedding model by comparing it with BiLSTM and the STRUCT method of Severyn et al. (2016) on the three tasks described above. The accuracy of these three models on the SenTube dataset, for both languages and domains, is reported in Table 3.5.

The proposed method consistently outperforms existing approaches, particularly when comparing performance on the AUTO and TABLET datasets Previous research using the STRUCT method revealed that AUTO's performance lagged significantly behind TABLET across various tasks This discrepancy can be attributed to two main factors: TABLET benefits from a larger training dataset, and the differing audience types, with AUTO attracting well-informed users who provide more nuanced opinions, while TABLET engages a broader, more general audience Consequently, analyzing comments in the AUTO domain proves to be more complex Our model, however, has successfully narrowed the performance gap between AUTO and TABLET.

Table 3.5: The result of the in-domain experiment

        STRUCT   BiLSTM   CoNBiLSTM    STRUCT   BiLSTM   CoNBiLSTM
Full    45.6     47.47    51.05 a,b    52.4     53.24    55.03 a,b

a,b denote results statistically significant at p < 0.05 via the pairwise t-test compared with STRUCT and BiLSTM respectively.
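The significance markers in Tables 3.5 and 3.6 refer to a pairwise (paired) t-test. As an illustration only (the accuracy values below are made up), such a test can be run as follows:

```python
from scipy import stats

# Hypothetical per-run accuracies for two systems evaluated on the same splits;
# the paired t-test checks whether the improvement over the baseline is significant.
conbilstm = [0.51, 0.53, 0.50, 0.52, 0.54]
struct    = [0.46, 0.47, 0.45, 0.46, 0.48]

t_stat, p_value = stats.ttest_rel(conbilstm, struct)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")   # significant if p < 0.05
```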

Our model also adapts well across domains: in the English Full task, the performance gap between domains is 7.47% for our model, compared with 18.8% for STRUCT. In particular, the AUTO domain shows significant performance gains on all tasks; the only exceptions overall are the Sentiment task on Italian AUTO and on English TABLET.

In the Sentiment task, STRUCT showed better performance than our model in the English TABLET and Italian AUTO domains, though the difference was not statistically significant STRUCT's method involved utilizing a pre-defined list of sentiment words, which simplifies sentiment analysis, particularly when training data is limited In contrast, our approach focuses on a multilingual task where resources such as sentiment and synonym dictionaries, as well as linguistic preprocessing tools, are scarce; thus, we refrain from using any additional labeled data for sentiment analysis.

Table 3.6: The results of the cross-domain experiment

Language   Source   Target   Task   STRUCT   BiLSTM   CoNBiLSTM
                             Full   31.7     38.24    39.5 a,b

a,b denote results statistically significant at p < 0.05 via the pairwise t-test compared with STRUCT and BiLSTM respectively.

Cross-domain experiment

In this experiment, we trained a model on data from one domain and evaluated it on data from a different domain. This setting allows us to assess how well our models adapt across domains, as well as whether training data is needed for a new domain. Table 3.6 reports the accuracy of the three tasks in the cross-domain setting.

Our experimental results demonstrate that our model is more robust and stable in the cross-domain setting. For instance, in the English Full task, the performance difference between our model trained on AUTO (48.03%) and trained on TABLET (47.6%) is merely 0.43%, compared to a 6.4% difference for STRUCT. This indicates that our model generalizes better.

The proposed model demonstrates enhanced accuracy across both languages when compared to BiLSTM and STRUCT, with the exception of the Type task in the Italian AUTO domain In this context, the average comment length in the Italian AUTO domain is significantly longer at 154 characters, compared to 111 characters in the TABLET domain Additionally, the in-vocabulary size for the target domain TABLET is smaller at 38%, while the AUTO domain has a larger size of 43% These discrepancies present challenges for the model's performance.

Table 3.7: The precision, recall and F1 scores of CoNBiLSTM for each class in the English experiments

                 Precision   Recall   F1      Support    Precision   Recall   F1      Support
Product-pos      62.95       55.01    58.72   349        54.39       39.19    45.55   712
Product-neg      35.71       34.56    35.13   217        45.67       37.63    41.26   869
Product-neu      43.03       54.59    48.13   447        67.1        79.71    72.86   2400
Video-pos        68.2        46.85    55.54   444        78.45       68.93    73.39   412
Video-neg        21.05       7.69     11.27   104        27.68       20.67    23.66   150
Video-neu        68.71       44.2     53.79   939        52.05       46.58    49.16   599
Uninfo           50.15       76.48    60.57   897        61.71       64.16    62.91   1314

Performance on each class

This section evaluates the performance of three tasks by examining the Precision, Recall, and F1 scores for each class, as detailed in Table 3.7 In the Sentiment task, the negative class presents the greatest challenge across both domains, primarily due to its lower representation and the complexity of its grammar, making it harder to develop an effective classifier compared to the positive and uninformative classes The challenges of identifying negative sentiment are further explored in Section 3.6.5 Meanwhile, in the Type task, the uninformative class proves to be significantly more difficult to classify, as it may encompass spam, off-topic comments, or references to unrelated products.

In the Full task, the Product-negative and Video-negative classes have the worst performance, which is consistent with the results of the Sentiment task.
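Per-class precision, recall and F1 scores such as those in Table 3.7 can be computed, for example, with scikit-learn; the gold and predicted labels below are purely illustrative and not taken from the experiments.

```python
from sklearn.metrics import classification_report

labels = ["Product-pos", "Product-neg", "Product-neu",
          "Video-pos", "Video-neg", "Video-neu", "Uninfo"]

# Hypothetical gold and predicted labels for a handful of comments.
y_true = ["Product-pos", "Video-neu", "Uninfo", "Product-neg", "Video-pos"]
y_pred = ["Product-pos", "Video-neu", "Product-neu", "Product-neg", "Video-neu"]

print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
```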

Quality analysis

To evaluate the strengths and weaknesses of the proposed model, we manually inspected a number of cases, shown in Table 3.8. These samples illustrate both the advantages and the limitations of the model.

Table 3.8: Some typical samples for quality analysis. The labels are 0: product-positive, 1: product-neutral, 2: product-negative, 3: video-positive, 4: video-neutral, 5: video-negative and 6: uninformative

No Title Comment BiLSTM CoNBiLSTM True

1 ferrari 430 review ferrari look so dull (spelling) and boring! lamborghini is so much more awsome! they look so mean! and just evil!

2 bugatti veyron vitesse video review is it just me? or does the dash look pretty dull

3 ferrari 430 review this is my dream car, and its getting cheaper and cheaper o yah!

4 2012 range rover evoque coupe hd video review so it does have a rear wiper as in the new lexus rx eh?

5 ferrari 430 review why did they change the music the orig- inal was way more dramatic

6 2012 fiat 500 review fun to drive, but toyota and honda still rule the reliability rankings; my 1997 honda civic has 400,000 miles on the original engine

Our model captures long-distance dependencies better than BiLSTM, which is crucial for accurately analyzing sentiment in comments. For instance, in sample #1 the comment expresses a negative sentiment towards Ferrari and a positive sentiment towards Lamborghini, so the model needs to relate the title to the comment. In addition, our model builds better word embeddings that incorporate contextual information and part of speech. In sample #2, the word "pretty", which functions here as an adverb, is treated by BiLSTM as an adjective carrying positive sentiment, leading to an incorrect prediction. The word "cheaper" likewise shows how sentiment varies with context: in sample #3, our model correctly assigned a positive sentiment to "cheaper" while BiLSTM did not.

Our model performs poorly at identifying negative comments, as shown in Section 3.6.4. To better understand the challenges posed by negative feedback, we examined several representative negative comments, listed in Table 3.8. Comment #4 poses a rhetorical question that conveys a negation, while Comments #5 and #6 contain no direct expression of negativity.

