Students' Learning Experiences
Tran Thi Oanh
International School, Vietnam National University, Hanoi,
144 Xuan Thuy, Cau Giay, Hanoi, Vietnam
oanhtt@isvnu.vn
Abstract. Understanding students' learning experiences on social media is an important task in educational data mining, since it provides more complete and in-depth insights that help educational managers obtain necessary information in a timely fashion and make more informed decisions. Current systems still rely on traditional machine learning methods with hand-crafted features. A further challenge is that important information can appear at any position in the posts/sentences. In this paper, we propose an attentive biLSTM method to deal with these problems. This model utilizes a neural attention mechanism with biLSTMs to automatically extract and capture the most critical semantic features in students' posts with regard to the current learning experience. We perform experiments on a Vietnamese benchmark dataset and the results indicate that our model achieves state-of-the-art performance on this task. We achieved 63.5% in the micro-average F1 score and 59.7% in the macro-average F1 score for this multi-label prediction task.
Keywords: Attention mechanism · biLSTMs · Students' learning experience · Social media
1 Introduction
Students' learning experience refers to the feelings/thoughts of students in the process of acquiring knowledge or skills while studying in academic environments. It is considered one of the most relevant indicators of education quality in schools/universities [17]. Understanding it is therefore an effective and important way to improve educational quality in schools/universities.
Learning experiences can vary dramatically across students. To determine students' learning experiences, the most widely used methods are surveys, direct interviews, or observations, which provide important opportunities for educators to obtain student feedback and identify key areas for action. Unfortunately, these traditional methods are usually time-consuming and thus cannot be repeated frequently. Moreover, they also raise questions about the accuracy and validity of the data collected, because responses do not accurately reflect what students were thinking or doing at the time the problems/issues happened.
Another drawback is that the selection of the standards of educational practice and student behavior implied in the survey questions has also been criticized [5]. Nowadays, social sites such as Facebook, forums, blogs, etc. provide great venues for students to express their opinions, concerns, and emotions about the learning process. When students post on these sites, they usually write about their feelings/thoughts at that moment. Therefore, the textual data collected from online conversations may be more authentic and unfiltered than responses to formal research surveys. These public datasets provide a vast amount of insights for educators to understand students' experiences beyond the traditional methods above. For mining such datasets, several works exist for English using traditional machine learning classifiers with hand-crafted features. Some typical classifiers used in mining various problems in students' learning process are Decision Tree [13], Naive Bayes [6], SVM [8], Memetic [2], etc. For Vietnamese, not much effort has been spent on mining such data so far. Tran and Nguyen [14] presented the first work towards mining social media to get insights from Vietnamese students' posts. They developed a framework using Naive Bayes and Decision Tree to automatically detect students' issues and problems in their study at universities.
Recently, deep neural network approaches have provided an effective way of reducing the number of hand-crafted features. Specifically, neural networks have been proven to improve the performance of many tasks, ranging from question generation [18] and machine translation [7] to relation classification [19]. Hence, in this paper, we propose a novel architecture exploiting a neural network called attention-based biLSTMs for mining students' learning experiences. This model does not use any features derived from knowledge resources or Natural Language Processing (NLP) systems. We perform experiments on a benchmark dataset and achieve 63.5% in the micro-average F1 score and 59.7% in the macro-average F1 score, higher than the existing methods in the literature for this critical task.
The rest of this paper is organized as follows: Sect. 2 presents related work. In Sect. 3, we present the proposed method, which uses an attention-based biLSTM to deal with the task. Section 4 describes the experimental setup, evaluation metrics, experimental results, and some findings of this work on a Vietnamese benchmark dataset. Finally, we summarize the paper in Sect. 5 and discuss some ongoing work for the future.
2 Related Work
Social media has risen to be not only a communication medium for personal purposes, but also a medium for sharing opinions about products and services, or even political issues, among its users. Many studies from diverse fields have developed tools to formally represent, measure, model, and mine meaningful patterns (knowledge) from large-scale social data for the concerned domains. In healthcare, several studies, e.g. Sue et al. [12], have shown that social media can be used to reveal a lot of health information about its users, or to provide online social support for anyone with health problems [16]. In the marketing field, researchers mine social data to recommend friends or items (e.g. online courses, videos, beauty products, research papers, search keywords, social tags, and other products in general) on social media sites, etc.
Recently, research on mining informal web-based conversations on social media (e.g., Facebook, forums, etc.) has started to emerge. These sites generate huge amounts of textual data which contain important information about students. Many studies have proposed different techniques to process such data to better understand students and their learning environments. This information is valuable to institutions/universities for making informed decisions related to students' learning. For example, Chen et al. [3] first provided a framework for analyzing this kind of data using Twitter posts for educational goals. Takle et al. [13] carried out a detailed study comparing different classification techniques such as Iterative Dichotomiser 3 (ID3), a Naive Bayes multi-label classifier, and a Memetic classifier on a common dataset to analyze and extract information related to students in order to enhance the higher education system. Blessy et al. [2] developed a framework that uses both qualitative analysis and big data mining techniques, based on a Naive Bayes multi-label classifier and a Memetic classifier, to categorize tweets presenting students' problems. Pande et al. [8] exploited the SVM method to determine many issues such as stress, suicide, sleep problems, and anxiety in students' posts. Patil et al. [9] showed how students indicate their feelings via social media sites and which posts fall into which category using the Memetic algorithm. Jessiepriscilla et al. [6] built a sentiment analyzer tool for analyzing tweets, based on a Naive Bayes multi-label classifier, which can be used to determine students' learning experiences. All of these studies were conducted using traditional machine learning methods.
While most work has focused on English, only a few attempts have been made for Vietnamese so far. Specifically, Tran and Nguyen [14] presented the first work towards mining social media to get insights from engineering students' posts. They developed a framework to automatically detect students' issues and problems in their study at universities. Similar to other work in English, the authors also exploited traditional machine-learning methods, namely Naive Bayes and Decision Tree, to build the prediction models. This work also contributed the first benchmark dataset in this field for Vietnamese. The experimental results were just a preliminary step and more effort is needed to enhance the performance of the methods.
As can be seen, previous work mostly exploited traditional machine learning methods which require hand-crafted features. Designing these features is commonly time-consuming and requires experts' knowledge. Another challenge is that in a post, some words play more important roles in deciding its main meaning, especially when one student's post may carry more than one meaning. In recent years, deep neural network methods have given us an effective way to reduce the number of hand-crafted features. They also do not require extra knowledge or NLP systems. Therefore, this research proposes a novel architecture exploiting an attentive biLSTM for the task of mining students' learning experiences on social media. Specifically, we convert the multi-label classification into binary classification problems and then exploit the attentive biLSTM to build the corresponding models for these problems. The effectiveness of the proposed method is verified on a Vietnamese benchmark dataset through extensive experiments.
3 An Attention-Based biLSTM for Understanding Students' Learning Experiences
Formally, the multi-label learning problem can be seen as the problem of finding a method that maps inputs x to binary vectors y; these binary vectors are not scalar outputs as in the single-label classification problem. Multi-label classification can be solved with transformation techniques, which turn the problem into several single-label classification problems. This work uses the technique called binary relevance. Specifically, assuming that we have p labels, this method creates p new datasets, one dataset for each label. The binary relevance method then trains a single-label classifier on each of these new datasets. Each single-label classifier only decides whether or not the current sample belongs to its label i. The multi-label prediction for a new sample is determined by combining the classification results from all of these independent single-label classifiers, as sketched below.
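A minimal sketch of this binary relevance transformation; the dataset format, the label set, and the train_binary_classifier helper are hypothetical placeholders, not part of the original system.

```python
# Binary relevance: turn one multi-label problem into p independent binary problems.
# `posts` is a list of texts; `label_sets` is a parallel list of sets of gold labels.
# `train_binary_classifier` stands in for any single-label learner (here: the attentive biLSTM).

def binary_relevance_train(posts, label_sets, all_labels, train_binary_classifier):
    classifiers = {}
    for label in all_labels:
        # Build one binary dataset per label: 1 if the post carries this label, else 0.
        y = [1 if label in gold else 0 for gold in label_sets]
        classifiers[label] = train_binary_classifier(posts, y)
    return classifiers

def binary_relevance_predict(post, classifiers):
    # Combine the independent binary decisions into one predicted label set.
    return {label for label, clf in classifiers.items() if clf.predict(post) == 1}
```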
Each of these classifiers is built using the attentive biLSTM architecture illustrated in Fig. 1. This deep neural network is usually very effective at encoding sequences of words and is powerful for learning on data with long-range dependencies. By default, it considers each word in a post with equal importance. The attention mechanism is introduced to allow the model to pay attention to the more important parts of the students' posts. The model can therefore automatically concentrate on the important words that have a greater impact on the final classification, capturing the most important semantic information in each post. This model does not use any extra knowledge or outputs from NLP systems. The overall framework consists of four main layers, described as follows.
Each student's post consists of n words, s = {w_1, w_2, ..., w_n}, where w_i is the i-th word of the post. Each word in the post is converted into a vector x_i using word embeddings. Word embedding is one of the most effective representations of post vocabulary nowadays: it has the capability of encoding the context of a word in a post, semantic as well as syntactic similarity, and its relations with other words. In this paper, we use GloVe [10], an unsupervised learning algorithm for learning vector representations of words.
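A minimal sketch of how pretrained GloVe vectors can be loaded and used to map words to embeddings; the file name and 50-dimensional size follow the setup described in Sect. 4, but the loader itself is an assumption, not the authors' code.

```python
import numpy as np

def load_glove(path="glove.vi.50d.txt", dim=50):
    """Read pretrained GloVe vectors (word followed by `dim` floats per line)."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) == dim + 1:
                vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def embed_post(words, vectors, dim=50):
    """Map a tokenized post to an (n, dim) matrix; unknown words get a zero vector."""
    return np.stack([vectors.get(w, np.zeros(dim, dtype=np.float32)) for w in words])
```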
Fig. 1. An attention-based biLSTM for understanding students' learning experiences on social media.

Let X = (x_1, x_2, ..., x_n) be a student's post consisting of the vector representations of its n words. At each position t, the RNN produces an output based on a hidden state h_t:

y_t = σ(W_y h_t + b_y), (1)

where W_y and b_y denote a parameter matrix and vector, respectively, determined during training, and σ denotes the element-wise softmax function. The hidden state h_t is updated by an activation function of the previous hidden state h_{t-1} and the current input x_t:

h_t = f(x_t, h_{t-1}). (2)
LSTM cells use several gates to update the hidden state h_t: an input gate i_t, a forget gate f_t, an output gate o_t, and a memory cell c_t. The update formulas are given below:

i_t = σ(W_i x_t + V_i h_{t-1} + b_i), (3)
f_t = σ(W_f x_t + V_f h_{t-1} + b_f), (4)
o_t = σ(W_o x_t + V_o h_{t-1} + b_o), (5)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ tanh(W_c x_t + V_c h_{t-1} + b_c), (6)

where ⊙ denotes element-wise multiplication, the W and V are weight matrices, and the b are bias vectors to be learned.
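As an illustration only, a direct transcription of Eqs. (3)-(6) into a single LSTM cell step; the parameter layout is an assumption, and the final line uses the standard LSTM output h_t = o_t ⊙ tanh(c_t), which the text above does not spell out explicitly.

```python
import torch

def lstm_cell_step(x_t, h_prev, c_prev, W, V, b):
    """One LSTM step following Eqs. (3)-(6); W, V, b are dicts keyed by gate name."""
    i_t = torch.sigmoid(W['i'] @ x_t + V['i'] @ h_prev + b['i'])   # Eq. (3): input gate
    f_t = torch.sigmoid(W['f'] @ x_t + V['f'] @ h_prev + b['f'])   # Eq. (4): forget gate
    o_t = torch.sigmoid(W['o'] @ x_t + V['o'] @ h_prev + b['o'])   # Eq. (5): output gate
    c_t = f_t * c_prev + i_t * torch.tanh(W['c'] @ x_t + V['c'] @ h_prev + b['c'])  # Eq. (6)
    h_t = o_t * torch.tanh(c_t)  # standard LSTM hidden-state output (not numbered in the text)
    return h_t, c_t
```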
To improve the model performance, two LSTMs are run over each user utterance: the first processes the utterance from left to right (producing l_i) and the second processes a reversed copy of the utterance (producing r_i). The forward and backward outputs l_i and r_i are then combined into c_i, by concatenation by default, before being passed on to the next layer.
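A minimal sketch of this bidirectional layer using PyTorch's built-in nn.LSTM (PyTorch is one of the libraries the paper mentions); the hidden size and post length are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Bidirectional LSTM: PyTorch runs the forward and backward passes and
# concatenates their outputs along the last dimension (2 * hidden_size).
embedding_dim, hidden_size = 50, 100              # illustrative sizes
bilstm = nn.LSTM(embedding_dim, hidden_size,
                 batch_first=True, bidirectional=True)

x = torch.randn(1, 12, embedding_dim)             # one post of 12 word vectors
H, _ = bilstm(x)                                  # H: (1, 12, 2 * hidden_size)
```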
Let H denote the matrix of output vectors [h_1, h_2, ..., h_n] produced by the biLSTM layer, where n is the post length. One could simply average these vectors and feed the result to the classifier, but not all of this information is equally important. Attention is therefore used to indicate which words are more important and which are less so: a small neural network is trained over H to score the importance of each word. Let r be the representation of the post; r is created as a weighted sum of the output vectors:

α = softmax(w^T H),
r = H α^T,

where w is a trained parameter vector and w^T is its transpose. The attention weights α indicate how important each position is; the weighted sum r is then fed into the classifier.
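A minimal sketch of this attention step, assuming the simple formulation just described (a single trained vector w scoring each biLSTM output); the exact parameterization in the paper may differ.

```python
import torch
import torch.nn.functional as F

def attention_pool(H, w):
    """Weighted sum of biLSTM outputs.

    H: (n, d) matrix of hidden states, w: (d,) trained parameter vector.
    Returns the post representation r of shape (d,).
    """
    scores = H @ w                    # one importance score per word
    alpha = F.softmax(scores, dim=0)  # attention weights over the n positions
    r = alpha @ H                     # weighted sum of the hidden states
    return r
```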
We obtain from r the final post representation h*, which is used for classification. This work exploits a softmax classifier¹ to predict the label y* from a pre-defined set of classes Y for a student's post s. The classifier takes the hidden state h* as input:

p(y|s) = softmax(W h* + b),
y* = argmax_y p(y|s). (13)
¹ Instead of the softmax function, the sigmoid function can be used as an alternative. In fact, in binary classification the sigmoid and softmax functions are equivalent, whereas in multi-class classification the softmax function is preferred.
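A short numerical check of the footnote's claim that sigmoid and two-class softmax coincide: a two-class softmax over logits [0, z] gives the same positive-class probability as sigmoid(z).

```python
import torch
import torch.nn.functional as F

z = torch.tensor([1.7])                                      # an arbitrary logit
p_sigmoid = torch.sigmoid(z)                                 # binary probability via sigmoid
p_softmax = F.softmax(torch.tensor([0.0, 1.7]), dim=0)[1]    # two-class softmax, positive class

print(p_sigmoid.item(), p_softmax.item())                    # both ≈ 0.85: the two views agree
```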
4 Experiments
This section first describes the dataset used in the experiments. Typical evaluation metrics are then presented to estimate the effectiveness of the proposed method, followed by the detailed configuration used to set up the experiments. Finally, the experimental results on this dataset are reported.
Data were collected from a forum of a well-known university in Vietnam. The dataset contains 1834 posts relating to the learning experiences of students at an information technology university. In this dataset, one post can fall into one or multiple categories. There are seven categories, which are also the main problems/issues that students often meet during their studies at the university. Figure 2 shows the number of instances per label in our dataset.
Fig. 2. Number of posts in each category of the dataset analyzed.
The evaluation metrics for multi-label classification are slightly different from the metrics for the single-label task. In multi-label classification, a misclassification is no longer strictly wrong or right: a predicted set of labels which includes a subset of the gold classes should be considered better than a predicted set that does not contain any gold class. In this paper, we report both settings to evaluate the performance of the method.
In this situation, researchers [4] proposed two types of metrics: example-based measures and label-based measures.
Example-Based Measures. These measures are calculated per example (here, each post is considered an example) and then averaged over all posts in the dataset. Suppose that we are classifying a certain post p, the gold (true) set of labels of p is G, and the set of labels predicted by the classifier is P; the example-based evaluation metrics are calculated as follows:
Acc = (1/M) Σ_{i=1}^{M} |G_i ∩ P_i| / |G_i ∪ P_i|,

Prec = (1/M) Σ_{i=1}^{M} |G_i ∩ P_i| / |P_i|,

Rec = (1/M) Σ_{i=1}^{M} |G_i ∩ P_i| / |G_i|,

F1 = (1/M) Σ_{i=1}^{M} (2 · Precision_i · Recall_i) / (Precision_i + Recall_i),

where Precision_i and Recall_i are the precision and recall of post i, and M is the number of posts in the corpus.
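A minimal sketch of these example-based metrics computed over gold and predicted label sets; pure Python, written only to mirror the formulas above.

```python
def example_based_metrics(gold_sets, pred_sets):
    """gold_sets, pred_sets: parallel lists of Python sets of labels, one pair per post."""
    M = len(gold_sets)
    acc = prec = rec = f1 = 0.0
    for G, P in zip(gold_sets, pred_sets):
        inter = len(G & P)
        acc += inter / len(G | P) if (G | P) else 1.0
        p_i = inter / len(P) if P else 0.0
        r_i = inter / len(G) if G else 0.0
        prec += p_i
        rec += r_i
        f1 += 2 * p_i * r_i / (p_i + r_i) if (p_i + r_i) else 0.0
    return {name: value / M for name, value in
            {"Acc": acc, "Prec": prec, "Rec": rec, "F1": f1}.items()}
```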
Two further commonly used measures for multi-label classification are the micro-average F1 and the macro-average F1. The former gives the same weight to each classification decision per post, while the latter gives the same weight to each label. They are variants of F1 used in different situations.
Label-Based Measures. These measures are computed on each label and then averaged over all labels in the dataset. Specifically, precision, recall, and F1 for each label l are calculated as follows:
P = TP / (TP + FP),
R = TP / (TP + FN),
F1 = (2 · P · R) / (P + R),
where TP is the number of posts correctly detected as the currently-considered label l, FP is the number of posts that do not belong to l but are identified as l, and FN is the number of posts of l that are not recognized as l by the models.
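A minimal sketch of the label-based and micro/macro-averaged scores using scikit-learn (one of the libraries mentioned in the implementation), assuming the multi-label annotations are represented as binary indicator matrices.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support, f1_score

# Rows = posts, columns = labels; 1 means the label applies (toy example with 3 labels).
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0]])

# Per-label precision, recall, F1 (label-based measures).
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average=None, zero_division=0)

# Micro-average weights every decision equally; macro-average weights every label equally.
f1_micro = f1_score(y_true, y_pred, average="micro", zero_division=0)
f1_macro = f1_score(y_true, y_pred, average="macro", zero_division=0)
```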
Table 1. Experimental results of detecting students' learning experiences using example-based metrics.

Methods          Accuracy  Precision  Recall  F1-micro  F1-macro
Decision Tree    0.565     0.548      0.571   0.583     0.558
Attentive LSTM   0.612     0.587      0.629   0.635     0.597
The model was implemented in the Python programming language with several typical libraries such as PyTorch, numpy, sklearn, utils, etc. These libraries provide rich tools and options to support development in NLP and many other research fields.
To create pre-trained word embeddings, we gathered raw data from Vietnamese newspapers (≈ 7 GB of text) to train the word vector model using GloVe². The dimension of the word embeddings was fixed at 50.
For each label, we created a corresponding dataset that only focuses on the currently-considered label. On each such dataset, we performed 5-fold cross-validation tests to evaluate the performance of the proposed attentive biLSTM-based model. The hyperparameters were chosen using a development set: we randomly selected 10% of the training data as the development set. To detect students' learning experiences, we set the number of epochs to 100, the batch size to 20, early stopping to True with a patience of 4 epochs, and the dropout rate to 0.5. A sketch of this evaluation loop is given below.
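A minimal sketch of the per-label evaluation loop under the configuration just described; the train_model and evaluate helpers are hypothetical placeholders standing in for the attentive biLSTM training and scoring code.

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

def cross_validate_label(posts, labels, train_model, evaluate, seed=42):
    """5-fold CV for one binary (per-label) dataset, with a 10% dev split per fold."""
    posts, labels = np.asarray(posts), np.asarray(labels)
    scores = []
    for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=seed).split(posts):
        # Hold out 10% of the training fold as the development set for early stopping.
        tr_idx, dev_idx = train_test_split(train_idx, test_size=0.1, random_state=seed)
        model = train_model(posts[tr_idx], labels[tr_idx],
                            dev=(posts[dev_idx], labels[dev_idx]),
                            epochs=100, batch_size=20, patience=4, dropout=0.5)
        scores.append(evaluate(model, posts[test_idx], labels[test_idx]))
    return float(np.mean(scores))
```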
In this paper, we compare the performance of the proposed model with the best results of previous work on the same dataset. The best previous performance is obtained with the Decision Tree method [14] in the same binary relevance setting. In that work, Tran et al. exploited C4.5 (J48), a decision tree algorithm proposed by Ross Quinlan [11]. C4.5 begins with large sets of cases of known classes. These cases are represented by any mixture of properties in both nominal and numeric forms, and are carefully examined for patterns that allow the classes to be reliably discriminated. These patterns are then expressed as models that can later be used for classifying new, unseen cases; the emphasis is on models that are understandable as well as accurate. C4.5 was ranked first among the top 10 data mining algorithms published by Springer in 2008 [15]. Using this method, the baseline model achieved 58.3% in the micro-average F1 score and 55.8% in the macro-average F1 score.
Table 1 shows the experimental results of the baseline and the proposed method using example-based metrics. It should be noted that the higher the evaluation metrics, the better the performance of the models.
² https://github.com/stanfordnlp/GloVe.
As can be seen, the attentive biLSTM model significantly boosted the performance on this task. It achieved better results by around 4% on all metrics: accuracy, recall, precision, macro-average F1, and micro-average F1. Specifically, the F1-micro score increased by 5.2% and the F1-macro score by 3.9%. This result suggests that the attention mechanism has a significant effect on mining students' learning experiences in social media: in practice, it is quite effective in helping the model focus on the words that are most useful for classifying students' learning experiences.
Table 2. Experimental results of the attention-based biLSTM for detecting students' learning experiences using label-based metrics.

            Study load  Negative emotion  Career targets  English barriers  Others  Material resources  Diversity issues
Precision   0.832       0.900             0.928           0.948             0.788   0.905               0.919
Recall      0.775       0.923             0.933           0.949             0.792   0.892               0.922
F1          0.788       0.910             0.921           0.944             0.776   0.895               0.914
Table 2 shows the performance of the attention-based biLSTM method on each label using label-based metrics. We can see that the attentive biLSTM model yielded quite high scores. Most labels, such as Negative Emotion, English Barriers, Career Targets, and Diversity Issues, reached more than 90% in the F1 score. The Material Resources label reached 89.5% in the F1 score. For the remaining two labels, Heavy Study Load and Others, the proposed method achieved around 78% in the F1 score. This result is quite promising given the ambiguity involved in predicting these labels: observing their samples in the dataset, we saw that they overlap heavily with the remaining labels, so the model is prone to mistakes when predicting them.
5 Conclusion
This paper presented a new approach to the task of determining students' learning experiences on social media. Previous systems still relied on traditional methods with manually-designed features; building these features takes time and experts' knowledge. A further challenge is that not all words in a post carry the same weight in the model's final prediction. Therefore, this paper proposed an attention-based biLSTM to address these problems. The model utilizes a neural attention mechanism with biLSTMs to automatically extract and capture the most critical semantic features in students' posts.
We performed experiments on a Vietnamese benchmark dataset, and the results show that the model achieves state-of-the-art performance on this task for Vietnamese. The proposed method improves the performance by a large margin of around 4% across the evaluation metrics, achieving 63.5% in the micro-average F1 score and 59.7% in the macro-average F1 score.