How to Forecast the Students’ Learning Outcomes Based on Factors of Interactive Activities in a Blended Learning Course
Minh-Duc Le
VNU University of Engineering and Technology, Vietnam National University Hanoi, 144 Xuan Thuy, Cau Giay, Hanoi, Vietnam
duclm@vnu.edu.vn
Hoa-Huy Nguyen
VNU Center for Education Accreditation, Vietnam National University Hanoi, 144 Xuan Thuy, Cau Giay, Hanoi, Vietnam
huynguyen@vnu.edu.vn
Duc-Loc Nguyen
VNU University of Engineering and Technology, Vietnam National University Hanoi, 144 Xuan Thuy, Cau Giay, Hanoi, Vietnam
17020871@vnu.edu.vn
VNU University of Engineering and Technology, Vietnam National University Hanoi, 144 Xuan Thuy, Cau Giay, Hanoi, Vietnam
vietanh@vnu.edu.vn
ABSTRACT
This paper summarizes research on identifying the factors that influence learning in the online phase of a blended learning course. From these factors, we propose a model for predicting student outcomes. We built several models to predict students’ learning outcomes, using a course with 231 participants. Data obtained from the log files of an LMS were analyzed using learning analytics and machine learning techniques. The results suggest that four factors affect students’ learning outcomes: the number of views, the number of posts, the number of forum views, and the number of on-time submitted assignments. For forecasting the final exam grade from the results of the formative assessment tests, Bayesian Ridge is the most accurate of the four models tested (Linear Regression, KNR, SVM, Bayesian Ridge). Our study can be useful material for lecturers and course designers in organizing blended learning courses effectively.
CCS Concepts
•Applied computing➝Education➝E-learning
Keywords
Blended learning; forecast students’ learning outcomes; learning analytics; learning activities
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
ICFET 2020, June 5–8, 2020, Tokyo, Japan
© 2020 Association for Computing Machinery
ACM ISBN 978-1-4503-7533-7/20/06…$15.00
https://doi.org/10.1145/3404709.3404711
* The corresponding author
1 INTRODUCTION
Currently, blended learning is a popular trend in modern education because of its advantages. Blended learning is widely understood as a combination of traditional and online learning [1]. It is also considered one of the most suitable solutions for institutions beginning to shift from traditional learning to a new form [2].
In general, when students participate in online learning activities, they often encounter a major problem: the lack of tools that support them with course recommendations and result forecasts. Along with the strong development of machine learning and data mining techniques, Learning Analytics (LA) has become a crucial topic in the educational research community [3]. Overall, learning analytics consists of five steps: data collection, reporting, forecasting, action, and improvement [4]. Thanks to the application of LA techniques, analyzing the data of learners participating in online activities is greatly facilitated [5]. The results of this analysis not only provide useful information about the learning process and the relationships between learners but can also be used as the basis for recommendations that help learners improve their learning results.
When learners participate in online learning activities, most of them do not receive enough support from lecturers [6]. Therefore, identifying the factors that influence learners' results during online learning activities is important for building timely recommendations that help learners regulate their learning process.
We conducted this study to identify the factors that influence student achievement during the online phase of the course, with a special focus on the interactive activities in which learners engaged. These factors are the basis for predicting student learning outcomes and can also serve as useful indicators for lecturers and course designers when designing courses.
1.1 Research Context
The University of Engineering and Technology, Vietnam National University, Hanoi has applied information technology to support teaching and learning activities since 2003. At first, our university adopted LMS/LCMS systems such as Moodle and Blackboard as support tools for these activities. In this phase, the tools were used as an additional information channel to help learners access learning resources supplied by the lecturers. Recently, about 10% of courses have been implemented as blended learning courses, in which most activities take place online and are combined with some traditional lessons. In courses using this model, learners play an active role throughout the learning process by interacting with the learning system. During the learning process, one legitimate demand of learners is to know how to participate effectively in these courses. In other words, for designers and lecturers, the question is what actions are necessary to create a course that is effective for learners.
1.2 Research Questions
In this context, our research aims to build an effective course model to support learners. As a step toward this goal, we conducted this study to answer the following research questions: (1) What factors affect students' learning outcomes when they participate in a blended learning model? (2) Is it possible to develop a model for predicting students’ learning outcomes based on those factors?
1.3 Paper Structure
In the next section, we present an overview of recent studies that identify the factors affecting learners' results, as well as studies of models for predicting students’ learning outcomes. In Section 3, we describe the organization of the research in detail, including the research subjects, procedures, data, and models. The results of the study are presented in Section 4, and some discussion follows in Section 5. Finally, Section 6 presents conclusions and future research directions.
2 LITERATURE REVIEW
Early identification of learners with unsatisfactory academic results, together with timely warnings, helps those learners significantly improve their results [7], [8], especially in an e-learning environment, which lacks the close supervision of traditional learning. Many studies have proposed models and methods for predicting learners' results in order to support learners. Typically, these models apply data mining and machine learning techniques [9]. One dropout forecast model is based on a set of demographic attributes; its results show that a logistic regression model performs best, with AUC ranging from 0.66 to 0.73 [10]. Also using logistic regression, Marbouti and colleagues implemented a predictive model in weeks 2, 4, and 9 of the courses. The results of that study imply that the logistic regression model is reliable in predicting students at risk [11].
In a blended learning environment, we can analyze student information gathered from the learning management system. Several approaches use learning analytics to make predictions about students. Lu and colleagues [12] used learning activities from online and offline learning and proposed 21 variables to predict student performance. Their findings showed that four online factors affected student results. To predict student success in blended learning, Macarini and colleagues used data from the Moodle LMS that recorded student activities based on student interaction. The results showed that these factors can predict student success and detect at-risk students even in the first week [13].
In a study of student dropout prediction based on students' personal characteristics and academic performance [14], the authors applied typical machine learning methods, such as Artificial Neural Networks, Decision Trees, and Bayesian Networks, to data sets from the Open University of China and found that the Decision Tree method performed best. Xing and colleagues [15] conducted a study of 3617 students taking a course with 11 modules over 8 weeks on a MOOC platform, predicting student dropout from current and past features with Bayesian Network and Decision Tree (C4.5) models. The results show that these factors have a positive impact on the quality of intervention and on supporting instructors. In student dropout prediction studies on MOOC platforms, Fei and colleagues showed that the long short-term memory (LSTM) model was the most effective [16], [17].
In addition, studies that look for factors affecting students’ learning outcomes have also attracted attention. Khalil and colleagues [18] identified factors such as course length, student motivation, interaction during learning, and background knowledge that affect student dropout rates. A study of the factors that affect learner retention in MOOCs shows that interaction between students and faculty effectively increases retention [19]. Learners' interaction with learning systems, captured as clickstream data, is seen as a predictor of academic performance [20]. The results of that study indicate that this factor, along with demographic characteristics, has a great influence on student performance. The interaction between students and lecturers is also considered a factor in students' engagement with the course [21]. Learners’ sentiments in MOOC forums are also factors that affect student performance and results. Research by Tucker and colleagues has shown a correlation between students’ sentiments and their performance [22].
On the other hand, scores are also considered predictors of results and student performance; in particular, formative assessment results can serve as the basis for predicting final results [23], [24]. To predict the average grade of a group of students, a regression model should be used; in contrast, to predict the performance of each individual student, the SVM model should be considered for better results [25].
3 METHODS
3.1 Participants and Procedures
The study was conducted with 231 first-year students who were taking a blended learning course for the first time; the subject of the course was “Introduction of Information Technology”.
The course runs for 12 weeks and has two types of content. In the offline learning activities, theoretical content takes place in weeks 1, 4, and 10, and practical content takes place in weeks 2, 5, and 11. All learning content and interactive activities, such as online learning, the forum, and online assignments, are implemented on the Moodle system at https://onlinecourses.uet.vnu.edu.vn.
Results are assessed formatively through three online tests on the theoretical content and three exercises on the practical content, which are submitted through the LMS. The final exam consists of two parts: a theory section of 40 questions in a 30-minute online test, and a practical section in the form of a 60-minute computer-based test.
3.2 Data
Data were collected from the system's log databases, which recorded learners' activity from the time they started attending the course until it ended, between September and December 2019. To process this data source, which stores learners' interactive activities, we developed a tool as a plugin for the Moodle system that analyzes the learners' interaction activities and computes statistics on them.
The data for each student include the number of content views, the number of posts, the number of forum posts, the number of forum views, the number of on-time submitted assignments, the number of late submitted assignments, the number of unsubmitted assignments, the online test scores, and the final online exam score.
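The per-student counts described above could be derived from raw log rows roughly as follows. This is a minimal sketch, not the Moodle plugin used in the study: the event labels, the `count_interactions` helper, and the sample rows are hypothetical names and values chosen for illustration, and a real Moodle log uses its own event names.

```python
from collections import Counter, defaultdict

# Hypothetical event labels (assumption); a real Moodle log names events differently.
FEATURES = ("content_view", "post", "forum_post", "forum_view",
            "on_time_submission", "late_submission")

def count_interactions(log_rows):
    """Turn (student_id, event) log rows into one count per feature per student."""
    counts = defaultdict(Counter)
    for student_id, event in log_rows:
        if event in FEATURES:  # ignore events outside the feature set
            counts[student_id][event] += 1
    # Fill in zeros so every student has the same columns.
    return {sid: {f: c[f] for f in FEATURES} for sid, c in counts.items()}

# Made-up log rows, for illustration only.
log = [("s1", "content_view"), ("s1", "content_view"), ("s1", "forum_view"),
       ("s2", "post"), ("s2", "login")]
table = count_interactions(log)
```

Each row of `table` then plays the role of one student's record of interaction counts, to which the test scores and the final exam score would be joined.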
3.3 Model
To answer the research questions, we built two kinds of models for predicting students’ learning outcomes: (1) models that forecast students’ learning outcomes from factors derived from the learners’ interaction with the system, and (2) models that forecast the final exam results from the formative assessment tests’ grades.
To predict students’ learning outcomes from interactive activities, we built a regression model with the R tool. In our model, the chosen activity factors are the independent variables, and the dependent variable is the final grade of the course. To select the parameters that best fit the model, we performed two main steps: selecting the best regression model and selecting the best factors for that model.
To forecast the final exam grade from the formative assessment results, we tested four predictive models: Linear Regression, SVR (Epsilon-Support Vector Regression), KNR (regression based on KNN), and Bayesian Ridge.
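To make the KNR idea concrete, the sketch below predicts a final exam grade as the mean grade of the k students whose formative test scores are closest to the query. It is an illustration under assumptions, not the implementation used in the study: the `knn_regress` helper and the toy scores are invented here, and only the Python standard library is used.

```python
import math

def knn_regress(train_x, train_y, query, k=3):
    """Predict a target as the mean target of the k nearest training points."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training points by Euclidean distance to the query.
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))
    return sum(y for _, y in nearest[:k]) / k

# Made-up data: three formative test scores per student and a final exam grade.
tests = [(5, 5, 5), (6, 6, 6), (9, 9, 9), (8, 8, 8)]
finals = [5.0, 6.0, 9.0, 8.0]
prediction = knn_regress(tests, finals, query=(5.5, 5.5, 5.5), k=2)
```

A larger k averages over more neighbours, which smooths the prediction but can also blur real differences between students.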
4 RESULTS
4.1 Descriptive Statistics of the Interactive Learning Activities
Table 1 presents descriptive statistics of the interaction with the learning activities. The results show that each student interacted with the system to view learning resources about 12 times per week on average. Even the student with the lowest number of interactions viewed resources more than 4 times per week. This indicates that students actively interacted with the learning system. On average, students submitted about two of the five assignments they were required to submit.
Table 1. Descriptive statistics of the interaction of the learning activities
Variable                               Min   Max   Mean    Std. Deviation
Number of views                        55    340   155.7   67.64
Number of posts                        13    58    29.3    13.27
Number of forum views
Number of forum posts
Number of submitted assignments
Number of late submitted assignments
4.2 Forecasting the Students’ Learning Outcomes Based on Statistics of Interactive Activities
The following independent variables were selected as candidate parameters for the model that predicts the final exam results: number of views, number of posts, number of forum views, number of forum posts, number of submitted assignments, and number of late submitted assignments.
Using the all-subsets regression method with these six variables to select the best model for each number of parameters, we obtained several models predicting the final exam score: (1) With one factor, the model relates the final exam result to the number of posts. The regression coefficient of the post parameter is 2.015, which means that with a 1% increase in the number of posts, the student's score increased by 2.015%. (2) With two factors, the model relates the final exam result to the number of posts and the number of on-time submitted assignments. The coefficient of the post parameter is 2.31, and the coefficient of the number of on-time submitted assignments is 2.14. (3) With three factors, the model relates the final exam result to the number of posts, the number of forum views, and the number of on-time submitted assignments. (4) With four factors, the model relates the final exam result to the number of views, the number of posts, the number of forum views, and the number of on-time submitted assignments. In more detail, the coefficients of the number of on-time submitted assignments, the number of posts, the number of forum views, and the number of views are 2.65, 2.31, 1.71, and 1.32, respectively.
We use the Akaike Information Criterion (AIC) to compare the models. The four-factor model has the smallest AIC value, 171.8, so it can be considered the best choice in this case for forecasting students’ learning outcomes.
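The selection procedure described above, fitting every subset of candidate factors and keeping the one with the smallest AIC, can be sketched as follows. This is an illustrative standard-library sketch, not the R code used in the study. The helper names and the toy data are assumptions, and the AIC is computed as n ln(RSS/n) + 2p, which matches the Gaussian AIC only up to an additive constant. It therefore ranks models fitted to the same data in the same way but would not reproduce the reported value of 171.8.

```python
import math
from itertools import combinations

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_rss(rows, y, cols):
    """Least-squares fit of y on an intercept plus the chosen columns; returns the RSS."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations
    return sum((t - sum(b * v for b, v in zip(beta, d))) ** 2
               for d, t in zip(design, y))

def aic(rss, n, n_params):
    """Gaussian AIC up to an additive constant: n * ln(RSS / n) + 2 * n_params."""
    return n * math.log(rss / n) + 2 * n_params

def best_subset(rows, y, names):
    """Try every non-empty subset of predictors; return (AIC, names) of the best one."""
    n = len(y)
    scored = []
    for size in range(1, len(names) + 1):
        for cols in combinations(range(len(names)), size):
            rss = fit_rss(rows, y, cols)
            scored.append((aic(rss, n, len(cols) + 1), [names[c] for c in cols]))
    return min(scored)

# Made-up data: (posts, views) per student; the grade depends mainly on posts.
rows = [(1, 3), (2, 1), (3, 4), (4, 1), (5, 5), (6, 9), (7, 2), (8, 6)]
grades = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.05, 15.95]
best_aic, best_names = best_subset(rows, grades, ["posts", "views"])
```

With six candidate factors, as in the study, this loop would fit 63 models, which is still cheap. The penalty term 2p is what keeps a larger subset from winning merely because adding a factor never increases the RSS.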
4.3 Forecasting the Students’ Learning Outcomes Based on the Formative Assessment Results
In this experiment, students took three online assessment tests and one final exam. The data in Figure 1 show that there is a correlation between the results of the tests and the final exam.
Figure 1. The correlation between the results of the tests and the final exam
In this study, we use three measures to evaluate the models: mean absolute error (MAE), mean squared error (MSE), and the coefficient of determination (R²). The results for these measures are shown in Table 2.
Table 2. Evaluation results of test models
Model               MSE        MAE        R2_loss
Linear Regression   1.076808   0.883435   0.893004
KNR (k=20)          1.113869   0.912662   0.953646
Bayesian Ridge      1.075545   0.883538   0.890447
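For reference, the three evaluation measures named above can be computed as in the following sketch. The functions use the standard textbook definitions, and the grades are made-up values. How the R2_loss column of Table 2 is derived from R² is not stated in the text, so the `r2` function below should not be assumed to reproduce that column.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error: average squared prediction error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up final exam grades and model predictions, for illustration only.
actual = [6.0, 7.0, 8.0, 9.0]
predicted = [6.5, 7.0, 7.5, 9.0]
scores = (mae(actual, predicted), mse(actual, predicted), r2(actual, predicted))
```

MSE penalizes large individual errors more heavily than MAE, so two models can rank differently under the two measures.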
According to these measures, the Bayesian Ridge model is the one to use for prediction, as its MSE is the lowest among the models listed in Table 2 and its MAE is nearly identical to that of Linear Regression.
However, the learning curves of the models in Figure 2 show that, although Bayesian Ridge is the most accurate model, it overfits. A similar situation is observed for the Linear Regression model. The KNR model (k = 20) may be relatively effective when training data is scarce, but its accuracy does not improve as the training data increases.
According to the learning curves, the Support Vector Regression model deserves attention: its bias and variance are relatively balanced, and it shows no overfitting. Moreover, its accuracy tends to increase as more data is added.
Figure 2. The learning curve of four models
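A learning curve of the kind discussed above is obtained by training a model on progressively larger portions of the data and recording its error on both the training set and a held-out validation set. The sketch below shows the mechanics with a simple one-variable linear regression. The model, the helper names, and the data are illustrative assumptions, not the four models or the data set of the study.

```python
def fit_line(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def line_mse(model, xs, ys):
    """Mean squared error of the fitted line on the given points."""
    intercept, slope = model
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def learning_curve(train, val, sizes):
    """For each training size n, fit on the first n points and score both sets."""
    vx, vy = zip(*val)
    curve = []
    for n in sizes:
        xs, ys = zip(*train[:n])
        model = fit_line(xs, ys)
        curve.append((n, line_mse(model, xs, ys), line_mse(model, vx, vy)))
    return curve

# Made-up (test score, final grade) pairs, for illustration only.
train = [(4, 5.1), (6, 5.8), (5, 5.9), (8, 8.2), (7, 6.6), (9, 9.1), (3, 3.5), (6, 6.4)]
val = [(5, 5.2), (7, 7.3), (8, 7.9)]
curve = learning_curve(train, val, sizes=[3, 5, 8])
```

A validation error that stays well above the training error as n grows is the usual sign of overfitting, while two errors that converge suggest that bias and variance are balanced.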
5 DISCUSSION
5.1 The Research Questions’ Results
For the results based on statistics of interactive activities: Considering only the online learning phase, and ignoring factors such as background knowledge and learners’ level, the activities through which learners interact with the system affect their learning outcomes. The results obtained from the forecasting models also confirm that learner-content and learner-learner interaction have a great impact on learning outcomes, consistent with previous studies [20], [21]. However, selecting the optimal model for predicting students’ learning outcomes from these factors is still an open issue that requires further study, because the number of activity types recorded is small and the dataset is small. Furthermore, the data collected from these interactive activities depend entirely on whether the activities can be carried out in the course. If a learning activity is not part of the course, we cannot use it as a factor in the predictive model.
For forecasting students’ learning outcomes based on formative assessment results in the course: To do this, the course needs to include many exercises for periodic evaluation. Once students have their assignment results, they can obtain a forecast of their final exam result, which gives them a basis for adjusting their learning strategies. The results show that the Bayesian Ridge model should be used when the data do not vary too much, or when the difference between training data and actual data is small, because its accuracy is the highest of the models compared. However, the Support Vector Regression method should also be considered, because with more training data its accuracy improves and it does not overfit the training data.
5.2 Implication
A course with a variety of interactive activities can yield a more effective predictive model, because more candidate factors are available to the model. The empirical results suggest that courses should be designed with a variety of interactive activities and that these activities should be carried out regularly in each course.
The results of the formative assessments in a course can be used as a criterion for predicting a student’s final exam result. Models that predict the final grade from component scores help learners anticipate their results in the course and serve as an early warning. This result may also suggest to course designers how to design effective learning activities.
6 CONCLUSION
In this research, we tried to identify the factors that affect students’ learning outcomes in blended learning courses and proposed several models that use machine learning techniques to forecast those outcomes. The findings show that factors related to student-content interaction and student-student interaction are significant. Moreover, the results of the formative assessments are a reliable basis for a model that predicts students' final exam grades. The results also show that the Bayesian Ridge method should be applied to this problem. Although the research was conducted on a small scale, the results were positive and could be applied in many courses.
In further studies, we will develop models that predict student learning outcomes from a combination of component scores in the learning process and online learning activities.
7 ACKNOWLEDGMENTS
This work has been supported by Vietnam National University, Hanoi under Project QG.20.57.
8 REFERENCES
[1] C. J. Bonk and C. R. Graham, “The Handbook of Blended Learning,” 2006.
[2] J. Kang and G. A. Seomun, “Evaluating Web-Based Nursing Education’s Effects: A Systematic Review and Meta-Analysis,” Western Journal of Nursing Research, 2018.
[3] A. Chiappe, “Learning Analytics in 21st century education: a review,” pp. 971–991, 2017.
[4] J. P. Campbell, P. B. DeBlois, and D. G. Oblinger, “Academic Analytics,” Educ. Rev., vol. 42, no. October, pp. 40–57, 2007.
[5] O. Viberg, M. Hatakka, O. Bälter, and A. Mavroudi, “The Current Landscape of Learning Analytics in Higher Education,” Comput. Human Behav., vol. 89, no. October 2017, pp. 98–110, 2018.
[6] E. Nkenke et al., “Acceptance of technology-enhanced learning for a theoretical radiological science course: A randomized controlled trial,” BMC Med. Educ., 2012.
[7] S. Aud et al., “The condition of education 2013,” 2013.
[8] M. Hlosta, Z. Zdrahal, and J. Zendulka, “Ouroboros: early identification of at-risk students,” LAK17 - Seventh Int. Learn. Anal. Knowl. Conf., pp. 6–15, 2017.
[9] M. Mayilvaganan and D. Kalpanadevi, “Comparison of classification techniques for predicting the performance of students academic environment,” in 2014 International Conference on Communication and Network Technologies, ICCNT 2014, 2015.
[10] L. Aulck, N. Velagapudi, J. Blumenstock, and J. West, “Predicting Student Dropout in Higher Education,” Proc. ICML Work. #Data4Good Mach. Learn. Soc. Good Appl., New York, NY, USA, 2016.
[11] F. Marbouti, H. Diefes-Dux, and J. Strobel, “Building Course-Specific Regression-based Models to Identify At-risk Students,” 2015.
[12] O. H. T. Lu, A. Y. Q. Huang, J. C. H. Huang, A. J. Q. Lin, H. Ogata, and S. J. H. Yang, “Applying learning analytics for the early prediction of students’ academic performance in blended learning,” Educ. Technol. Soc., vol. 21, no. 2, pp. 220–232, 2018.
[13] L. A. B. Macarini, C. Cechinel, M. F. B. Machado, V. F. C. Ramos, and R. Munoz, “Predicting students success in blended learning-Evaluating different interactions inside learning management systems,” Appl. Sci., vol. 9, no. 24, 2019.
[14] M. Tan and P. Shao, “Prediction of student dropout in E-learning program through the use of machine learning method,” Int. J. Emerg. Technol. Learn., vol. 10, no. 1, pp. 11–17, 2015.
[15] W. Xing, X. Chen, J. Stein, and M. Marcinkowski, “Temporal predication of dropouts in MOOCs: Reaching the low hanging fruit through stacking generalization,” Comput. Human Behav., vol. 58, pp. 119–129, 2016.
[16] M. Fei and D. Y. Yeung, “Temporal Models for Predicting Student Dropout in Massive Open Online Courses,” Proc. - 15th IEEE Int. Conf. Data Min. Work. ICDMW 2015, pp. 256–263, 2016.
[17] Y. Jung and J. Lee, “Learning Engagement and Persistence in Massive Open Online Courses (MOOCS),” Comput. Educ., 2018.
[18] H. Khalil and M. Ebner, “MOOCs Completion Rates and Possible Methods to Improve Retention - A Literature Review,” EdMedia World Conf. Educ. Media Technol., vol. 2014, no. 1, pp. 1305–1313, 2014.
[19] K. S. Hone and G. R. El Said, “Exploring the factors affecting MOOC retention: A survey study,” Comput. Educ., vol. 98, pp. 157–168, 2016.
[20] H. Waheed, S. U. Hassan, N. R. Aljohani, J. Hardman, S. Alelyani, and R. Nawaz, “Predicting academic performance of students from VLE big data using deep learning models,” Comput. Human Behav., vol. 104, no. November 2018, p. 106189, 2020.
[21] S. S. Jaggars and D. Xu, “How do online course design features influence student performance?,” Comput. Educ., 2016.
[22] C. Tucker, B. K. Pursel, and A. Divinsky, “Mining student-generated textual data in MOOCs and quantifying their effects on student performance and learning outcomes,” Comput. Educ. J., vol. 5, no. 4, pp. 84–95, 2014.
[23] Q. Jin, P. K. Imbrie, J. J. J. Lin, and X. Chen, “A multi-outcome hybrid model for predicting student success in engineering,” in ASEE Annual Conference and Exposition, Conference Proceedings, 2011.
[24] F. Marbouti, H. A. Diefes-Dux, and K. Madhavan, “Models for early prediction of at-risk students in a course using standards-based grading,” Comput. Educ., vol. 103, pp. 1–15, 2016.
[25] S. Huang and N. Fang, “AC 2010-190: Regression Models for Predicting Student Academic Performance in an Engineering Dynamics Course,” Age, 2010.