HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
COMPUTER SCIENCE AND ENGINEERING DEPARTMENT
—o0o—
STU 1: NGUYEN QUANG MANH - 1652366
HO CHI MINH CITY, 2022
DECLARATION
Our team hereby guarantees that this graduation thesis was carried out by our own team and was not copied. All the materials we have used in our thesis are credited with their name and source and are presented in "REFERENCES".
ACKNOWLEDGEMENTS
First, we would like to express our gratitude to Dr. Vo Thi Ngoc Chau. Thanks to Dr. Chau's knowledge and experience, we were able to overcome the difficulties of this thesis. Dr. Chau guided us through the thesis, pointed out our mistakes during planning and implementation, and suggested solutions to our problems.
Next, we would like to send our gratitude to the teachers, professors and staff of Ho Chi Minh City University of Technology, especially those in the Computer Science and Computer Engineering department, for teaching us new knowledge and supporting us throughout our time at the university. Thanks to the knowledge you have taught us, we were able to finish this thesis.
We would also like to thank our parents and our friends, who supported us not only physically but also mentally throughout this thesis. Thank you for being there when we needed you.
Finally, we wish everyone the best. The time we have had here at Ho Chi Minh City University of Technology will always remain one of the most beautiful memories in our hearts. Once again, we would like to thank everyone for your support.
Contents
1 Introduction
1.1 Project introduction
1.2 Why I choose this topic
1.3 Objectives and content
1.3.1 Objectives
1.3.2 Content
1.4 Boundary of the project
1.5 Structure of the thesis
2 Related Works
2.1 Survey similar systems
2.1.1 Early warning systems to predict students online learning performance
2.1.2 Developing an early-warning system for spotting at-risk students by using eBook interaction logs
2.2 Survey system development
2.2.1 Front-end framework
2.2.2 Back-end framework
2.3 Survey other related technologies
2.3.1 Introduction
3 Project analysis
3.1 Requirements
3.1.1 Educators
3.1.2 Students
3.1.3 Parents
3.2 Use-case Diagram
3.2.1 Educators
3.2.2 Students
3.2.3 Parents
3.3 Activity Diagram
3.3.1 Activity Diagram For User Login
3.3.2 Activity Diagram for Educator to compare between student’s indicators
3.3.3 Activity Diagram for Educator to confirm students who are at risk
3.4 System Architecture
3.5 Database Design
3.5.1 Requirements
3.5.2 Specify Entity Types
3.6 Data Mining Techniques
3.6.1 Logistic Regression
3.6.2 Random Forest
4 System design
4.1 Application architecture
4.2 Database design
4.3 Prepare Data
4.4 Data Mining
4.4.1 Collect data for applying data mining
4.4.2 Define Problem
4.4.3 Evaluate data mining algorithms using scikit-learn library
5 Application implementation (Demo version)
5.1 Website
5.1.1 Back-end
5.1.2 User Interface
5.1.3 Improve Website Performance
6 Conclusion
6.1 Accomplished Result
6.2 Limitation
6.3 Future work
List of Tables
4.1 Data Mining Evaluation
4.2 Confusion Matrix from Random Forest
A.1 Test cases table for the website
List of Figures
3.1 Educator Use-case Diagram
3.2 Student Use-case Diagram
3.3 Parents Use-case Diagram
3.4 Activity Diagram for Login
3.5 Activity Diagram for Educator to compare between student’s indicators
3.6 Activity Diagram for Educator to confirm students who are at risk
3.7 System Architecture
3.8 Entity Relationship Diagram
3.9 Logistic Regression Implementation Using Scikit-learn
3.10 Logistic Regression Result
3.11 Random Forest Implementation Using Scikit-learn
3.12 Random Forest Result
4.1 Website architecture
4.2 Student Information
4.3 Student Register
4.4 Student Interaction
4.5 Student Assessments
4.6 Assessments
4.7 Student Assessments
4.8 Materials
4.9 Course Information
4.10 Warning
4.11 User Account
4.12 Messages
4.13 Educator
4.14 Parents
4.15 Create Data Sample
4.16 Add Visit Number for Interaction of Each Student
4.17 Arrange courses for educators
5.1 Login Page
5.2 Course List of Educator
5.3 Course Details of Educator
5.4 Materials and Assessments in Courses of Educator
5.5 Student Detail of Educator
5.6 Educator Dashboard
5.7 Course List of Student
5.8 Course Details of Student
5.9 Student Assessments
5.10 Student Profile
5.11 Student Warning
5.12 Course List of Child
5.13 Parents Warning
5.14 Message
5.15 Settings
5.16 Before Improvement
5.17 After Improvement
Chapter 1
Introduction
1.1 Project introduction
In the context of Covid-19, many schools and universities have had to change from traditional learning to online learning. However, online learning still raises many doubts and has many inadequacies. Many problems in the online learning process still need to be addressed; for example, it is difficult for teachers to keep track of students’ learning progress, and interaction between teachers and students is poor. Therefore, to improve the quality of education, an early warning system can help to identify at-risk students or predict student learning performance by analyzing learning portfolios recorded in a learning management system. In this project, an early warning system is developed that collects the learning activities of a course on an e-learning system and applies data mining techniques to them.
1.2 Why I choose this topic
Online learning is becoming more and more popular and has many significant benefits compared to traditional learning. However, the education sector has not been able to take full advantage of those benefits. As a student who has experienced online learning and understands its advantages and disadvantages, I want to create a product that contributes to improving the quality of online learning and helps teachers, learners and other stakeholders get more of what online learning offers.
1.3 Objectives and content
1.3.1 Objectives
In this project, a website is built to help educators manage all their courses and monitor the learning progress of their students. Furthermore, educators can organize interventions to improve the performance of students whom a data mining model predicts to perform poorly. Students and parents are also users of the system. Therefore, the website shows the performance of each student and helps students and parents feel comfortable interacting with the teachers.
1.3.2 Content
In order to achieve the objectives of this project, here are the tasks we need to accomplish:
• Research current educational warning systems
• Design the system architecture
• Research and learn a front-end framework to build the front-end of the website
• Research and learn a back-end framework and database server to build the back-end of the website
• Research and learn data mining algorithms that are related to an early warning system and integrate them with the system
• Research technologies related to our project (extended libraries or frameworks that support the system)
1.4 Boundary of the project
In this project, the data mining model focuses on predicting student performance at the course level.
1.5 Structure of the thesis
• Chapter 1: Introduction
In this chapter, I give an overview of the project.
• Chapter 2: Related work
This chapter shows what I had to study before building the website for my project.
• Chapter 3: Project analysis
I analyse the requirements for this project.
• Chapter 4: System design
I design the architecture of the application.
• Chapter 5: Implementation
This chapter shows how I implement the project as a website based on the designed system.
• Chapter 6: Evaluation and Conclusion
This chapter shows what I have completed in the project, its results and the future development plan.
Chapter 2
Related Works
2.1 Survey similar systems
2.1.1 Early warning systems to predict students online learning performance
2.1.2 Developing an early-warning system for spotting at-risk students by using eBook interaction logs
The article [2] found that the models with transformed data produced the lowest performance in all datasets. On the other hand, models with categorical data showed better performance than models that used transformed (continuous) data only.
2.2 Survey system development
2.2.1 Front-end framework
For front-end development, I researched three popular frameworks and how well they perform against each other, in order to have a structure for judging front-end JavaScript frameworks in general and for choosing the best fit for my project. The three frameworks are React [3], Vue [4] and Angular [5]. They are all highly popular JavaScript libraries and frameworks that help developers build complex, reactive and modern user interfaces for the web.
Because page rendering will happen frequently in my project, the performance and the size of the framework are the main criteria for choosing the front-end framework.
Angular uses a real Document Object Model (DOM), so it is best suited for single-page applications where content is updated only from time to time. The real DOM makes updates much slower, and when the data flow is lost it takes a lot of time to find the issue. Thanks to two-way data binding, all changes made in the model are replicated into the views in a secure and efficient way. Due to the wide range of built-in features, an Angular application is much heavier (approximately 500KB) than one built with Vue or React, which slows performance down a little.
In contrast to Angular, React uses a virtual DOM, which enhances the performance of applications of all sizes that need regular content updates. Single-direction data flow allows better control over the project. A disadvantage is that developers need to constantly upgrade their skills because React evolves continuously. As React doesn’t provide a wide range of libraries, its size is much smaller than Angular's (approximately 100KB).
Vue also uses a virtual DOM, so changes within a project are made without affecting the real DOM directly. Vue has the smallest size of the three (approximately 80KB), which significantly speeds up its performance.
The next things to consider are the popularity of the three frameworks and how complex they are to learn. According to the Stack Overflow Developer Survey Results 2019, React is the most loved by developers (74.5%), followed by Vue.js (73.6%) and then Angular.js (57.6%) [6]. Because of React’s popularity, finding input components and ready-to-use elements is extremely easy; they are all just a Google or GitHub search away.
After the research, we feel that React is the best fit for our project because:
• It uses reusable, composable, and stateful components.
• In a browser, the HTML views in the DOM have to be regenerated when data changes. With React, we do not need to worry about how to reflect these changes, or even manage when to apply them to the browser; React simply reacts to state changes and automatically updates the DOM when needed.
• ReactJS is SEO friendly.
• ReactJS comes with a helpful developer toolset.
2.2.2 Back-end framework
For back-end development, I researched popular Python frameworks, as they provide more management features. I found that several Python frameworks can help develop the back-end. Based on my knowledge, testing and reviews, I narrowed the options down to two frameworks: Django and Flask.
1. Django: Django is a high-level Python web framework that encourages rapid development and clean, pragmatic design. Built by experienced developers, it takes care of much of the hassle of web development, so you can focus on writing your app without needing to reinvent the wheel. It’s free and open source.
Advantages:
• Ridiculously fast: Django was designed to help developers take applications from concept to completion as quickly as possible.
• Reassuringly secure: Django takes security seriously and helps developers avoid many common security mistakes.
• Exceedingly scalable: Some of the busiest sites on the web leverage Django’s ability to quickly and flexibly scale.
2. Flask: Flask [8] is a micro web framework written in Python.
Flask is a small and lightweight Python web framework that provides useful tools and features that make creating web applications in Python easier. It gives developers flexibility and is a more accessible framework for new developers, since you can build a web application quickly using only a single Python file.
Advantages:
• Lots of resources available: Flask is arguably one of Python’s most popular web frameworks, with plenty of tutorials and libraries available to add to your apps.
• Simplicity: As a minimalistic framework, Flask provides the necessary tools to easily and quickly build a web app prototype after installation.
• Easy database integration: Integration with database toolkits like SQLAlchemy, SQL databases like SQLite and MySQL, and NoSQL databases like DynamoDB and MongoDB is relatively easy.
• Flexibility: Flask provides developers generous flexibility for developing their web applications.
In the end, we see that Flask is the most suitable framework for building the back-end because:
• The official documentation is very thorough, providing lots of details with well-written examples and clear tutorials.
• Flask is a lightweight framework with few dependencies. It takes just a few lines of Python to load Flask, and because it is modular, you can restrict the dependencies to suit your needs.
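To illustrate the "few lines of Python" point, here is a minimal sketch of a Flask application. The route `/api/health` and its JSON payload are hypothetical names chosen for this example; they are not taken from the project's actual back-end.

```python
from flask import Flask, jsonify

# Hypothetical minimal back-end; the route and payload are illustrative only.
app = Flask(__name__)


@app.route("/api/health")
def health():
    """Return a small JSON document so a front-end can check the server."""
    return jsonify(status="ok")
```

Assuming the file is saved as `app.py`, the development server can be started with `flask --app app run`. Further dependencies (for example a database toolkit) would be added only when they are needed.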
2.3 Survey other related technologies
2.3.1 Introduction
1. Scikit-learn:
Scikit-learn [9] is an open source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection, model evaluation, and many other utilities.
2. MySQL:
MySQL [10] is a relational database management system. A relational database stores data in separate tables rather than putting all the data in one big storeroom. The logical model, with objects such as databases, tables, views, rows, and columns, offers a flexible programming environment. You set up rules governing the relationships between different data fields, such as one-to-one, one-to-many, unique, required or optional, and “pointers” between different tables. MySQL Server has built-in support for SQL statements to check, optimize, and repair tables.
[8] https://flask.palletsprojects.com/en/2.0.x/
[9] https://scikit-learn.org/
[10] http://www.mysql.com/
Chapter 3
Project analysis
3.1 Requirements
3.1.1 Educators
• Educator will log in with his or her unique email in the teacher role
• Educator shall view overall statistics of all courses that he or she is responsible for
• Educator can view information about a course which he or she is teaching by choosing the desired course
• Educator shall see overall statistics of all students in each course
• Educator will view information about each student in the course
• Educator can discuss with students and parents
• Educator will see the list of at-risk students predicted by the system
• Educator will request feedback from a student who is predicted to be at risk, for confirmation
• Educator will receive feedback after at-risk students reply
• Educator will confirm whether a student is at risk or not. The system will then inform the student
• Educator will view all indicators of a student in a course by choosing the student whom the educator needs to follow
• Educator will view an indicator of the chosen student in more detail
• Educator can compare an indicator of a student with those of other students
• Educator shall intervene for students who are at risk by providing them with support for improvement
• Educator will specify the actions for each student who needs support
• Educator can warn a student’s parents when necessary
• Educator can log out of his/her account
3.1.2 Students
• Student will log in with his or her unique email in the student role
• Student shall view overall statistics of all courses that he or she has registered for
• Student shall view information about a course in which he or she is enrolled by choosing the desired course
• Student can discuss with the educator for more help
• Student will view all of his/her indicators in a course
• Student can view an indicator in detail
• Student can compare an indicator with the best one, the worst one or the average one of other students
• Student will receive a notification in case the student is predicted to be at risk
• At-risk student will give feedback to the educator
• Student will receive support from the educator in case the student is confirmed to be at risk
• Student will log out of his/her account
3.1.3 Parents
• Parents will log in with their unique email in the parent role
• Parents shall view overall statistics of all courses that their child has registered for
• Parents shall view information about a course in which their child is enrolled by choosing the desired course
• Parents will receive a warning in case their child is confirmed to be at risk
• Parents can discuss with the educator of each course for more information
• Parents will view all indicators of their child in each course
• Parents can view an indicator of their child in more detail
• Parents can compare an indicator with the best one, the worst one or the average one of other students
• Parents will log out of their account
3.2 Use-case Diagram
From the requirements collected, the use-case diagram for each type of user in the system is designed as follows:
3.2.1 Educators
Figure 3.1: Educator Use-case Diagram
3.2.2 Students
Figure 3.2: Student Use-case Diagram
3.2.3 Parents
Figure 3.3: Parents Use-case Diagram
3.3 Activity Diagram
3.3.1 Activity Diagram For User Login
On the landing page, after choosing the role for the registered account, the user can log in by typing their email and the correct password. The system then authenticates the account. If the account is invalid, the system shows an error message (for example: incorrect password), and the user should try to log in again. If the account is valid, the system redirects to the page that displays all courses of the user.
Figure 3.4: Activity Diagram for Login
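The login flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Account` class, the `login` function, the error messages and the `/<role>/courses` redirect path are hypothetical names introduced here, and the accounts live in an in-memory dictionary rather than in the project's database.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class Account:
    email: str
    role: str            # "educator", "student" or "parent"
    salt: bytes
    password_hash: bytes


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a hash with PBKDF2 so plain passwords are never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def login(accounts, role, email, password):
    """Return (redirect_path, error); exactly one of the two is None."""
    account = accounts.get((role, email))
    if account is None:
        return None, "Account not found for this role"
    candidate = hash_password(password, account.salt)
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(candidate, account.password_hash):
        return None, "Incorrect password"
    return f"/{role}/courses", None


salt = b"demo-salt"  # fixed only for the example; use os.urandom(16) in practice
accounts = {
    ("student", "an@example.com"): Account(
        "an@example.com", "student", salt, hash_password("s3cret", salt)
    )
}
print(login(accounts, "student", "an@example.com", "s3cret"))
print(login(accounts, "student", "an@example.com", "wrong"))
```

Keying the lookup on the pair (role, email) mirrors the diagram, where the role is chosen before the credentials are typed; a failed attempt returns an error message and leaves the user free to try again.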
3.3.2 Activity Diagram for Educator to compare between student’s indicators
This activity applies to each course in which the educator wants to view the learning progress of a student. Starting from the page that displays the student list of the course, the educator chooses the student to observe. The system then displays all indicators of the student. After that, the educator chooses an indicator and then an object to compare with (the average, the best or the worst in the course for that indicator), and the system shows the comparison.
Figure 3.5: Activity Diagram for Educator to compare between student’s indicators
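The comparison step can be sketched as follows. The function name `compare_indicator` and the sample quiz scores are hypothetical. The sketch also assumes that a higher value is better and that "other students" excludes the chosen student; the thesis does not state either point explicitly.

```python
from statistics import mean


def compare_indicator(scores, student_id, target):
    """Compare one student's indicator with the best, worst or average value.

    scores maps student id -> indicator value for a single course.
    Returns (student_value, reference_value, difference).
    """
    others = [v for sid, v in scores.items() if sid != student_id]
    if not others:
        raise ValueError("no other students to compare with")
    # Assumption: a higher indicator value is better, so "best" is the maximum.
    references = {"best": max, "worst": min, "average": mean}
    if target not in references:
        raise ValueError(f"unknown comparison target: {target}")
    reference = references[target](others)
    own = scores[student_id]
    return own, reference, own - reference


quiz_scores = {"s1": 6.0, "s2": 9.0, "s3": 4.0, "s4": 8.0}
for target in ("best", "worst", "average"):
    print(target, compare_indicator(quiz_scores, "s1", target))
```

A negative difference means the chosen student is below the reference value for that indicator.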
3.3.3 Activity Diagram for Educator to confirm students who are at risk
At the starting point, the system displays the list of at-risk students. The educator chooses a student to view. The system displays a note list showing the messages between the educator and the student; the educator can view feedback from the student here. The educator can add messages asking the student to confirm the performance before deciding whether the student is at risk or not.
Figure 3.6: Activity Diagram for Educator to confirm students who are at risk
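One way to read this activity is as a small state machine over a warning. The sketch below is an assumption-laden illustration: the class `RiskWarning`, the `WarningState` values and the method names are hypothetical, and it simplifies the flow to a single request/feedback exchange before the educator concludes.

```python
from dataclasses import dataclass, field
from enum import Enum


class WarningState(Enum):
    PREDICTED = "predicted"            # flagged by the model, not yet reviewed
    FEEDBACK_REQUESTED = "requested"   # educator asked the student to respond
    FEEDBACK_RECEIVED = "received"     # student replied
    CONFIRMED = "confirmed"            # educator concluded: at risk
    DISMISSED = "dismissed"            # educator concluded: not at risk


@dataclass
class RiskWarning:
    student_id: str
    course: str
    state: WarningState = WarningState.PREDICTED
    notes: list = field(default_factory=list)  # (author, message) pairs

    def request_feedback(self, message):
        self._require(WarningState.PREDICTED)
        self.notes.append(("educator", message))
        self.state = WarningState.FEEDBACK_REQUESTED

    def give_feedback(self, message):
        self._require(WarningState.FEEDBACK_REQUESTED)
        self.notes.append(("student", message))
        self.state = WarningState.FEEDBACK_RECEIVED

    def conclude(self, at_risk):
        # The educator may conclude at any point before a final decision.
        if self.state in (WarningState.CONFIRMED, WarningState.DISMISSED):
            raise ValueError("warning already concluded")
        self.state = WarningState.CONFIRMED if at_risk else WarningState.DISMISSED

    def _require(self, expected):
        if self.state is not expected:
            raise ValueError(f"expected {expected.name}, got {self.state.name}")


w = RiskWarning("s1", "Databases")
w.request_feedback("Your quiz results dropped; is everything all right?")
w.give_feedback("I missed two weeks because of illness.")
w.conclude(at_risk=True)
print(w.state.name, len(w.notes))
```

Keeping the notes as an ordered list of (author, message) pairs corresponds to the note list that the system displays to the educator.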
3.4 System Architecture
After researching relevant articles and technologies, I decided to design the architecture of the system as follows:
Figure 3.7: System Architecture
3.5 Database Design
3.5.1 Requirements
• A class/course has a unique name and credits
• A student has a student id, name, date of birth, sex, place of birth, address, email and phone number
• An educator has a name, date of birth, sex, place of birth, address, email and phone number
• Parents who are responsible for each student have a phone number, name and job
• It is required to keep track of the performance (indicators) of each student in each course. It includes the time each student spends accessing the e-learning system, quiz grades, assignment grades, class attendance, and discussion in the forum of the e-learning website
• Warnings will be sent to students who are predicted to be at risk, for confirmation. Each warning has a state (confirming whether the student is really at risk or not), notes from the educator and feedback from the student.
3.5.2 Specify Entity Types
• An entity type COURSE with attributes: name, manager. Name is the key attribute.
• An entity type STUDENT with attributes: studentID, name, date of birth, sex, place of birth, address, email, phone number. studentID, email and phone number are (separate) key attributes because each was specified to be unique.
• An entity type EDUCATOR with attributes: name, date of birth, sex, place of birth, address, email, phone number. Email and phone number are key attributes.
• An entity type PARENTS with attributes: name, phone number and job. Phone number is the key attribute because it is unique.
• An entity type WARNING with attributes: state, notes from educators, feedback.
• An entity type INDICATOR with attributes: access time, quiz, class attendance, discussion, assignment.
3.5.3 Specify Relationship Types
• Manage: a one-to-one relationship type between Educator and Course. Educator participation is partial. A course must have a manager at all times, so Course participation is total.
• Teach: a one-to-many relationship type between Educator and Course. Both participations are total.
• Dependent of: a one-to-one relationship type between Student and Parents. Both participations are total.
• Warn: a one-to-many relationship type between Course and Warning. A course may have many warnings, depending on the number of at-risk students.
• Feedback: a one-to-one relationship type between Student and Warning. Only at-risk students receive a warning, so Student participation is partial. A warning (if any) must be replied to by the student who received it.
• Perform: a one-to-one relationship type between Student and Indicators. Each student has a performance (indicators) in each course. Both participations are total.
• Show: a one-to-many relationship type between Course and Indicators. A course will show all indicators of all students. Both participations are total.
3.5.4 Entity Relationship Diagram
From the above analysis, I draw the ERD for the database design as follows:
Figure 3.8: Entity Relationship Diagram
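As a rough illustration of how the entity and relationship types above could map to tables, the following sketch creates a reduced schema with Python's built-in `sqlite3` module. SQLite is used here only so that the example is self-contained; the project itself targets MySQL. Table names, column names and the choice of keys (for example, one warning per student per course) are assumptions made for this example, and several attributes are omitted for brevity.

```python
import sqlite3

# Reduced, hypothetical schema derived from the entity/relationship lists.
SCHEMA = """
CREATE TABLE educator (
    email TEXT PRIMARY KEY,
    phone TEXT UNIQUE NOT NULL,
    name  TEXT NOT NULL
);
CREATE TABLE course (
    name    TEXT PRIMARY KEY,
    credits INTEGER,
    manager TEXT NOT NULL UNIQUE REFERENCES educator(email),  -- Manage (1:1)
    teacher TEXT NOT NULL REFERENCES educator(email)          -- Teach (1:N)
);
CREATE TABLE parents (
    phone TEXT PRIMARY KEY,
    name  TEXT NOT NULL,
    job   TEXT
);
CREATE TABLE student (
    student_id   TEXT PRIMARY KEY,
    email        TEXT UNIQUE NOT NULL,
    name         TEXT NOT NULL,
    parent_phone TEXT NOT NULL UNIQUE REFERENCES parents(phone)  -- Dependent of
);
CREATE TABLE indicator (
    student_id  TEXT NOT NULL REFERENCES student(student_id),  -- Perform
    course_name TEXT NOT NULL REFERENCES course(name),         -- Show
    access_time REAL, quiz REAL, assignment REAL,
    class_attendance REAL, discussion REAL,
    PRIMARY KEY (student_id, course_name)
);
CREATE TABLE warning (
    student_id  TEXT NOT NULL REFERENCES student(student_id),  -- Feedback
    course_name TEXT NOT NULL REFERENCES course(name),         -- Warn
    state TEXT NOT NULL DEFAULT 'predicted',
    notes TEXT, feedback TEXT,
    PRIMARY KEY (student_id, course_name)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(SCHEMA)
```

Total participation (for example, every educator teaching at least one course) cannot be expressed with these column constraints alone and would have to be checked by the application.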
3.6 Data Mining Techniques
This project applies data mining techniques to predict student performance. It also uses Support Vector Machine with different parameters to improve the model. The evaluation and comparison of these algorithms are presented in later chapters.
First, however, I test logistic regression and random forest on a small educational data set using the Scikit-learn library.
3.6.1 Logistic Regression
The code is commented to explain each step.
Figure 3.9: Logistic Regression Implementation Using Scikit-learn
The result is shown below:
Figure 3.10: Logistic Regression Result
3.6.2 Random Forest
Figure 3.11: Random Forest Implementation Using Scikit-learn
The result is shown below:
Figure 3.12: Random Forest Result
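Because the implementations in Figures 3.9 and 3.11 are images, the sketch below shows what such an experiment might look like in Scikit-learn. It is not the code from the figures: the data is synthetic (generated with `make_classification` as a stand-in for the educational data set), and the split ratio, the number of trees and the random seeds are assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the educational data set: five numeric indicators
# per student and a binary label (1 = at risk). This is not the thesis data.
X, y = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, random_state=42
)

# Hold out 20% of the students for testing, keeping the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)            # train on the training split
    predictions = model.predict(X_test)    # predict labels for unseen students
    results[name] = {
        "accuracy": accuracy_score(y_test, predictions),
        "confusion_matrix": confusion_matrix(y_test, predictions),
    }
    print(name, round(results[name]["accuracy"], 3))
    print(results[name]["confusion_matrix"])
```

Training both models on the same split makes their accuracy and confusion matrices directly comparable, which is the kind of comparison the later evaluation chapter relies on.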
Chapter 4
System design
4.1 Application architecture
Our website will contain 2 main parts:
• Front-end application for user
To build the front-end of the website, I use the following dependencies: