VIETNAM GENERAL CONFEDERATION OF LABOR
TON DUC THANG UNIVERSITY
FACULTY OF INFORMATION TECHNOLOGY

THE MIDDLE-TERM ESSAY
INTRODUCTION TO MACHINE LEARNING
MACHINE LEARNING’S PROBLEMS

Instructor: MR. LE ANH CUONG
Students: LE QUANG DUY – 520H0529
          TRAN QUOC HUY – 520H0647
Class: 20H50204
Course: 24

HO CHI MINH CITY, 2022
MIDDLE-TERM ESSAY COMPLETED AT TON DUC THANG UNIVERSITY
I hereby declare that this is my own report, carried out under the guidance of Mr. Le Anh Cuong. The research contents and results in this topic are honest and have not been published in any form before. The data in the tables used for analysis, comments, and evaluation were collected by the author himself from different sources, as clearly stated in the reference section.
In addition, the report also uses a number of comments, assessments, and data from other authors, agencies, and organizations, with citations and source annotations.
If any fraud is found, I take full responsibility for the content of my report. Ton Duc Thang University is not responsible for any copyright violations caused by me during the implementation process (if any).
Ho Chi Minh City, 16 October 2022
Author
Le Quang Duy
TEACHER’S CONFIRMATION AND ASSESSMENT SECTION
Confirmation section of the instructors
_
Ho Chi Minh City, day month year (sign and write full name)
Evaluation section of the lecturer who grades the report
_
Ho Chi Minh City, day month year (sign and write full name)
In this report, we will discuss basic methods for machine learning.
In chapter 2, we will practice solving a classification problem with 3 different models (Naive Bayes, k-Nearest Neighbors, and Decision Tree) and compare these models based on the metrics: accuracy, precision, recall, f1-score for each class, and the weighted average of f1-score over all the data.
In chapter 3, we will discuss, implement, and visualize the Feature Selection problem and the way it (“correlation”) works.
In chapter 4, we will show the theory, the code implementation, and the code’s illustration for 2 optimization algorithms (Stochastic Gradient Descent and the Adam Optimization Algorithm).
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF DIAGRAMS, CHARTS, AND TABLES
CHAPTER 1: INTRODUCTION
In this report, we divide the work into 3 problems, presented over 4 chapters:
_In chapter 1, we will introduce the outline of the report.
_In chapter 2, we will present 3 models: Naive Bayes Classification, k-Nearest Neighbors, and Decision Tree. For each model, we will do a common preparation before training and testing. We split the data into 2 parts, training (75%) and testing (25%), and compare the models on the metrics: accuracy, precision, recall, f1-score, and weighted average of f1-score.
_In chapter 3, we will answer 2 questions, what it is and how it works: that is, we will present the theory of “correlation” in feature selection and solve the Boston house-pricing regression problem.
_In chapter 4, we will present the theory of the Adam and Stochastic Gradient Descent algorithms and show our code for each algorithm.
CHAPTER 2: PROBLEM 1
2.1 Common preparation for the 3 models:
_In this chapter, we will solve the problem with 3 models: Naive Bayes, k-Nearest Neighbors, and Decision Tree.
_We used the “iris” data set to evaluate the 3 models.
_First of all, we collect the data by reading the file “iris.data”:
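The reading code appears only as a screenshot in the source; the following is a minimal sketch of an equivalent step, assuming the standard UCI iris.data file (no header row, four measurements plus a class label) and column names of our own choosing:

```python
import pandas as pd

# iris.data has no header row: 4 numeric measurements followed by the class label
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]
iris = pd.read_csv("iris.data", header=None, names=columns)

print(iris.head(10))   # the first 10 rows
print(iris.shape)      # number of rows and columns
```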
_Result after reading the file (first 10 rows):
_Shape of the data:
_Preparing the data before training:
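A minimal sketch of the preparation described below, assuming the DataFrame from the previous step; the variable names x_train, x_test, y_train, y_test follow the report, while random_state is our own addition for reproducibility:

```python
from sklearn.model_selection import train_test_split

# Features: the first 4 columns; target: the class label column
X = iris.iloc[:, :4].values
y = iris.iloc[:, 4].values

# 75% of the rows for training, 25% for testing
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
```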
Description: We take 149 rows and the first 4 columns as features and split them into 75% for training and 25% for testing, stored in the variables x_train, x_test, y_train, and y_test.
2.2 Executing the models:
2.2.1 Naive Bayes model:
_ Training time: takes less than 1 second to train the data.
_ Predicting time: takes less than 1 second to predict the test data.
_ Checking error:
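The training, prediction, and error-checking code is shown only as screenshots in the source; the following sketch shows what these steps typically look like with scikit-learn, where the timing and the mismatch count are our own assumptions about how they were measured:

```python
import time
from sklearn.naive_bayes import GaussianNB

model = GaussianNB()

start = time.time()
model.fit(x_train, y_train)                 # training: well under a second on iris
print("Training time:", time.time() - start, "s")

start = time.time()
y_pred = model.predict(x_test)              # predicting: also well under a second
print("Predicting time:", time.time() - start, "s")

# Error check: count test samples whose prediction differs from the true label
print("Number of errors:", (y_pred != y_test).sum())
```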
_ Result after checking:
Conclusion: We found only 3 errors after running this model.
2.2.2 k-Nearest Neighbors model:
_ Training time: takes less than 1 second to train the data.
_ Predicting time: takes less than 1 second to predict the test data.
_ Checking error:
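The same sketch with the classifier swapped for k-Nearest Neighbors; the number of neighbors is not stated in the report, so scikit-learn's default is assumed:

```python
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)   # k = 5 is scikit-learn's default
knn.fit(x_train, y_train)
y_pred_knn = knn.predict(x_test)
print("Number of errors:", (y_pred_knn != y_test).sum())
```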
_ Result after checking:
Conclusion: We found only 3 errors after running this model.
2.2.3 Decision Tree model:
_ Training time: takes less than 1 second to train the data.
_ Predicting time: takes less than 1 second to predict the test data.
_ Checking error:
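And the corresponding sketch for the Decision Tree; random_state is our own addition so the tree is reproducible:

```python
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(random_state=42)   # random_state fixed for reproducibility
tree.fit(x_train, y_train)
y_pred_tree = tree.predict(x_test)
print("Number of errors:", (y_pred_tree != y_test).sum())
```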
_ Result after checking:
Conclusion: We found only 3 errors after running this model.
2.3 Comparing:
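The reports in the subsections below can be produced with scikit-learn's classification_report; a minimal sketch, assuming the predictions y_pred, y_pred_knn, and y_pred_tree from the sketches in section 2.2:

```python
from sklearn.metrics import accuracy_score, classification_report

for name, preds in [("Naive Bayes", y_pred),
                    ("k-Nearest Neighbors", y_pred_knn),
                    ("Decision Tree", y_pred_tree)]:
    print(name)
    print("Accuracy:", accuracy_score(y_test, preds))
    # Per-class precision, recall, f1-score plus the weighted average of f1-score
    print(classification_report(y_test, preds))
```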
2.3.1 Reporting from Naive Bayes Model:
Conclusion: Weighted f1-score of data: 92%
2.3.2 Reporting from k-Nearest Neighbors Model:
Conclusion: Weighted f1-score of data: 92%
2.3.3 Reporting from Decision Tree Model:
Conclusion: Weighted f1-score of data: 87%
CHAPTER 3: PROBLEM 2
3.2 How does it work to help? [1]
Highly correlated features are more linearly dependent and hence have almost the same effect on the dependent variable. We can therefore exclude one of the two features when there is a substantial correlation between them.
For example, we used the Boston house-pricing data set, which is available in the scikit-learn library, for the analysis. After loading the data, we have:
We divided data into 2 sets: training (70%) and testing (30%):
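A minimal sketch of loading and splitting the data; load_boston shipped with scikit-learn when this report was written but was removed in scikit-learn 1.2, so an older version is assumed, and the variable names are our own:

```python
import pandas as pd
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split

boston = load_boston()                      # removed in scikit-learn >= 1.2
df = pd.DataFrame(boston.data, columns=boston.feature_names)
df["MEDV"] = boston.target                  # median house price, the target column

# 70% of the rows for training, 30% for testing
train_df, test_df = train_test_split(df, test_size=0.30, random_state=42)
```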
We used a “heatmap” to visualize the data:
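A sketch of the heatmap step, assuming seaborn and the training DataFrame from the previous sketch:

```python
import matplotlib.pyplot as plt
import seaborn as sns

corr = train_df.corr()                      # pairwise correlation matrix
plt.figure(figsize=(12, 10))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.show()
```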
As we can see, the number in each square is how strongly the two attributes correlate, so when it is high we can reject one of the two. In this instance, the “tax” column with the “rad” row is up to 0.91, meaning a correlation of 91%, so we can remove one of them from the data set. Commonly used thresholds range from 70% to 90%. In this situation, we used a threshold of 70% to reject unnecessary attributes.
Our correlation function:
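The report's function appears only as a screenshot; the following is a sketch of a function with the described behavior (collect one column from every pair whose correlation exceeds the 0.70 threshold), with names that are our own assumptions:

```python
def correlated_columns(df, threshold=0.70):
    """Return the set of column names to reject: for every pair of columns whose
    absolute correlation exceeds the threshold, one of the two is kept and the
    other is added to the rejected set."""
    rejected = set()
    corr = df.corr().abs()
    for i in range(len(corr.columns)):
        for j in range(i):
            if corr.iloc[i, j] > threshold:
                rejected.add(corr.columns[i])
    return rejected

to_drop = correlated_columns(train_df.drop(columns=["MEDV"]), threshold=0.70)
train_df = train_df.drop(columns=list(to_drop))
test_df = test_df.drop(columns=list(to_drop))
```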
We return the set of column names to reject and prepare our data set:
After rejecting, we dropped 3 attributes, leaving only 10 columns (13 columns before):
3.3 Solving the linear regression problem:
Finally, we solve this problem with linear regression:
Predicting values:
Checking the MAE test score:
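A sketch covering the regression fit, the predictions, and the MAE check on the reduced feature set, reusing the variable names assumed above:

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

x_train = train_df.drop(columns=["MEDV"])
y_train = train_df["MEDV"]
x_test = test_df.drop(columns=["MEDV"])
y_test = test_df["MEDV"]

reg = LinearRegression()
reg.fit(x_train, y_train)

y_pred = reg.predict(x_test)                # predicted house prices
print("MAE on the test set:", mean_absolute_error(y_test, y_pred))
```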
CHAPTER 4: PROBLEM 3
4.1 Stochastic Gradient Descent Algorithm
4.1.1 Theory:
Stochastic Gradient Descent is especially useful when there are redundancies in the data.
As we can see, we have to fit a straight line, and we have a formula to predict a height:
Predicted Height = Intercept + Slope x Weight (1)
In this instance, we can see 3 clusters, and we can randomly choose intercept = 0 and slope = 1.
As we know, we can use a “Loss Function” to determine how well the line fits the data:
Sum of squared residuals = (Observed Height - Predicted Height)²   (2)
Substituting (1) into (2), we have:
Sum of squared residuals = (Observed Height - (Intercept + Slope x Weight))²
We have to calculate the derivative of the sum of squared residuals with respect to the intercept and slope:
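Written out explicitly by applying the chain rule to (2) with (1) substituted in, the two derivatives are:

d(Sum of squared residuals)/d Intercept = -2 x (Observed Height - (Intercept + Slope x Weight))
d(Sum of squared residuals)/d Slope = -2 x Weight x (Observed Height - (Intercept + Slope x Weight))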
We can randomly pick 1 sample to calculate the derivative. With the sample (Weight = 3, Observed Height = 3.3):
d(Sum of squared residuals)/d Intercept = -2 x (3.3 - (0 + 1 x 3)) = -0.6
d(Sum of squared residuals)/d Slope = -2 x 3 x (3.3 - (0 + 1 x 3)) = -1.8
We can easily calculate the step size to improve the line:
Step size (intercept) = d(Sum of squared residuals)/d Intercept x learning rate
Step size (slope) = d(Sum of squared residuals)/d Slope x learning rate
We start with a relatively large learning rate and make it smaller with each step.
In this example, we chose 0.01 for the learning rate:
Step size (intercept) = d(Sum of squared residuals)/d Intercept x learning rate = -0.6 x 0.01 = -0.006
Step size (slope) = d(Sum of squared residuals)/d Slope x learning rate = -1.8 x 0.01 = -0.018
=> New intercept = Old intercept - Step size (intercept) = 0 - (-0.006) = 0.006
=> New slope = Old slope - Step size (slope) = 1 - (-0.018) = 1.018
We had a new line:
We iterate, picking another random sample each time for the calculation; once the loss function is less than 0.001, we can stop:
Having a new line:
Repeating the step above, we have a new line:
And we can stop at intercept = 0.85 and slope = 0.68 in this instance:
When a new sample is added, we use it and repeat the steps above to create a new line which fits the data:
Result after adding new data point:
4.1.2 Show code:
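The report's code appears only as a screenshot; below is a minimal, self-contained sketch of stochastic gradient descent for the straight-line model of section 4.1.1, using made-up (weight, height) data and the update rules derived above:

```python
import random

# Toy (weight, height) samples; purely illustrative values
data = [(0.5, 1.4), (2.3, 1.9), (2.9, 3.2), (3.0, 3.3), (4.1, 4.0), (5.2, 4.5)]

intercept, slope = 0.0, 1.0          # starting line, as in the worked example
learning_rate = 0.01

for step in range(10_000):
    weight, height = random.choice(data)            # pick one random sample
    residual = height - (intercept + slope * weight)
    d_intercept = -2 * residual                     # derivative w.r.t. the intercept
    d_slope = -2 * weight * residual                # derivative w.r.t. the slope
    intercept -= learning_rate * d_intercept        # subtract the step sizes
    slope -= learning_rate * d_slope
    if abs(d_intercept) < 0.001 and abs(d_slope) < 0.001:
        break                                       # updates are tiny: stop

print("intercept =", intercept, "slope =", slope)
```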
4.2 Adam Optimization Algorithm
4.2.1 Theory:
The Adam Optimization Algorithm, also known as Adaptive Moment Estimation, is a method for stochastic optimization. It is a kind of gradient-descent optimization for machine learning (neural networks, etc.); Adam was created to improve the learning rate for machine learning.
Adam is a stochastic objective-function optimization algorithm based on first-order gradients and adaptive estimation of low-order moments. It is a very efficient method when only first-order gradients are required, with low memory requirements. This method is also suitable for problems with unstable variability and fragmented training data.
Pseudo code for Adam Algorithm:
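The pseudocode appears only as an image in the source; it is restated here following the original paper (Kingma & Ba, 2015), where alpha is the step size, beta1 and beta2 are the exponential decay rates for the moment estimates, and f(theta) is the stochastic objective:

```
Require: alpha (step size), beta1, beta2 in [0, 1) (decay rates), epsilon, f(theta), theta_0 (initial parameters)
m_0 <- 0 (first moment), v_0 <- 0 (second moment), t <- 0
while theta_t not converged do
    t <- t + 1
    g_t <- gradient of f_t at theta_{t-1}
    m_t <- beta1 * m_{t-1} + (1 - beta1) * g_t        (update biased first moment estimate)
    v_t <- beta2 * v_{t-1} + (1 - beta2) * g_t^2      (update biased second moment estimate)
    m_hat_t <- m_t / (1 - beta1^t)                    (bias-corrected first moment)
    v_hat_t <- v_t / (1 - beta2^t)                    (bias-corrected second moment)
    theta_t <- theta_{t-1} - alpha * m_hat_t / (sqrt(v_hat_t) + epsilon)
end while
return theta_t
```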
Note that we can improve the above algorithm by changing the order of computation:
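As noted in the original Adam paper, the reordering folds the bias corrections into the step size, so the last three lines of the loop become (with the small constant epsilon rescaled accordingly):

alpha_t = alpha * sqrt(1 - beta2^t) / (1 - beta1^t)
theta_t <- theta_{t-1} - alpha_t * m_t / (sqrt(v_t) + epsilon_hat)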
4.2.2 Show code:
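A minimal sketch of the Adam update in plain Python/NumPy, applied to the same straight-line regression as section 4.1; the toy data and hyperparameter values (the paper's suggested defaults) are our own assumptions:

```python
import numpy as np

# Toy data: height is roughly intercept + slope * weight
weights = np.array([0.5, 2.3, 2.9, 3.0, 4.1, 5.2])
heights = np.array([1.4, 1.9, 3.2, 3.3, 4.0, 4.5])

theta = np.zeros(2)                          # [intercept, slope]
alpha, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8
m = np.zeros(2)                              # first moment estimate
v = np.zeros(2)                              # second moment estimate

for t in range(1, 5001):
    residual = heights - (theta[0] + theta[1] * weights)
    # Gradient of the sum of squared residuals w.r.t. [intercept, slope]
    grad = np.array([-2 * residual.sum(), -2 * (weights * residual).sum()])

    m = beta1 * m + (1 - beta1) * grad       # update biased first moment
    v = beta2 * v + (1 - beta2) * grad ** 2  # update biased second moment
    m_hat = m / (1 - beta1 ** t)             # bias corrections
    v_hat = v / (1 - beta2 ** t)
    theta -= alpha * m_hat / (np.sqrt(v_hat) + eps)

print("intercept =", theta[0], "slope =", theta[1])
```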
REFERENCES
[1] https://www.kaggle.com/code/bbloggsbott/feature-selection-correlation-and-p-value/notebook
[2] https://www.phamduytung.com/blog/2021-01-15-adabelief-optimizer/