Project report subject: Probability and statistics


VIET NAM NATIONAL UNIVERSITY

Class: CC06 Group: 3 Subject code: MT2013


A Problem

This dataset comes from research by the TR/Selcuk University Mechanical Engineering department. The aim of the study is to determine how much the adjustment parameters of 3D printers affect print quality, accuracy and strength. There are nine setting parameters and three measured output parameters.

b Descriptive statistics for each of the variables

c Graph: hist, boxplot, pairs

3 Fitting linear regression models: we want to explore what factors may affect the final grade

4 Predictions

B Proposed method

I Import data

Load packages

library(tibble) #Data frame

library(MASS) #Modern Applied Statistics with S

library(ggplot2) #Data Visualization

library(lattice) #Data Visualization

library(caret) #Cross Validation

library(e1071) #SVM

Read data

#Read Data


o Explain: We used the function read.csv() to import the data

o Result: we have a data frame that includes 50 rows and 13 columns
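As a minimal sketch of this import step, the snippet below writes a small illustrative CSV to a temporary file and reads it back with read.csv(). The file path and the two data rows are made up for illustration (the report does not show the real file name); only read.csv() and the data frame name home come from the report.

```r
# Hypothetical stand-in for the real data file (its name is not given in the report)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "wall_thickness,infill_density,roughness",
  "8,90,25",
  "7,90,32"
), csv_path)

# Read the CSV into a data frame, as described above
home <- read.csv(csv_path)

# Inspect the size and structure of the result
dim(home)
str(home)
```

With the real file, dim(home) is where the row and column counts reported above would be read off.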

wall_thickness infill_density nozzle_temperature bed_temperature print_speed fan_speed roughness tension_strenght

> str(home, list.len = 9)

'data.frame': 50 obs. of 14 variables:

$ tension_strenght: int 18 16 8 10 5 24 12 14 27 25 ...

(list output truncated)

> summary(home[, 1:9])

                     Min.     Median   Mean     3rd Qu.  Max.
[illegible]          0.020    0.100    0.106    0.150    0.200
wall_thickness       1.00     5.00     5.22     7.00     10.00
infill_density       10.0     [remaining values illegible]
nozzle_temperature   200.0    220.0    221.5    230.0    250.0
bed_temperature      60       70       70       75       80
print_speed          40       60       64       60       120
fan_speed            0        50       50       75       100
roughness            21.0     165.5    170.6    239.2    368.0

(The 1st Qu. values are not legible in the source.)


II Data Visualization

1 Transformation:

- Input:

erature,print_speed,fan_speed,roughness,tension_strenght))

Explain: We used the function subset() to keep only the data for the variables mentioned in part A
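The filtering step can be sketched as follows. The toy data frame, its values and the name new_home are assumptions made for illustration; the select argument of subset() takes the column names unquoted.

```r
# Toy data frame with illustrative values (not the report's data)
home <- data.frame(
  id = 1:3,
  wall_thickness = c(8, 7, 1),
  roughness = c(25, 32, 40),
  tension_strenght = c(18, 16, 8)
)

# Keep only the listed columns; 'new_home' is a hypothetical name
new_home <- subset(home, select = c(wall_thickness, roughness, tension_strenght))
names(new_home)
```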

wall_thickness infill_density nozzle_temperature bed_temperature print_speed fan_speed roughness tension_strenght


bed_temperature <- home$bed_temperature

deviation of the column


> bed_temperature <- home$bed_temperature

> summary(bed_temperature)
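A short sketch of descriptive statistics for one column is given below. The five temperature values are illustrative, not the report's data, and the use of sd() for the standard deviation is an assumption about what the report computed.

```r
# Illustrative values only
bed_temperature <- c(60, 65, 70, 75, 80)

summary(bed_temperature)            # min, quartiles, median, mean, max
col_mean <- mean(bed_temperature)   # arithmetic mean
col_sd <- sd(bed_temperature)       # sample standard deviation (n - 1 denominator)
col_mean
col_sd
```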

# Give the chart file a name

png(file = "bar wall_thickness.png")

The table() and length() functions are used to calculate the percentage of each type
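The percentage calculation with table() and length() can be sketched as below; the wall_thickness values are made up for illustration.

```r
# Illustrative values only
wall_thickness <- c(1, 1, 2, 5, 5, 5, 7, 10)

# Count how often each distinct value occurs
counts <- table(wall_thickness)

# Convert counts to percentages of the total number of observations
percent <- counts / length(wall_thickness) * 100
round(percent, 1)
```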

3 Graph:

For each numeric variable (one column each), we draw four types of graph: bar plot, box plot, histogram and pairs plot. We also fit a linear regression model between the variable and price

a Bar plot

and create barplots with the barplot() function

# Bar Plot

counts <- table(wall_thickness)

png(file = "bar wall_thickness.png")
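A self-contained sketch of the bar plot step follows. The data values, the output location (a temporary directory) and the axis labels are assumptions; table(), png() and barplot() are the functions named in the report, and dev.off() closes the graphics device so the file is written.

```r
# Illustrative values only
wall_thickness <- c(1, 1, 2, 5, 5, 5, 7, 10)
counts <- table(wall_thickness)

# Write to a temporary directory so the sketch does not clutter the working directory
out_file <- file.path(tempdir(), "bar_wall_thickness.png")

png(file = out_file)
barplot(counts, main = "wall_thickness", xlab = "wall_thickness", ylab = "Frequency")
invisible(dev.off())  # close the device so the PNG is saved
```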

png(file = "boxplot wall_thickness.png")

Result:

[Figure: plot of wall_thickness]

png(file = "hist wall_thickness.png")
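The remaining graph types (box plot, histogram and pairs plot) can be sketched in the same way. The toy data frame and the file names are assumptions chosen for illustration.

```r
# Toy data frame with illustrative values (not the report's data)
home <- data.frame(
  wall_thickness = c(1, 2, 5, 5, 7, 10),
  roughness = c(25, 60, 120, 150, 200, 300),
  tension_strenght = c(8, 10, 16, 18, 24, 27)
)
out_dir <- tempdir()
files <- file.path(out_dir, c("boxplot_wall_thickness.png",
                              "hist_wall_thickness.png",
                              "pairs.png"))

# Box plot of a single variable
png(file = files[1])
boxplot(home$wall_thickness, main = "wall_thickness")
invisible(dev.off())

# Histogram of the same variable
png(file = files[2])
hist(home$wall_thickness, main = "wall_thickness", xlab = "wall_thickness")
invisible(dev.off())

# Scatter-plot matrix of all variables
png(file = files[3])
pairs(home)
invisible(dev.off())
```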

[Figure: plot including a tension_strenght panel]

- Input:

#Find the correlation between the different variables

round(cor(home), 2)

#Perform the linear regression analysis

#Create linear regression model

variables respect to X_Perimeter and Y_Perimeter

o We then use the function summary() to show all the parameters of this model

                      X_Maximum Y_Maximum Steel_Plate_Thickness X_Perimeter Y_Perimeter Length_of_Conveyer
Steel_Plate_Thickness      0.11     -0.21                  1.00       -0.15       -0.06               0.21
Y_Perimeter               -0.09      0.02                 -0.06        0.91        1.00              -0.06
Length_of_Conveyer         0.30     -0.05                  0.21       -0.13       -0.06               1.00

We see that Y_Perimeter is the variable most strongly associated with X_Perimeter, because their correlation coefficient is 0.91 (near 1), while the other 4 variables only reach -0.19, 0.02, -0.15 and -0.13 respectively (near 0), so their association with X_Perimeter is weaker.

Likewise, X_Perimeter is the variable most strongly associated with Y_Perimeter, because their correlation coefficient is 0.91 (near 1), while the other 4 variables (X_Maximum, Y_Maximum, Steel_Plate_Thickness and Length_of_Conveyer) only reach -0.09, 0.02, -0.06 and -0.06 respectively (near 0), so their association with Y_Perimeter is weaker.
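The correlation step can be sketched on synthetic data as follows. The data frame toy and all its values are simulated for illustration: Y_Perimeter is deliberately built to track X_Perimeter, so it should come out as the most strongly correlated variable.

```r
set.seed(1)
n <- 100
x_perimeter <- rnorm(n, mean = 100, sd = 20)

# Simulated data, not the report's dataset
toy <- data.frame(
  X_Perimeter = x_perimeter,
  Y_Perimeter = x_perimeter + rnorm(n, sd = 5),          # built to follow X_Perimeter
  Steel_Plate_Thickness = rnorm(n, mean = 80, sd = 10)   # independent noise
)

# Pearson correlation matrix, rounded to 2 decimals as in the report
cor_mat <- round(cor(toy), 2)
cor_mat

# Which other variable has the largest absolute correlation with X_Perimeter?
r_x <- cor_mat["X_Perimeter", colnames(cor_mat) != "X_Perimeter"]
strongest <- names(which.max(abs(r_x)))
strongest
```

Note that these values are correlation coefficients r; the coefficient of determination reported by summary() for a fitted model is a different quantity.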


call language lm(formula = X_Perimeter ~ X_Maximum + Y_Maximum + Steel_Plate_Thickness +

                        Estimate Std. Error t value Pr(>|t|)
(Intercept)            4.085e+02  6.946e+01   5.881 4.78e-09 ***
X_Maximum             -9.413e-02  1.408e-02  -6.686 2.99e-11 ***
Y_Maximum              8.182e-07  3.847e-06   0.213  0.83159
Steel_Plate_Thickness -6.387e-01  1.265e-01  -5.047 4.90e-07 ***
Length_of_conveyer    -1.299e-01  4.919e-02  -2.641  0.00833 **

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 293.2 on 1936 degrees of freedom

Multiple R-squared: 0.05472, Adjusted R-squared: 0.05276

F-statistic: 28.02 on 4 and 1936 DF, p-value: < 2.2e-16

xlevels list [0] List of length 0

call language lm(formula = Y_Perimeter ~ X_Maximum + Y_Maximum + Steel_Plate_Thickness +

Call:

lm(formula = Y_Perimeter ~ X_Maximum + Y_Maximum + Steel_Plate_Thickness +

Residuals:

-159.8 -74.1 -34.5 10.2 18004.4

Coefficients:

              Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.749e+02  1.006e+02   2.734  0.00632 **
X_Maximum   -6.646e-02  2.038e-02  -3.261  0.00113 **
Y_Maximum    4.461e-06  5.569e-06   0.801  0.42325

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 424.4 on 1936 degrees of freedom

Multiple R-squared: 0.01179, Adjusted R-squared: 0.009747

F-statistic: 5.774 on 4 and 1936 DF, p-value: 0.0001284
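As a hedged sketch of how such a model is fitted and read, the snippet below builds a synthetic data set from a known linear rule and fits it with lm(). The data frame toy, the coefficients used to generate it and the noise level are all assumptions for illustration; they are not the report's data or results.

```r
set.seed(42)
n <- 200

# Simulated predictors (illustrative ranges)
toy <- data.frame(
  X_Maximum = runif(n, 0, 1700),
  Steel_Plate_Thickness = runif(n, 40, 300)
)

# Response generated from a known linear rule plus random noise
toy$X_Perimeter <- 400 - 0.1 * toy$X_Maximum -
  0.6 * toy$Steel_Plate_Thickness + rnorm(n, sd = 30)

# Fit the linear model and summarise it
model <- lm(X_Perimeter ~ X_Maximum + Steel_Plate_Thickness, data = toy)
fit <- summary(model)

fit$coefficients    # columns: Estimate, Std. Error, t value, Pr(>|t|)
fit$r.squared       # share of variance explained
fit$adj.r.squared   # same, adjusted for the number of predictors
```

Read this way, the Multiple R-squared of 0.05472 reported for the X_Perimeter model means that it explains about 5.5% of the variance in X_Perimeter, and the value 0.01179 for the Y_Perimeter model corresponds to about 1.2%.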

IV Prediction

- Input:

#Prediction

#Predict the values for our test set

prediction1 <- predict(model1, newdata = home)

#Example

predict(model1, data.frame(X_Maximum=100, Y_Maximum=100000, Steel_Plate_Thickness=200, Length_of_conveyer=1687))

o Case 1: X_Maximum=100, Y_Maximum=100000,

o Case 2: X_Maximum=100, Y_Maximum=100000,

#Prediction

#Predict the values for our test set

prediction1 <- predict(model1, newdata = home)

This means that, with the given values of the variables in model 1, we predict that X_Perimeter will be 52.28809
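The model 1 prediction can be checked by hand: a linear model's prediction is the intercept plus each coefficient multiplied by its input value. The sketch below uses the rounded coefficients printed in the model 1 summary and the inputs from the example call, so it only agrees with predict() up to rounding.

```r
# Coefficients as printed (rounded) in the model 1 summary
coefs <- c(
  "(Intercept)" = 4.085e+02,
  X_Maximum = -9.413e-02,
  Y_Maximum = 8.182e-07,
  Steel_Plate_Thickness = -6.387e-01,
  Length_of_conveyer = -1.299e-01
)

# Input values from the example call; the leading 1 multiplies the intercept
x <- c(1, 100, 100000, 200, 1687)

# Prediction = sum of coefficient * value
manual_prediction <- sum(coefs * x)
manual_prediction
```

The result is about 52.29, which is in line with the 52.28809 returned by predict().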


In model2: the predicted Y_Perimeter is 51.90771

This means that, with the given values of the variables in model 2, we predict that Y_Perimeter will be 51.90771


Through a specific case, one can learn the complete flow of data analysis, which includes data reading, binary variable transformation, model construction, parameter selection, and cross-validation, all of which are widely used in data analysis. Carrying out data analysis in practice is a good way for readers to accumulate experience, so that they can work smoothly with more types of data.

Title	Project report subject: Probability and statistics
Authors	Lư Thiờn Phỳ, Nguyờn Trõn Minh Quận, Phan Vũ Nhật Quang, Phan Tuan Ta, Bui Minh Nhat
Supervisor	Ph.D. Phan Thi Huong
University	Viet Nam National University Ho Chi Minh University of Technology
Major	Probability and Statistics
Type	Project report
City	Ho Chi Minh
