VIET NAM NATIONAL UNIVERSITY
Class: CC06 Group: 3 Subject code: MT2013
A Problem
This dataset comes from research by the TR/Selcuk University Mechanical Engineering department. The aim of the study is to determine how much the adjustment parameters of 3D printers affect print quality, accuracy and strength. There are nine setting parameters and three measured output parameters.
b Descriptive statistics for each of the variables
c Graph: hist, boxplot, pairs
3 Fitting linear regression models: we want to explore what factors may affect the final grade
4 Predictions
B Proposed method
I Import data
Load packages
library(tibble) #Data frame
library(MASS) #Modern Applied Statistics with S
library(ggplot2) #Data Visualization
library(lattice) #Data Visualization
library(caret) #Cross Validation
library(e1071) #SVM
o Explain: We will use these packages for the analysis
Read data
#Read Data
o Explain: We used the function read.csv to import the data
o Result: we have a data frame that includes 50 rows and 13 columns
wall_thickness infill_density nozzle_temperature bed_temperature print_speed fan_speed roughness tension_strenght
> str(home, list.len = 9)
'data.frame': 50 obs. of 14 variables:
$ tension_strenght : int 18 16 8 10 5 24 12 14 27 25
(list output truncated)
> summary(home[, 1:9])
          [illegible]  wall_thickness  infill_density  nozzle_temperature  bed_temperature  print_speed  fan_speed  roughness
Min.      0.020        1.00            10.0            200.0               60               40           0          21.0
Median    0.100        5.00            [illegible]     220.0               70               60           50         165.5
Mean      0.106        5.22            [illegible]     221.5               70               64           50         170.6
3rd Qu.   0.150        7.00            [illegible]     230.0               75               60           75         239.2
Max.      0.200        10.00           [illegible]     250.0               80               120          100        368.0
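The import step described in this section can be sketched as follows. This is a minimal illustration, assuming a tiny stand-in CSV written to a temporary file; the file contents and the three columns shown are illustrative and are not the real 50-row data set.

```r
# Write a small illustrative CSV to a temporary file (stand-in data only)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "wall_thickness,infill_density,tension_strenght",
  "8,90,18",
  "7,90,16",
  "1,80,8"
), csv_path)

# read.csv() returns a data frame; header = TRUE is its default
home <- read.csv(csv_path)

dim(home)   # number of rows and columns
str(home)   # column names, types and first values
```

Checking dim() and str() right after the import is a cheap way to confirm that the row and column counts match what the report states.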
II Data Visualization
1 Transformation:
- Input:
erature,print_speed,fan_speed,roughness,tension_strenght))
Explain: We used the function subset() to keep only the variables mentioned in part A
wall_thickness infill_density nozzle_temperature bed_temperature print_speed fan_speed roughness tension_strenght
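A short sketch of column selection with subset() is given here. The data frame values, the extra "material" column and the name new_home are hypothetical and only serve to show the mechanism.

```r
# Illustrative data frame (hypothetical values, with one column we want to drop)
home <- data.frame(
  wall_thickness   = c(8, 7, 1),
  material         = c("abs", "pla", "abs"),
  roughness        = c(25, 32, 40),
  tension_strenght = c(18, 16, 8)
)

# select = takes unquoted column names; all rows are kept
new_home <- subset(home, select = c(wall_thickness, roughness, tension_strenght))
names(new_home)
```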
bed_temperature <- home$bed_temperature
deviation of the column
> bed_temperature <- home$bed_temperature
> summary(bed_temperature)
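The descriptive statistics for one column can be sketched like this, assuming a hypothetical vector of five values in place of the real column:

```r
# Hypothetical values, not the real bed_temperature column
bed_temperature <- c(60, 65, 70, 75, 80)

summary(bed_temperature)   # Min, 1st Qu., Median, Mean, 3rd Qu., Max
mean(bed_temperature)      # arithmetic mean
sd(bed_temperature)        # sample standard deviation (n - 1 denominator)
```

Note that sd() divides by n - 1, so it estimates the population standard deviation from a sample rather than describing the sample itself.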
# Give the chart file a name
png(file = "bar wall_thickness.png")
The table()/length() functions are used to calculate the percentage of each type
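The percentage calculation can be illustrated with a small sketch. The values below are hypothetical; table() counts each distinct value and dividing by length() turns the counts into shares.

```r
# Hypothetical values standing in for one column of the data
wall_thickness <- c(1, 1, 5, 5, 5, 7, 10, 10)

counts  <- table(wall_thickness)                  # frequency of each distinct value
percent <- counts / length(wall_thickness) * 100  # share of each value in percent
percent
```

An equivalent built-in is prop.table(counts) * 100, which avoids passing the vector twice.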
3 Graph:
For each numeric variable (one per column), we draw four types of graph: bar plot, box plot, histogram and pairs plot, and we fit a linear regression model between the variable and price
a Bar plot
and create bar plots with the barplot() function
# Bar Plot
counts <- table(wall_thickness)
# Give the chart file a name
png(file = "bar wall_thickness.png")
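A complete version of the bar plot step might look like the sketch below. The data values and the output file name are hypothetical, and the file is written to a temporary directory; the key point is that dev.off() must be called to close the device so that the file is actually written.

```r
# Hypothetical values standing in for the wall_thickness column
wall_thickness <- c(1, 1, 5, 5, 5, 7, 10, 10)
counts <- table(wall_thickness)

# Hypothetical output file in a temporary directory
out_file <- file.path(tempdir(), "bar_wall_thickness_example.png")

png(file = out_file)   # open the PNG device
barplot(counts, main = "wall_thickness",
        xlab = "wall_thickness", ylab = "Frequency")
dev.off()              # close the device so the file is written
```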
# Give the chart file a name
png(file = "boxplot wall_thickness.png")
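To see what a box plot actually draws, the underlying numbers can be inspected without plotting. This is a sketch with hypothetical data; boxplot(..., plot = FALSE) returns the statistics instead of drawing them.

```r
# Hypothetical values; 30 is deliberately far from the rest
wall_thickness <- c(1, 2, 3, 4, 5, 6, 7, 8, 30)

bp <- boxplot(wall_thickness, plot = FALSE)
bp$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
bp$out     # points beyond the whiskers (1.5 * box length by default)
```

Calling boxplot(wall_thickness) without plot = FALSE draws the same five statistics as a box with whiskers and marks the values in bp$out as separate points.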
Result:
[Figure: plot of wall_thickness]
# Give the chart file a name
png(file = "hist wall_thickness.png")
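The binning behind a histogram can be checked in the same way. The sketch below uses hypothetical data and explicit breaks chosen for illustration; hist(..., plot = FALSE) returns the bin edges and counts without drawing.

```r
# Hypothetical values standing in for the wall_thickness column
wall_thickness <- c(1, 1, 2, 3, 5, 5, 5, 7, 10, 10)

# Bins are (0,2], (2,4], (4,6], (6,8], (8,10]; intervals are right-closed by default
h <- hist(wall_thickness, breaks = seq(0, 10, by = 2), plot = FALSE)
h$breaks   # bin edges
h$counts   # number of observations in each bin
```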
[Figure: pairs plot of the variables, including tension_strenght]
- Input:
#Find the correlation between the different variables
round(cor(home), 2)
#Perform the linear regression analysis
#Create linear regression model
variables respect to X_Perimeter and Y_Perimeter
o We then use the function summary() to show all the parameters of this model
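The model-fitting step can be sketched as below. The data are synthetic (generated with a fixed seed) and the data frame name steel is hypothetical; only the variable names and the formula follow the model output shown in this report.

```r
set.seed(1)
n <- 100

# Synthetic predictors (hypothetical ranges, not the real data)
steel <- data.frame(
  X_Maximum             = runif(n, 0, 1700),
  Y_Maximum             = runif(n, 0, 1e6),
  Steel_Plate_Thickness = runif(n, 40, 300),
  Length_of_Conveyer    = runif(n, 1200, 1800)
)

# Synthetic response with noise, so that the fit has something to estimate
steel$X_Perimeter <- 400 - 0.1 * steel$X_Maximum -
  0.6 * steel$Steel_Plate_Thickness + rnorm(n, sd = 50)

# Multiple linear regression of X_Perimeter on the four predictors
model1 <- lm(X_Perimeter ~ X_Maximum + Y_Maximum +
               Steel_Plate_Thickness + Length_of_Conveyer, data = steel)

summary(model1)   # coefficients, standard errors, R-squared, F-statistic
coef(model1)      # just the fitted coefficients
```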
                       X_Maximum  Y_Maximum  Steel_Plate_Thickness  X_Perimeter  Y_Perimeter  Length_of_Conveyer
Steel_Plate_Thickness       0.11      -0.21                   1.00        -0.15        -0.06                0.21
Y_Perimeter                -0.09       0.02                  -0.06         0.91         1.00               -0.06
Length_of_Conveyer          0.30      -0.05                   0.21        -0.13        -0.06                1.00
We see that the variable "Y_Perimeter" is the one most strongly related to X_Perimeter, because their correlation coefficient is 0.91 (near 1), while the 4 other variables only reach -0.19, 0.02, -0.15 and -0.13 respectively (near 0), so their linear association with X_Perimeter is much weaker.
Likewise, the variable "X_Perimeter" is the one most strongly related to Y_Perimeter, because their correlation coefficient is 0.91 (near 1), while the 4 variables "X_Maximum", "Y_Maximum", "Steel_Plate_Thickness" and "Length_of_Conveyer" only reach -0.09, 0.02, -0.06 and -0.06 respectively (near 0), so their linear association with Y_Perimeter is much weaker.
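The difference between a correlation coefficient and a coefficient of determination can be illustrated with a small sketch. The data are synthetic and the column names u, v, w are hypothetical: v is built to depend strongly on u, and w is independent noise.

```r
set.seed(2)
x <- rnorm(200)

d <- data.frame(
  u = x,
  v = x + rnorm(200, sd = 0.3),  # strongly related to u
  w = rnorm(200)                 # unrelated to u and v
)

r <- round(cor(d), 2)  # Pearson correlation matrix, rounded as in the report
r

# For a simple regression of v on u, R-squared is the squared correlation
r["u", "v"]^2
```

A correlation of 0.91 therefore corresponds to an R-squared of about 0.83 in a one-predictor regression, which is why the two terms should not be used interchangeably.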
Trang 23© coefficients double [5] 4,09e+02 -9.41e-02 8.18e-07 -6.39e-01 -1.30e-01
Ô all language Im(formula = X_Perimeter ~ X_Maximum + Y_Maximum + Steel_Plate_Thickness +
© terms formula X_Perimeter ~ X_Maximum + Y_Maximum + Steel_Plate_Thickness + Length_of_Con
© model list [1941 x 5] (S3:data.frame) A data.frame with 1941 rows and 5 columns
                        Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)            4.085e+02   6.946e+01    5.881  4.78e-09 ***
X_Maximum             -9.413e-02   1.408e-02   -6.686  2.99e-11 ***
Y_Maximum              8.182e-07   3.847e-06    0.213   0.83159
Steel_Plate_Thickness -6.387e-01   1.265e-01   -5.047  4.90e-07 ***
Length_of_Conveyer    -1.299e-01   4.919e-02   -2.641   0.00833 **
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 293.2 on 1936 degrees of freedom
Multiple R-squared: 0.05472, Adjusted R-squared: 0.05276
F-statistic: 28.02 on 4 and 1936 DF, p-value: < 2.2e-16
xlevels list [0] List of length 0
call  language  lm(formula = Y_Perimeter ~ X_Maximum + Y_Maximum + Steel_Plate_Thickness +
terms  formula  Y_Perimeter ~ X_Maximum + Y_Maximum + Steel_Plate_Thickness + Length_of_Con
model  list [1941 x 5] (S3: data.frame)  A data.frame with 1941 rows and 5 columns
Call:
lm(formula = Y_Perimeter ~ X_Maximum + Y_Maximum + Steel_Plate_Thickness +
Residuals:
    Min      1Q  Median      3Q      Max
 -159.8   -74.1   -34.5    10.2  18004.4
Coefficients:
(Intercept)  2.749e+02  1.006e+02   2.734  0.00632 **
X_Maximum   -6.646e-02  2.038e-02  -3.261  0.00113 **
Y_Maximum    4.461e-06  5.569e-06   0.801  0.42325
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 424.4 on 1936 degrees of freedom
Multiple R-squared: 0.01179, Adjusted R-squared: 0.009747
F-statistic: 5.774 on 4 and 1936 DF, p-value: 0.0001284
IV Prediction
- Input:
#Prediction
#Predict the values for our test set
prediction1 <- predict(model1, newdata = home)
#Example
predict(model1, data.frame(X_Maximum = 100, Y_Maximum = 100000, Steel_Plate_Thickness = 200, Length_of_Conveyer = 1687))
o Case 1: X_Maximum=100, Y_Maximum=100000,
o Case 2: X_Maximum=100, Y_Maximum=100000,
#Prediction
#Predict the values for our test set
prediction1 <- predict(model1, newdata = home)
This means that, with the given values of the variables in model 1, we predict that X_Perimeter will be 52.28809
In model2: The predicted Y_Perimeter is 51.90771
This means that, with the given values of the variables in model 2, we predict that Y_Perimeter will be 51.90771
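The model 1 prediction can be checked by hand, since predict() on a linear model simply evaluates the fitted equation. The sketch below uses the model 1 coefficients as printed in the summary output of this report (rounded to four significant digits) and the input values from the example call; the vector names b and x are hypothetical.

```r
# Model 1 coefficients as printed in the report (rounded values)
b <- c(
  intercept             =  4.085e+02,
  X_Maximum             = -9.413e-02,
  Y_Maximum             =  8.182e-07,
  Steel_Plate_Thickness = -6.387e-01,
  Length_of_Conveyer    = -1.299e-01
)

# Input values from the example; the leading 1 multiplies the intercept
x <- c(1, 100, 100000, 200, 1687)

# Linear predictor: intercept + sum of coefficient * value
manual_prediction <- sum(b * x)
manual_prediction
```

The result agrees with the reported 52.28809 to about two decimal places; the small remaining difference comes from the rounding of the printed coefficients.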
Through a specific case, one can learn the complete flow of data analysis, which includes data reading, binary variable transformation, model construction, parameter selection and cross-validation, all of which are widely used in data analysis. Working through such an analysis is a good way for readers to accumulate experience, so that they can handle more types of data smoothly.