
Project report subject probability and statistics


DOCUMENT INFORMATION

Basic information

Title: Project report subject: Probability and statistics
Authors: Lư Thiờn Phỳ, Nguyờn Trõn Minh Quận, Phan Vũ Nhật Quang, Phan Tuan Ta, Bui Minh Nhat
Advisor: Ph.D. Phan Thi Huong
University: Viet Nam National University Ho Chi Minh University of Technology
Major: Probability and Statistics
Type: Project report
City: Ho Chi Minh
Format
Pages: 26
Size: 5.22 MB


VIET NAM NATIONAL UNIVERSITY

Class: CC06 Group: 3 Subject code: MT2013


A Problem

This dataset comes from research by the TR/Selcuk University Mechanical Engineering department. The aim of the study is to determine how much the adjustment parameters of 3D printers affect print quality, accuracy and strength. There are nine setting parameters and three measured output parameters.

b Descriptive statistics for each of the variables

c Graph: hist, boxplot, pairs

3 Fitting linear regression models: we want to explore what factors may affect the final grade

4 Predictions

B Proposed method

I Import data

Load packages

library(tibble)   #Data frame
library(MASS)     #Modern Applied Statistics with S
library(ggplot2)  #Data Visualization
library(lattice)  #Data Visualization
library(caret)    #Cross Validation
library(e1071)    #SVM

o Explain: We will use these packages for the analysis

Read data

#Read Data

o Explain: We used the function read.csv() to import the data

o Result: we have a data frame that includes 50 rows and 13 columns

wall_thickness infill_density nozzle_temperature bed_temperature print_speed fan_speed roughness tension_strenght

> str(home, list.len=9)

'data.frame': 50 obs. of 14 variables:

$ tension_strenght: int 18 16 8 10 5 24 12 14 27 25

(list output truncated)
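The import and inspection steps above can be sketched as follows. The report does not show the name of the data file, so this sketch writes a tiny stand-in CSV to a temporary file and reads it back; the file contents are made-up values and only three of the column names are used. Only the object name home is taken from the report.

```r
# Hypothetical stand-in for the real data file, whose name is not shown in the report.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "wall_thickness,infill_density,print_speed",
  "8,90,40",
  "7,90,40",
  "1,80,60"
), csv_path)

# read.csv() returns a data frame; header = TRUE is the default.
home <- read.csv(csv_path)

dim(home)                 # number of rows and columns
str(home, list.len = 9)   # column types and first values, at most 9 columns listed
```

With the real file, dim(home) is where the row and column counts quoted in the Result line would come from.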

> summary(home[,1:9])

            (illegible)  wall_thickness  infill_density  nozzle_temperature  bed_temperature  print_speed  fan_speed  roughness
Min.        0.020        1.00            10.0            200.0               60               40           0          21.0
Median      0.100        5.00            (illegible)     220.0               70               60           50         165.5
Mean        0.106        5.22            (illegible)     221.5               70               64           50         170.6
3rd Qu.     0.150        7.00            (illegible)     230.0               75               60           75         239.2
Max.        0.200        10.00           (illegible)     250.0               80               120          100        368.0

II Data Visualization

1 Transformation:

- Input:

erature,print_speed,fan_speed,roughness,tension_strenght))

Explain: We used the function subset() to filter the data corresponding to the variables mentioned in part A

wall_thickness infill_density nozzle_temperature bed_temperature print_speed fan_speed roughness tension_strenght
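As a minimal sketch of column filtering with subset(): the data frame below is a toy example with made-up values, and the result name new_home is hypothetical, since the start of the original call is cut off in the source.

```r
# Toy data frame; the values are made up for illustration.
home <- data.frame(
  wall_thickness = c(8, 7, 1, 4),
  infill_pattern = c("grid", "honeycomb", "grid", "honeycomb"),
  print_speed    = c(40, 40, 60, 120),
  roughness      = c(25, 32, 40, 68)
)

# select = c(...) keeps only the named columns, in the given order.
new_home <- subset(home, select = c(wall_thickness, print_speed, roughness))

names(new_home)
```

The rows are left untouched; only the non-listed column (here infill_pattern) is dropped.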

bed_temperature <- home$bed_temperature

deviation of the column

> bed_temperature <- home$bed_temperature

> summary(bed_temperature)

# Give the chart file a name

png(file = "bar_wall_thickness.png")

The table() and length() functions are used to calculate the percentage of each type
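A short sketch of that percentage calculation, using made-up values rather than the report's data: table() counts each distinct value and length() gives the total number of observations.

```r
# Made-up values for illustration only.
wall_thickness <- c(1, 1, 5, 5, 5, 7, 7, 10)

counts  <- table(wall_thickness)                  # frequency of each distinct value
percent <- counts / length(wall_thickness) * 100  # share of each value, in percent

round(percent, 1)
```

The percentages always sum to 100, which is a quick sanity check on the result.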

3 Graph:

For each numeric variable (one per column), we draw four types of graph (bar plot, box plot, histogram and pairs diagram) and fit a linear regression model between the variable and price

a Bar plot

and create bar plots with the barplot() function

# Bar Plot

counts <- table(wall_thickness)

# Give the chart file a name

png(file = "bar_wall_thickness.png")
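The remaining lines of the bar plot code are not visible in the source. A minimal sketch of the usual complete sequence (open the device, draw, close the device) might look like this; the data values, the title and axis labels, and the temporary output path are assumptions for illustration.

```r
# Made-up values for illustration only.
wall_thickness <- c(1, 1, 5, 5, 5, 7, 7, 10)
counts <- table(wall_thickness)

# Write to a temporary directory here; the report uses its own file name.
out_file <- file.path(tempdir(), "bar_wall_thickness.png")
png(file = out_file)

# One bar per distinct value, bar height = frequency.
barplot(counts,
        main = "Distribution of wall_thickness",
        xlab = "wall_thickness",
        ylab = "Frequency")

# Close the device so the file is actually written.
dev.off()
```

Forgetting dev.off() is a common reason for empty or missing image files, because the PNG is only finalised when the device is closed.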

# Give the chart file a name

png(file = "boxplot_wall_thickness.png")

Result:

[Figure: plot of wall_thickness]

# Give the chart file a name

png(file = "hist_wall_thickness.png")

[Figure: plot with panel label tension_strenght]

- Input:

#Find the correlation between the different variables

round(cor(home), 2)

#Perform the linear regression analysis

#Create linear regression model

variables respect to X_Perimeter and Y_Perimeter

o We then use the function summary() to show all the parameters of this model

                      X_Maximum Y_Maximum Steel_Plate_Thickness X_Perimeter Y_Perimeter Length_of_Conveyer
Steel_Plate_Thickness      0.11     -0.21                  1.00       -0.15       -0.06               0.21
Y_Perimeter               -0.09      0.02                 -0.06        0.91        1.00              -0.06
Length_of_Conveyer         0.30     -0.05                  0.21       -0.13       -0.06               1.00

We see that the variable “Y_Perimeter” is the one most strongly related to X_Perimeter, because their correlation coefficient is 0.91 (near 1). The 4 variables X_Maximum, Y_Maximum, Steel_Plate_Thickness and Length_of_Conveyer only reach -0.19, 0.02, -0.15 and -0.13 respectively (near 0), so their relationship with X_Perimeter is much weaker.

Likewise, the variable “X_Perimeter” is the one most strongly related to Y_Perimeter, because their correlation coefficient is 0.91 (near 1). The 4 variables X_Maximum, Y_Maximum, Steel_Plate_Thickness and Length_of_Conveyer only reach -0.09, 0.02, -0.06 and -0.06 respectively (near 0), so their relationship with Y_Perimeter is much weaker.
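The value 0.91 is read from the correlation matrix, so it is a correlation coefficient r. For a simple linear regression with one predictor, the coefficient of determination equals r squared. The sketch below checks this on synthetic data (not the report's data) and then squares the reported value.

```r
set.seed(2)

# Synthetic, strongly related pair of variables (illustration only).
x <- rnorm(200)
y <- 2 * x + rnorm(200, sd = 0.9)

r  <- cor(x, y)                     # correlation coefficient
r2 <- summary(lm(y ~ x))$r.squared  # coefficient of determination

c(r = r, r_squared = r^2, lm_r_squared = r2)

# For the value read from the matrix: r = 0.91 gives r^2 = 0.8281.
0.91^2
```

So a correlation of 0.91 between X_Perimeter and Y_Perimeter corresponds to roughly 83% of variance shared in a one-predictor model.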

coefficients double [5] 4.09e+02 -9.41e-02 8.18e-07 -6.39e-01 -1.30e-01

call language lm(formula = X_Perimeter ~ X_Maximum + Y_Maximum + Steel_Plate_Thickness +

terms formula X_Perimeter ~ X_Maximum + Y_Maximum + Steel_Plate_Thickness + Length_of_Con

model list [1941 x 5] (S3: data.frame) A data.frame with 1941 rows and 5 columns

                        Estimate Std. Error t value Pr(>|t|)
(Intercept)            4.085e+02  6.946e+01   5.881 4.78e-09 ***
X_Maximum             -9.413e-02  1.408e-02  -6.686 2.99e-11 ***
Y_Maximum              8.182e-07  3.847e-06   0.213  0.83159
Steel_Plate_Thickness -6.387e-01  1.265e-01  -5.047 4.90e-07 ***
Length_of_Conveyer    -1.299e-01  4.919e-02  -2.641  0.00833 **

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 293.2 on 1936 degrees of freedom

Multiple R-squared: 0.05472, Adjusted R-squared: 0.05276

F-statistic: 28.02 on 4 and 1936 DF, p-value: < 2.2e-16

xlevels list [0] List of length 0

call language lm(formula = Y_Perimeter ~ X_Maximum + Y_Maximum + Steel_Plate_Thickness +

terms formula Y_Perimeter ~ X_Maximum + Y_Maximum + Steel_Plate_Thickness + Length_of_Con

model list [1941 x 5] (S3: data.frame) A data.frame with 1941 rows and 5 columns

Call:

lm(formula = Y_Perimeter ~ X_Maximum + Y_Maximum + Steel_Plate_Thickness +

Residuals:

-159.8 -74.1 -34.5 10.2 18004.4

Coefficients:

               Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.749e+02  1.006e+02   2.734  0.00632 **
X_Maximum    -6.646e-02  2.038e-02  -3.261  0.00113 **
Y_Maximum     4.461e-06  5.569e-06   0.801  0.42325

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 424.4 on 1936 degrees of freedom

Multiple R-squared: 0.01179, Adjusted R-squared: 0.009747

F-statistic: 5.774 on 4 and 1936 DF, p-value: 0.0001284
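The modelling steps in this section (correlation matrix, lm(), summary()) can be sketched as follows. The lines that create the models are not visible in the source, so this is an illustration under assumptions: the data are synthetic, only two of the four predictors are used, and the true coefficients in the simulation are arbitrary. The names home and model1 follow the report.

```r
set.seed(1)

# Synthetic stand-in data; not the data set used in the report.
n <- 100
home <- data.frame(
  X_Maximum             = runif(n, 0, 1700),
  Steel_Plate_Thickness = runif(n, 40, 300)
)
home$X_Perimeter <- 400 - 0.1 * home$X_Maximum -
  0.6 * home$Steel_Plate_Thickness + rnorm(n, sd = 50)

# Correlation matrix, rounded to two decimals.
round(cor(home), 2)

# Fit the model: response on the left of ~, predictors on the right.
model1 <- lm(X_Perimeter ~ X_Maximum + Steel_Plate_Thickness, data = home)

# Coefficients, standard errors, t tests, R-squared and F-statistic.
summary(model1)
```

In the summary output, the residual degrees of freedom equal the number of rows minus the number of estimated coefficients, which is how 1941 rows and 5 coefficients give the 1936 degrees of freedom reported above.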

IV Prediction

- Input:

#Prediction

#Predict the values for our test set

prediction1 <- predict(model1, newdata = home)

#Example

predict(model1, data.frame(X_Maximum=100, Y_Maximum=100000, Steel_Plate_Thickness=200, Length_of_Conveyer=1687))

o Case 1: X_Maximum=100, Y_Maximum=100000,

o Case 2: X_Maximum=100, Y_Maximum=100000,

#Prediction

#Predict the values for our test set

prediction1 <- predict(model1, newdata = home)

This means that, given these values of the variables in model 1, we predict that X_Perimeter will be 52.28809
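As a check on this number, the prediction can be reproduced by hand from the fitted equation: the intercept plus each coefficient times its input value. The sketch below uses the rounded coefficients of model 1 as printed in the summary output and the input values from the example call, so a small rounding difference is expected.

```r
# Rounded coefficients of model 1, copied from the summary output above.
b <- c(intercept             =  4.085e+02,
       X_Maximum             = -9.413e-02,
       Y_Maximum             =  8.182e-07,
       Steel_Plate_Thickness = -6.387e-01,
       Length_of_Conveyer    = -1.299e-01)

# Input values from the example call; the leading 1 multiplies the intercept.
x <- c(1, 100, 100000, 200, 1687)

pred <- sum(b * x)
pred   # about 52.29; differs from 52.28809 only through coefficient rounding
```

This is the same calculation predict() performs internally with the unrounded coefficients.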

In model2: The predicted Y_Perimeter is 51.90771

This means that, given these values of the variables in model 2, we predict that Y_Perimeter will be 51.90771

Through a specific case, one can learn the complete flow of data analysis, which includes data reading, binary variable transformation, model construction, parameter selection and cross-validation, all of which are widely used in data analysis. Carrying out such analyses is a good way for readers to accumulate experience, so that they can work smoothly with other types of data.

Date posted: 10/02/2025, 15:59