Engineering Statistics Handbook Episode 2 Part 15 ppt

data in the first and last thirds was collected in winter while the more stable middle third was collected in the summer. The seasonal effect was determined to be caused by the amount of humidity affecting the measurement equipment. In this case, the solution was to modify the test equipment to be less sensitive to environmental factors.

Simple graphical techniques can be quite effective in revealing unexpected results in the data. When this occurs, it is important to investigate whether the unexpected result is due to problems in the experiment and data collection or is in fact indicative of an unexpected underlying structure in the data. This determination cannot be made on the basis of statistics alone. The role of the graphical and statistical analysis is to detect problems or unexpected results in the data. Resolving the issues requires the knowledge of the scientist or engineer.

Individual Plots

Although it is generally unnecessary, the plots can be generated individually to give more detail. Since the lag plot indicates significant non-randomness, we omit the distributional plots.

Run Sequence Plot

1.4.2.7.2 Graphical Output and Interpretation

Lag Plot

TUK -.5 PPCC = 0.7334843E+00
CAUCHY PPCC = 0.3347875E+00

The autocorrelation coefficient of 0.972 is evidence of significant non-randomness.

Location

One way to quantify a change in location over time is to fit a straight line to the data set using the index variable X = 1, 2, ..., N, with N denoting the number of observations. If there is no significant drift in the location, the slope parameter estimate should be zero. For this data set, Dataplot generates the following output:

LEAST SQUARES MULTILINEAR FIT
SAMPLE SIZE N = 1000
NUMBER OF VARIABLES = 1
NO REPLICATION CASE

     PARAMETER ESTIMATES     (APPROX. ST. DEV.)   T VALUE
 1   A0        27.9114        (0.1209E-02)        0.2309E+05
 2   A1   X    0.209670E-03   (0.2092E-05)        100.2

RESIDUAL STANDARD DEVIATION = 0.1909796E-01
RESIDUAL DEGREES OF FREEDOM = 998

COEF AND SD(COEF) WRITTEN OUT TO FILE DPST1F.DAT
SD(PRED),95LOWER,95UPPER,99LOWER,99UPPER WRITTEN OUT TO FILE DPST2F.DAT
REGRESSION DIAGNOSTICS WRITTEN OUT TO FILE DPST3F.DAT
PARAMETER VARIANCE-COVARIANCE MATRIX AND INVERSE OF X-TRANSPOSE X MATRIX WRITTEN OUT TO FILE DPST4F.DAT

The slope parameter, A1, has a t value of 100, which is statistically significant. The value of the slope parameter estimate is 0.00021. Although this number is nearly zero, we need to take into account that the original scale of the data is from about 27.8 to 28.2. Over the 1,000 observations the fitted line rises by about 0.21, roughly half of that range. In this case, we conclude that there is a drift in location.
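As an illustration of the straight-line drift check described above, the following Python sketch fits Y against the index X = 1, 2, ..., N by ordinary least squares and reports the slope, its approximate standard deviation, and the t value. It is not Dataplot's implementation; the function name `linear_drift_fit` is hypothetical, and the data are synthetic stand-ins rather than the case-study measurements.

```python
import math
import random


def linear_drift_fit(y):
    """Fit y = a0 + a1*x by ordinary least squares with x = 1..N.

    Returns (a0, a1, sd_a1, t_a1), where t_a1 = a1 / sd_a1.
    """
    n = len(y)
    x = range(1, n + 1)
    x_mean = (n + 1) / 2
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = y_mean - a1 * x_mean
    # Residual standard deviation uses N - 2 degrees of freedom.
    sse = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))
    resid_sd = math.sqrt(sse / (n - 2))
    sd_a1 = resid_sd / math.sqrt(sxx)
    return a0, a1, sd_a1, a1 / sd_a1


# Synthetic stand-in data (NOT the case-study data): small upward drift plus noise.
rng = random.Random(0)
y = [28.0 + 0.0002 * i + rng.gauss(0, 0.02) for i in range(1, 1001)]

a0, a1, sd_a1, t = linear_drift_fit(y)
print(f"A0 = {a0:.4f}  A1 = {a1:.6e}  SD(A1) = {sd_a1:.3e}  t = {t:.1f}")
```

A t value far beyond about 2 in absolute value suggests that the slope differs from zero, which is the criterion applied to the Dataplot output above.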

1.4.2.7.3 Quantitative Output and Interpretation

Variation

One simple way to detect a change in variation is with a Bartlett test after dividing the data set into several equal-sized intervals. However, the Bartlett test is not robust for non-normality. Since the normality assumption is questionable for these data, we use the alternative Levene test. In particular, we use the Levene test based on the median rather than the mean. The choice of the number of intervals is somewhat arbitrary, although values of 4 or 8 are reasonable. Dataplot generated the following output for the Levene test.

LEVENE F-TEST FOR SHIFT IN VARIATION
(ASSUMPTION: NORMALITY)

1. STATISTICS
   NUMBER OF OBSERVATIONS  = 1000
   NUMBER OF GROUPS        = 4
   LEVENE F TEST STATISTIC = 140.8509

FOR LEVENE TEST STATISTIC
   0    % POINT = 0.0000000E+00
   50   % POINT = 0.7891988
   75   % POINT = 1.371589
   90   % POINT = 2.089303
   95   % POINT = 2.613852
   99   % POINT = 3.801369
   99.9 % POINT = 5.463994

   100.0000 % Point: 140.8509

3. CONCLUSION (AT THE 5% LEVEL):
   THERE IS A SHIFT IN VARIATION
   THUS: NOT HOMOGENEOUS WITH RESPECT TO VARIATION

In this case, since the Levene test statistic value of 140.9 is greater than the 5% significance level critical value of 2.6, we conclude that there is significant evidence of nonconstant variation.
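The median-based Levene statistic can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual definition of the statistic on absolute deviations from each group median; the helper names `levene_median` and `split_equal` are hypothetical, the data are synthetic, and the critical value 2.613852 is simply the 95 % point quoted in the Dataplot output above for 4 groups and 1000 observations.

```python
import random
import statistics


def levene_median(groups):
    """Levene statistic using absolute deviations from each group median."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    medians = [statistics.median(g) for g in groups]
    # Z_ij = |Y_ij - median_i|
    z = [[abs(v - med) for v in g] for g, med in zip(groups, medians)]
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


def split_equal(y, k):
    """Split y into k consecutive intervals of equal size (any remainder is dropped)."""
    size = len(y) // k
    return [y[i * size:(i + 1) * size] for i in range(k)]


# Synthetic stand-in data (NOT the case-study data): noisier first and last quarters.
rng = random.Random(1)
sds = [0.06, 0.02, 0.02, 0.06]
y = [rng.gauss(28.0, sds[i // 250]) for i in range(1000)]

w = levene_median(split_equal(y, 4))
critical_95 = 2.613852  # 95 % point taken from the Dataplot output above
print(f"Levene statistic = {w:.2f}, shift in variation: {w > critical_95}")
```

Splitting into quarters mirrors the choice of 4 groups in the output above; other group counts would need their own critical value.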

Randomness

There are many ways in which data can be non-random. However, most common forms of non-randomness can be detected with a few simple tests. The lag plot in the 4-plot in the previous section is a simple graphical technique.

One check is an autocorrelation plot that shows the autocorrelations for various lags. Confidence bands can be plotted at the 95% and 99% confidence levels. Points outside this band indicate statistically significant values (lag 0 is always 1). Dataplot generated the following autocorrelation plot.

The lag 1 autocorrelation, which is generally the one of greatest interest, is 0.97. The critical values at the 5% significance level are -0.062 and 0.062. This indicates that the lag 1 autocorrelation is statistically significant, so there is strong evidence of non-randomness.
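The critical values quoted above are consistent with the common large-sample band of plus or minus 1.96 divided by the square root of N, which gives about 0.062 for N = 1000. The sketch below computes a sample autocorrelation and that band; it assumes the standard sample autocorrelation formula, uses hypothetical function names, and runs on a synthetic random walk rather than the case-study data.

```python
import math
import random


def autocorrelation(y, lag=1):
    """Sample autocorrelation of y at the given lag."""
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    numer = sum((y[i] - mean) * (y[i + lag] - mean) for i in range(n - lag))
    return numer / denom


def white_noise_band(n, z=1.96):
    """Half-width of the approximate 95 % band for random data of size n."""
    return z / math.sqrt(n)


# Synthetic stand-in data (NOT the case-study data): a random walk is strongly non-random.
rng = random.Random(2)
y = [0.0]
for _ in range(999):
    y.append(y[-1] + rng.gauss(0, 1))

r1 = autocorrelation(y, lag=1)
band = white_noise_band(len(y))
print(f"lag 1 autocorrelation = {r1:.3f}, band = +/-{band:.3f}")
```

A lag 1 value outside the band, as with the 0.97 reported above, indicates statistically significant autocorrelation.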

A common test for randomness is the runs test.

RUNS UP

STATISTIC = NUMBER OF RUNS UP OF LENGTH EXACTLY I

  I     STAT   EXP(STAT)   SD(STAT)       Z
  1    178.0    208.3750    14.5453   -2.09
  2     90.0     91.5500     7.5002   -0.21
  3     29.0     26.3236     4.5727    0.59
  4     16.0      5.7333     2.3164    4.43
  5      2.0      1.0121     0.9987    0.99
  6      0.0      0.1507     0.3877   -0.39
  7      0.0      0.0194     0.1394   -0.14
  8      0.0      0.0022     0.0470   -0.05
  9      0.0      0.0002     0.0150   -0.02
 10      0.0      0.0000     0.0046    0.00

STATISTIC = NUMBER OF RUNS UP OF LENGTH I OR MORE

  I     STAT   EXP(STAT)   SD(STAT)       Z
  1    315.0    333.1667     9.4195   -1.93
  2    137.0    124.7917     6.2892    1.94
  3     47.0     33.2417     4.8619    2.83
  4     18.0      6.9181     2.5200    4.40
  5      2.0      1.1847     1.0787    0.76
  6      0.0      0.1726     0.4148   -0.42
  7      0.0      0.0219     0.1479   -0.15
  8      0.0      0.0025     0.0496   -0.05
  9      0.0      0.0002     0.0158   -0.02
 10      0.0      0.0000     0.0048    0.00

RUNS DOWN

STATISTIC = NUMBER OF RUNS DOWN OF LENGTH EXACTLY I

  I     STAT   EXP(STAT)   SD(STAT)       Z
  1    195.0    208.3750    14.5453   -0.92
  2     81.0     91.5500     7.5002   -1.41
  3     32.0     26.3236     4.5727    1.24
  4      4.0      5.7333     2.3164   -0.75
  5      1.0      1.0121     0.9987   -0.01
  6      1.0      0.1507     0.3877    2.19
  7      0.0      0.0194     0.1394   -0.14
  8      0.0      0.0022     0.0470   -0.05
  9      0.0      0.0002     0.0150   -0.02
 10      0.0      0.0000     0.0046    0.00

STATISTIC = NUMBER OF RUNS DOWN OF LENGTH I OR MORE

  I     STAT   EXP(STAT)   SD(STAT)       Z
  1    314.0    333.1667     9.4195   -2.03
  2    119.0    124.7917     6.2892   -0.92
  3     38.0     33.2417     4.8619    0.98
  4      6.0      6.9181     2.5200   -0.36
  5      2.0      1.1847     1.0787    0.76
  6      1.0      0.1726     0.4148    1.99
  7      0.0      0.0219     0.1479   -0.15
  8      0.0      0.0025     0.0496   -0.05
  9      0.0      0.0002     0.0158   -0.02
 10      0.0      0.0000     0.0048    0.00

RUNS TOTAL = RUNS UP + RUNS DOWN

STATISTIC = NUMBER OF RUNS TOTAL OF LENGTH EXACTLY I

  I     STAT   EXP(STAT)   SD(STAT)       Z
  1    373.0    416.7500    20.5701   -2.13
  2    171.0    183.1000    10.6068   -1.14
  3     61.0     52.6472     6.4668    1.29
  4     20.0     11.4667     3.2759    2.60
  5      3.0      2.0243     1.4123    0.69
  6      1.0      0.3014     0.5483    1.27
  7      0.0      0.0389     0.1971   -0.20
  8      0.0      0.0044     0.0665   -0.07
  9      0.0      0.0005     0.0212   -0.02
 10      0.0      0.0000     0.0065   -0.01

STATISTIC = NUMBER OF RUNS TOTAL OF LENGTH I OR MORE

  I     STAT   EXP(STAT)   SD(STAT)       Z
  1    629.0    666.3333    13.3212   -2.80
  2    256.0    249.5833     8.8942    0.72
  3     85.0     66.4833     6.8758    2.69
  4     24.0     13.8361     3.5639    2.85
  5      4.0      2.3694     1.5256    1.07
  6      1.0      0.3452     0.5866    1.12
  7      0.0      0.0438     0.2092   -0.21
  8      0.0      0.0049     0.0701   -0.07
  9      0.0      0.0005     0.0223   -0.02
 10      0.0      0.0000     0.0067   -0.01

LENGTH OF THE LONGEST RUN UP = 5
LENGTH OF THE LONGEST RUN DOWN = 6
LENGTH OF THE LONGEST RUN UP OR DOWN = 6
NUMBER OF POSITIVE DIFFERENCES = 505
NUMBER OF NEGATIVE DIFFERENCES = 469
NUMBER OF ZERO DIFFERENCES = 25

Values in the column labeled "Z" greater than 1.96 or less than -1.96 are statistically significant at the 5% level. Because several of the Z values exceed the 1.96 cut-off in absolute value, we conclude that the data are not random. However, in this case the evidence from the runs test is not nearly as strong as it is from the autocorrelation plot.
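For the total number of runs (up or down, of length 1 or more), the Z value can be reproduced with the standard large-sample results for the runs up and down test: expected count (2N - 1)/3 and variance (16N - 29)/90. The sketch below is a hedged illustration with hypothetical function names; skipping zero differences when counting runs is an assumption and may not match how Dataplot treats ties.

```python
import math


def count_runs(y):
    """Count the total number of runs up and runs down in y.

    Zero differences are skipped, which is an assumption about tie handling.
    """
    runs = 0
    last_sign = 0
    for prev, cur in zip(y, y[1:]):
        if cur == prev:
            continue
        sign = 1 if cur > prev else -1
        if sign != last_sign:
            runs += 1
            last_sign = sign
    return runs


def total_runs_z(n_runs, n):
    """Z score for the total number of runs in a series of n observations."""
    expected = (2 * n - 1) / 3
    sd = math.sqrt((16 * n - 29) / 90)
    return (n_runs - expected) / sd, expected, sd


# Values taken from the output above: 629 total runs in N = 1000 observations.
z, expected, sd = total_runs_z(629, 1000)
print(f"EXP = {expected:.4f}  SD = {sd:.4f}  Z = {z:.2f}")
# EXP = 666.3333  SD = 13.3212  Z = -2.80
```

The expected values for runs of each exact length require longer formulas and are not reproduced in this sketch.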

Distributional Analysis

Since we rejected the randomness assumption, the distributional tests are not meaningful. Therefore, these quantitative tests are omitted. Since the Grubbs' test for outliers also assumes the approximate normality of the data, we omit Grubbs' test as well.

Univariate Report

It is sometimes useful and convenient to summarize the above results in a report.

Analysis for resistor case study

1: Sample Size = 1000

2: Location
   Mean = 28.01635
   Standard Deviation of Mean = 0.002008
   95% Confidence Interval for Mean = (28.0124,28.02029)
   Drift with respect to location? = NO

3: Variation
   Standard Deviation = 0.063495
   95% Confidence Interval for SD = (0.060829,0.066407)
   Change in variation? (based on Levene's test on quarters of the data) = YES

4: Randomness
   Autocorrelation = 0.972158
   Data Are Random? (as measured by autocorrelation) = NO

5: Distribution
   Distributional test omitted due to non-randomness of the data

6: Statistical Control (i.e., no drift in location or scale, data are random, distribution is fixed)
   Data Set is in Statistical Control? = NO

7: Outliers?
   (Grubbs' test omitted due to non-randomness of the data)
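The location entries in the report can be checked by hand from the sample size, mean, and standard deviation. The sketch below does that arithmetic in Python; it assumes a normal-quantile interval (about 1.96 standard errors), which for N = 1000 is nearly identical to a t-based interval, and it does not claim to be the exact method used to produce the report.

```python
import math
from statistics import NormalDist

# Values taken from the report above.
n = 1000
mean = 28.01635
sd = 0.063495

# Standard deviation of the mean (standard error).
se = sd / math.sqrt(n)

# Two-sided 95 % interval using the normal quantile (an assumption; see lead-in).
z = NormalDist().inv_cdf(0.975)
lower = mean - z * se
upper = mean + z * se

print(f"SD of mean = {se:.6f}")
print(f"95% CI for mean = ({lower:.5f}, {upper:.5f})")
```

Both results agree with the report to the precision it shows.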

1.4.2.7.4 Work This Example Yourself

1. Invoke Dataplot and read data.
   1. Read in the data.
      Result: You have read 1 column of numbers into Dataplot, variable Y.

2. 4-plot of the data.
      Result: ... in location and variation and the data are not random.

3. Generate the individual plots.
   1. Generate a run sequence plot.
      Result: The run sequence plot indicates that there are shifts of location and variation.
   2. Generate a lag plot.
      Result: The lag plot shows a strong linear pattern, which indicates significant non-randomness.

4. Generate summary statistics, quantitative analysis, and print a univariate report.
   1. Generate a table of summary statistics.
      Result: The summary statistics table displays 25+ statistics.
   2. Generate the sample mean, a confidence interval for the population mean, and compute a linear fit to detect drift in location.
      Result: The mean is 28.0163 and a 95% confidence interval is (28.0124,28.02029). The linear fit indicates drift in location since the slope parameter estimate is statistically significant.
   3. Generate the sample standard deviation, a confidence interval for the population standard deviation, and detect drift in variation by dividing the data into quarters and computing Levene's test for equal standard deviations.
      Result: The standard deviation is 0.0635 with a 95% confidence interval of (0.060829,0.066407). Levene's test indicates significant change in variation.
   4. Check for randomness by generating an autocorrelation plot and a runs test.
      Result: The lag 1 autocorrelation is 0.97. From the autocorrelation plot, this is outside the 95% confidence interval bands, indicating significant non-randomness.
   5. Print a univariate report (this assumes steps 2 thru 5 have already been run).
      Result: The results are summarized in a convenient report.

1 Exploratory Data Analysis

1.4 EDA Case Studies

1.4.2 Case Studies

1.4.2.8 Heat Flow Meter 1

1.4.2.8.1 Background and Data

Generation

This data set was collected by Bob Zarr of NIST in January, 1990 from a heat flow meter calibration and stability analysis. The response variable is a calibration factor.

The motivation for studying this data set is to illustrate a well-behaved process where the underlying assumptions hold and the process is in statistical control.

This file can be read by Dataplot with the following commands:

SKIP 25
READ ZARR13.DAT Y
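For readers working outside Dataplot, a rough Python equivalent of these two commands is sketched below. It assumes that ZARR13.DAT is a plain-text file with 25 header lines followed by whitespace-separated numbers, as the SKIP 25 command suggests; the function name `read_column` is hypothetical.

```python
from pathlib import Path


def read_column(path, skip=25):
    """Read numeric values from a text file, ignoring the first `skip` lines."""
    values = []
    with open(path) as fh:
        for line_no, line in enumerate(fh):
            if line_no < skip:
                continue  # header line, analogous to Dataplot's SKIP
            values.extend(float(tok) for tok in line.split())
    return values


# Only read the file if it is present in the working directory.
if Path("ZARR13.DAT").exists():
    y = read_column("ZARR13.DAT", skip=25)
    print(f"Read {len(y)} values into Y")
```

The resulting list plays the role of the Dataplot variable Y in the analysis steps.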

Resulting Data

The following are the data used for this case study.

9.206343 9.299992 9.277895 9.305795 9.275351 9.288729 9.287239 9.260973 9.303111 9.275674 9.272561 9.288454 9.255672 9.252141 9.297670 9.266534 9.256689 9.277542 9.248205

9.252107 9.276345 9.278694 9.267144 9.246132 9.238479 9.269058 9.248239 9.257439 9.268481 9.288454 9.258452 9.286130 9.251479 9.257405 9.268343 9.291302 9.219460 9.270386 9.218808 9.241185 9.269989 9.226585 9.258556 9.286184 9.320067 9.327973 9.262963 9.248181 9.238644 9.225073 9.220878 9.271318 9.252072 9.281186 9.270624 9.294771 9.301821 9.278849 9.236680 9.233988 9.244687 9.221601 9.207325 9.258776 9.275708

9.268955 9.257269 9.264979 9.295500 9.292883 9.264188 9.280731 9.267336 9.300566 9.253089 9.261376 9.238409 9.225073 9.235526 9.239510 9.264487 9.244242 9.277542 9.310506 9.261594 9.259791 9.253089 9.245735 9.284058 9.251122 9.275385 9.254619 9.279526 9.275065 9.261952 9.275351 9.252433 9.230263 9.255150 9.268780 9.290389 9.274161 9.255707 9.261663 9.250455 9.261952 9.264041 9.264509 9.242114 9.239674 9.221553

Date posted: 06/08/2014, 11:20