
Engineering Statistics Handbook Episode 2 Part 12 docx



STATISTIC = NUMBER OF RUNS UP

OF LENGTH I OR MORE

I STAT EXP(STAT) SD(STAT) Z

RUNS DOWN

STATISTIC = NUMBER OF RUNS DOWN

OF LENGTH EXACTLY I

I STAT EXP(STAT) SD(STAT) Z

STATISTIC = NUMBER OF RUNS DOWN

OF LENGTH I OR MORE


9 0.0 0.0002 0.0132 -0.01

10 0.0 0.0000 0.0040 0.00

RUNS TOTAL = RUNS UP + RUNS DOWN

STATISTIC = NUMBER OF RUNS TOTAL

OF LENGTH EXACTLY I

I STAT EXP(STAT) SD(STAT) Z

STATISTIC = NUMBER OF RUNS TOTAL

OF LENGTH I OR MORE

I STAT EXP(STAT) SD(STAT) Z

LENGTH OF THE LONGEST RUN UP = 7
LENGTH OF THE LONGEST RUN DOWN = 6
LENGTH OF THE LONGEST RUN UP OR DOWN = 7

NUMBER OF POSITIVE DIFFERENCES = 262
NUMBER OF NEGATIVE DIFFERENCES = 258
NUMBER OF ZERO DIFFERENCES = 179

Values in the column labeled "Z" greater than 1.96 or less than -1.96 are statistically significant at the 5% level. The runs test indicates some mild non-randomness.

Although the runs test and lag 1 autocorrelation indicate some mild non-randomness, it is not sufficient to reject the Yi = C + Ei model. At least part of the non-randomness can be explained by the discrete nature of the data.
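The bookkeeping behind the runs output can be sketched in a few lines. This is a minimal illustration, not Dataplot's implementation: the function names are hypothetical, and the rule that a zero difference ends the current run is an assumption. Note that the three difference counts in the output sum to 699, one fewer than the 700 observations.

```python
def difference_signs(y):
    """Count positive, negative, and zero successive differences."""
    pos = neg = zero = 0
    for prev, cur in zip(y, y[1:]):
        if cur > prev:
            pos += 1
        elif cur < prev:
            neg += 1
        else:
            zero += 1
    return pos, neg, zero


def longest_run(y, direction):
    """Longest run of consecutive differences with the given sign.

    direction is +1 for runs up and -1 for runs down. Treating a zero
    difference as the end of a run is an assumption of this sketch.
    """
    best = current = 0
    for prev, cur in zip(y, y[1:]):
        step = (cur > prev) - (cur < prev)
        current = current + 1 if step == direction else 0
        best = max(best, current)
    return best


def is_significant(z, critical=1.96):
    """Two-sided check of a Z value at the 5% level."""
    return abs(z) > critical
```

The Z column in the output is (STAT - EXP(STAT)) / SD(STAT); `is_significant` only applies the 1.96 cutoff quoted in the text.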

1.4.2.4.3 Quantitative Output and Interpretation

http://www.itl.nist.gov/div898/handbook/eda/section4/eda4243.htm (5 of 8) [5/1/2006 9:58:49 AM]


Chi-square and Kolmogorov-Smirnov goodness-of-fit tests are alternative methods for assessing distributional adequacy. The Wilk-Shapiro and Anderson-Darling tests can be used to test for normality. Dataplot generates the following output for the Anderson-Darling normality test.

ANDERSON-DARLING 1-SAMPLE TEST THAT THE DATA CAME FROM A NORMAL DISTRIBUTION

1 STATISTICS:

NUMBER OF OBSERVATIONS = 700
MEAN = 2898.562
STANDARD DEVIATION = 1.304969

ANDERSON-DARLING TEST STATISTIC VALUE = 16.76349
ADJUSTED TEST STATISTIC VALUE = 16.85843

2 CRITICAL VALUES:

90 % POINT = 0.6560000
95 % POINT = 0.7870000
97.5 % POINT = 0.9180000
99 % POINT = 1.092000

3 CONCLUSION (AT THE 5% LEVEL):

THE DATA DO NOT COME FROM A NORMAL DISTRIBUTION

The Anderson-Darling test rejects the normality assumption because the test statistic, 16.76, is greater than the 99% critical value 1.092.

Although the data are not strictly normal, the violation of the normality assumption is not severe enough to conclude that the Yi = C + Ei model is unreasonable. At least part of the non-normality can be explained by the discrete nature of the data.
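For readers who want to reproduce the form of this test, the following is a minimal sketch of the Anderson-Darling statistic for normality with estimated mean and standard deviation. The function names are hypothetical. The adjustment factor (1 + 4/n - 25/n^2) is an inference: it reproduces the printed adjusted value from the printed statistic for n = 700, but the text does not state which factor Dataplot applies.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling_normal(x):
    """A^2 statistic for normality, standardizing with the sample mean and SD."""
    n = len(x)
    m, s = mean(x), stdev(x)
    std = NormalDist()
    z = sorted((v - m) / s for v in x)
    total = 0.0
    for i, zi in enumerate(z, start=1):
        # Pair the i-th smallest value with the i-th largest value.
        total += (2 * i - 1) * (
            math.log(std.cdf(zi)) + math.log(1.0 - std.cdf(z[n - i]))
        )
    return -n - total / n


def adjusted_statistic(a2, n):
    """Small-sample adjustment; the factor is assumed, see the note above."""
    return a2 * (1.0 + 4.0 / n - 25.0 / n ** 2)


def rejects_normality(a2_adjusted, critical_value=0.787):
    """Compare with the 95 % point listed in the output."""
    return a2_adjusted > critical_value
```

With the printed statistic, `adjusted_statistic(16.76349, 700)` gives about 16.858, far above every critical value listed.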

1 STATISTICS:

NUMBER OF OBSERVATIONS = 700
MINIMUM = 2895.000
MEAN = 2898.562
MAXIMUM = 2902.000
STANDARD DEVIATION = 1.304969

GRUBBS TEST STATISTIC = 2.729201


2 PERCENT POINTS OF THE REFERENCE DISTRIBUTION FOR GRUBBS TEST STATISTIC

99 % POINT = 4.311552

100 % POINT = 26.41972

3 CONCLUSION (AT THE 5% LEVEL):

THERE ARE NO OUTLIERS

For this data set, Grubbs' test does not detect any outliers at the 10%, 5%, and 1% significance levels.
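The Grubbs statistic is the largest absolute deviation from the mean, in units of the sample standard deviation. The sketch below (hypothetical function names) computes it both from raw data and from the summary values printed in the output. With the rounded printed summary values the result is about 2.7296, which agrees with the printed 2.729201 to three decimals; the small gap is presumably due to rounding of the printed mean.

```python
from statistics import mean, stdev


def grubbs_statistic(x):
    """G = max |x_i - mean| / sd, computed from raw data."""
    m, s = mean(x), stdev(x)
    return max(abs(v - m) for v in x) / s


def grubbs_from_summary(minimum, mean_value, maximum, sd):
    """Same statistic, using only the summary values shown in the output."""
    return max(mean_value - minimum, maximum - mean_value) / sd


# Summary values copied from the output above.
g = grubbs_from_summary(2895.0, 2898.562, 2902.0, 1.304969)
```

The statistic is then compared with the percent points of the reference distribution listed in the output; no outlier is declared when it falls below the chosen point.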

Model

Although the randomness and normality assumptions were mildly violated, we conclude that a reasonable model for the data is the constant model Yi = C + Ei.

In addition, a 95% confidence interval for the mean value is (2898.465,2898.658).

Univariate Report

It is sometimes useful and convenient to summarize the above results in a report.

Analysis for Josephson Junction Cryothermometry Data

1: Sample Size = 700

2: Location
   Mean = 2898.562
   Standard Deviation of Mean = 0.049323
   95% Confidence Interval for Mean = (2898.465,2898.658)
   Drift with respect to location? = YES
   (Further analysis indicates that the drift, while statistically significant, is not practically significant)

3: Variation
   Standard Deviation = 1.30497
   95% Confidence Interval for SD = (1.240007,1.377169)
   Drift with respect to variation? (based on Levene's test on quarters of the data) = NO

4: Distribution
   Normal PPCC = 0.97484
   Data are Normal? (as measured by Normal PPCC) = NO

5: Randomness
   Autocorrelation = 0.314802
   Data are Random? (as measured by autocorrelation) = NO

6: Statistical Control (i.e., no drift in location or scale, data are random, distribution is fixed, here we are testing only for fixed normal)
   Data Set is in Statistical Control? = NO
   Note: Although we have violations of the assumptions, they are mild enough, and at least partially explained by the discrete nature of the data, so we may model the data as if it were in statistical control

7: Outliers? (as determined by Grubbs test) = NO
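The location entries of the report can be checked from the sample size, mean, and standard deviation alone. This is a minimal sketch with a hypothetical function name; it uses a normal quantile in place of the Student t quantile, which is an assumption that makes little difference at n = 700.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(mean_value, sd, n, level=0.95):
    """Return (standard deviation of the mean, (lower, upper))."""
    se = sd / math.sqrt(n)
    # Normal quantile used instead of Student t (assumption of this sketch).
    q = NormalDist().inv_cdf(0.5 + level / 2.0)
    return se, (mean_value - q * se, mean_value + q * se)


# Values copied from the report above.
se, (lower, upper) = mean_confidence_interval(2898.562, 1.304969, 700)
```

The standard deviation of the mean comes out at about 0.04932 and the interval at roughly (2898.465, 2898.659), matching the report up to rounding.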


4. The discrete nature of the data masks the normality or non-normality of the data somewhat. The plot indicates that a normal distribution provides a rough approximation for the data.

4. Generate summary statistics, quantitative analysis, and print a univariate report.

Steps:

1. Generate a table of summary statistics.
2. Generate the mean, a confidence interval for the mean, and compute a linear fit to detect drift in location.
3. Generate the standard deviation, a confidence interval for the standard deviation, and detect drift in variation by dividing the data into quarters and computing Levene's test for equal standard deviations.
4. Check for randomness by generating an autocorrelation plot and a runs test.
5. Check for normality by computing the normal probability plot correlation coefficient.
6. Check for outliers using Grubbs' test.
7. Print a univariate report (this assumes steps 2 through 6 have already been run).

Results:

1. The summary statistics table displays 25+ statistics.
2. The mean is 2898.56 and a 95% confidence interval is (2898.46,2898.66). The linear fit indicates no meaningful drift in location since the value of the slope parameter is near zero.
3. The standard deviation is 1.30 with a 95% confidence interval of (1.24,1.38). Levene's test indicates no significant drift in variation.
4. The lag 1 autocorrelation is 0.31. This indicates some mild non-randomness.
5. The normal probability plot correlation coefficient is 0.975. At the 5% level, we reject the normality assumption.
6. Grubbs' test detects no outliers at the 5% level.
7. The results are summarized in a convenient report.
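The lag 1 autocorrelation used in the randomness check can be computed directly from its definition. This is a minimal sketch with a hypothetical function name, not the Dataplot code. The rough 95% band of plus or minus 1.96/sqrt(N) is a standard approximation rather than something stated in the text.

```python
import math


def autocorrelation(y, lag=1):
    """Sample autocorrelation at the given lag."""
    n = len(y)
    m = sum(y) / n
    denom = sum((v - m) ** 2 for v in y)
    num = sum((y[i] - m) * (y[i + lag] - m) for i in range(n - lag))
    return num / denom


def approximate_band(n, z=1.96):
    """Rough 95% band for the autocorrelation of random data."""
    return z / math.sqrt(n)
```

For N = 700 the band is about 0.074, so the reported value of 0.31 lies well outside it, in line with the "some mild non-randomness" conclusion.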

1.4.2.4.4 Work This Example Yourself

http://www.itl.nist.gov/div898/handbook/eda/section4/eda4244.htm (2 of 2) [5/1/2006 9:58:50 AM]


1 Exploratory Data Analysis

1.4 EDA Case Studies

1.4.2 Case Studies

1.4.2.5 Beam Deflections

1.4.2.5.1 Background and Data

Generation

This data set was collected by H. S. Lew of NIST in 1969 to measure steel-concrete beam deflections. The response variable is the deflection of a beam from the center point.

The motivation for studying this data set is to show how the underlying assumptions are affected by periodic data.

This file can be read by Dataplot with the following commands:

SKIP 25
READ LEW.DAT Y
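Outside Dataplot, the same read can be sketched in Python. This assumes, as the SKIP 25 command suggests, that the first 25 lines of the file are header text and that each remaining line holds one response value; the function name is hypothetical.

```python
def read_column(path, skip=25):
    """Read one numeric column, ignoring the first `skip` lines and blank lines."""
    values = []
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number < skip:
                continue
            fields = line.split()
            if fields:
                values.append(float(fields[0]))
    return values


# Example call, assuming LEW.DAT is in the working directory:
# y = read_column("LEW.DAT")
```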

Resulting Data

The following are the data used for this case study.

-213 -564 -35 -15 141 115 -420 -360 203 -338
-431 194 -220 -513 154 -125 -559 92 -21 -579
-52 99 -543 -175 162 -457 -346 204 -300 -474
164 -107 -572 -8 83 -541 -224 180 -420 -374
201 -236 -531 83 27 -564 -112 131 -507 -254
199 -311 -495 143 -46 -579 -90 136 -472 -338
202 -287 -477 169 -124 -568 17 48 -568 -135
162 -430 -422 172 -74 -577 -13 92 -534 -243
194 -355 -465 156 -81 -578 -64 139 -449 -384
193 -198 -538 110 -44 -577 -6 66 -552 -164
161 -460 -344 205 -281 -504 134 -28 -576 -118
156 -437 -381 200 -220 -540 83 11 -568 -160
172 -414 -408 188 -125 -572 -32 139 -492 -321
205 -262 -504 142 -83 -574 0 48 -571 -106
137 -501 -266 190 -391 -406 194 -186 -553 83
-13 -577 -49 103 -515 -280 201 300 -506 131
-45 -578 -80 138 -462 -361 201 -211 -554 32
74 -533 -235 187 -372 -442 182 -147 -566 25
68 -535 -244 194 -351 -463 174 -125 -570 15
72 -550 -190 172 -424 -385 198 -218 -536 96

http://www.itl.nist.gov/div898/handbook/eda/section4/eda4251.htm (1 of 6) [5/1/2006 9:58:50 AM]

Interpretation

The assumptions are addressed by the graphics shown above:

The run sequence plot (upper left) indicates that the data do not have any significant shifts in location or scale over time.

From the above plots we conclude that the underlying randomness assumption is not valid. Therefore, the model Yi = C + Ei is not appropriate.

We need to develop a better model. Non-random data can frequently be modeled using time series methodology. Specifically, the circular pattern in the lag plot indicates that a sinusoidal model might be appropriate. The sinusoidal model will be developed in the next section.


Lag Plot

We have drawn some lines and boxes on the plot to better isolate the outliers. The following output helps identify the points that are generating the outliers on the lag plot.

****************************************************
** print y index xplot yplot subset yplot > 250 **
****************************************************

VARIABLES   Y        INDEX    XPLOT     YPLOT
            300.00   158.00   -506.00   300.00

****************************************************
** print y index xplot yplot subset xplot > 250 **
****************************************************

VARIABLES   Y        INDEX    XPLOT     YPLOT
            201.00   157.00   300.00    201.00

1.4.2.5.2 Test Underlying Assumptions

http://www.itl.nist.gov/div898/handbook/eda/section4/eda4252.htm (3 of 9) [5/1/2006 9:58:51 AM]

*********************************************************
** print y index xplot yplot subset yplot 100 to 200 subset xplot 100 to 200 **
*********************************************************

VARIABLES   Y        INDEX    XPLOT     YPLOT
            141.00   5.00     115.00    141.00

That is, the third, fifth, and 158th points appear to be outliers.
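The subsetting in the output above can be mimicked with a short sketch. The pairing convention used here, YPLOT equal to the observation at a given index and XPLOT equal to the next observation, is inferred from the printed rows and the data listing rather than stated in the text, and the function names are hypothetical.

```python
def lag_rows(y):
    """Build lag plot rows with a 1-based index (pairing convention assumed)."""
    rows = []
    for i in range(len(y) - 1):
        rows.append({"index": i + 1, "y": y[i], "xplot": y[i + 1], "yplot": y[i]})
    return rows


def in_box(rows, x_lo, x_hi, y_lo, y_hi):
    """Keep rows whose (xplot, yplot) point falls inside the given box."""
    return [
        r for r in rows
        if x_lo <= r["xplot"] <= x_hi and y_lo <= r["yplot"] <= y_hi
    ]


# First 20 observations from the data listing.
first_values = [-213, -564, -35, -15, 141, 115, -420, -360, 203, -338,
                -431, 194, -220, -513, 154, -125, -559, 92, -21, -579]
boxed = in_box(lag_rows(first_values), 100, 200, 100, 200)
```

On these 20 values, `boxed` holds a single row with index 5, XPLOT 115, and YPLOT 141, the same row as the last block of output.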

Spectral Plot

Another useful plot for non-random data is the spectral plot.


This spectral plot shows a single dominant peak at a frequency of 0.3. This frequency of 0.3 will be used in fitting the sinusoidal model in the next section.
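One way to locate such a peak numerically is a raw periodogram. The sketch below is an illustration under stated assumptions: frequency is measured in cycles per observation, the function names are hypothetical, and Dataplot's spectral plot may use a smoothed estimator rather than this raw form.

```python
import math


def periodogram(y):
    """Raw periodogram at the Fourier frequencies k/n, k = 1 .. n//2."""
    n = len(y)
    m = sum(y) / n
    result = []
    for k in range(1, n // 2 + 1):
        f = k / n
        c = sum((v - m) * math.cos(2 * math.pi * f * t) for t, v in enumerate(y))
        s = sum((v - m) * math.sin(2 * math.pi * f * t) for t, v in enumerate(y))
        result.append((f, (c * c + s * s) / n))
    return result


def dominant_frequency(y):
    """Frequency (cycles per observation) with the largest periodogram value."""
    return max(periodogram(y), key=lambda pair: pair[1])[0]
```

Applied to a pure sinusoid of frequency 0.3 sampled at 200 points, `dominant_frequency` returns 0.3, since 0.3 is exactly the Fourier frequency 60/200.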

Quantitative Output

Although the lag plot, autocorrelation plot, and spectral plot clearly show the violation of the randomness assumption, we supplement the graphical output with some quantitative measures.


Location

One way to quantify a change in location over time is to fit a straight line to the data set using the index variable X = 1, 2, ..., N, with N denoting the number of observations. If there is no significant drift in the location, the slope parameter should be zero. For this data set, Dataplot generates the following output:

LEAST SQUARES MULTILINEAR FIT
SAMPLE SIZE N = 200
NUMBER OF VARIABLES = 1
NO REPLICATION CASE

     PARAMETER ESTIMATES     (APPROX ST DEV.)   TVALUE
1  A0       -178.175         (39.47)            -4.514
2  A1  X    0.736593E-02     (0.3405)           0.2163E-01

RESIDUAL STANDARD DEVIATION = 278.0313
RESIDUAL DEGREES OF FREEDOM = 198

The slope parameter, A1, has a t value of 0.022, which is not statistically significant. This indicates that the slope can in fact be considered zero.
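The drift check amounts to an ordinary least squares fit of Y on the index 1, 2, ..., N, followed by a look at the t value of the slope. A minimal sketch, with a hypothetical function name and no claim to match Dataplot's numerics:

```python
import math


def drift_fit(y):
    """Fit y = A0 + A1 * x with x = 1..N; return (A0, A1, SE(A1), t value of A1)."""
    n = len(y)
    x = range(1, n + 1)
    x_bar = (n + 1) / 2
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    residual_sd = math.sqrt(rss / (n - 2))
    se_slope = residual_sd / math.sqrt(sxx)
    return intercept, slope, se_slope, slope / se_slope


# The t value in the output is the estimate divided by its standard deviation.
t_from_output = 0.736593e-02 / 0.3405
```

`t_from_output` evaluates to about 0.0216, the TVALUE printed for A1.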

Variation

One simple way to detect a change in variation is with a Bartlett test after dividing the data set into several equal-sized intervals. However, the Bartlett test is not robust to non-normality. Since the non-randomness of these data does not allow us to assume normality, we use the alternative Levene test. In particular, we use the Levene test based on the median rather than the mean. The choice of the number of intervals is somewhat arbitrary, although values of 4 or 8 are reasonable. Dataplot generated the following output for the Levene test.

LEVENE F-TEST FOR SHIFT IN VARIATION (ASSUMPTION: NORMALITY)

1 STATISTICS

NUMBER OF OBSERVATIONS = 200
NUMBER OF GROUPS = 4
LEVENE F TEST STATISTIC = 0.9378599E-01

FOR LEVENE TEST STATISTIC

3.659895 % Point: 0.9378599E-01

3 CONCLUSION (AT THE 5% LEVEL):

THERE IS NO SHIFT IN VARIATION

THUS: HOMOGENEOUS WITH RESPECT TO VARIATION

In this case, the Levene test indicates that the standard deviations are not significantly different in the 4 intervals, since the test statistic of 0.094 is less than the 95% critical value of 2.6. Therefore we conclude that the scale is constant, in agreement with the conclusion printed in the output.
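The median-based Levene statistic on equal-sized intervals can be sketched as follows. The function names are hypothetical and the code is an illustration of the standard formula, not Dataplot's routine. The resulting value is compared with an F critical value on (k - 1, N - k) degrees of freedom, which is where the 95% critical value of about 2.6 for 4 groups and 200 observations comes from.

```python
from statistics import median


def quarters(y):
    """Split the series into 4 consecutive intervals of equal size."""
    size = len(y) // 4
    return [y[i * size:(i + 1) * size] for i in range(4)]


def levene_median(groups):
    """Levene statistic using absolute deviations from each group median."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z = []
    for g in groups:
        center = median(g)
        z.append([abs(v - center) for v in g])
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within
```

A call such as `levene_median(quarters(y))` on the 200 observations would be the analogue of the output above, which prints a statistic of 0.0938.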

Randomness

A runs test is used to check for randomness.

RUNS UP

STATISTIC = NUMBER OF RUNS UP
