
Engineering Statistics Handbook Episode 1 Part 13 doc



correlation coefficient plot and the probability plot are useful tools for determining a good distributional model for the data.

Software

The skewness and kurtosis coefficients are available in most general purpose statistical software programs, including Dataplot.

1.3.5.11 Measures of Skewness and Kurtosis

http://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm (4 of 4) [5/1/2006 9:57:21 AM]


Sample Output

Dataplot generated the following autocorrelation output using the LEW.DAT data set:

THE LAG-ONE AUTOCORRELATION COEFFICIENT OF THE 200 OBSERVATIONS = -0.3073048E+00

THE COMPUTED VALUE OF THE CONSTANT A = -0.30730480E+00

lag   autocorrelation
  0        1.00
  1       -0.31
  2       -0.74
  3        0.77
  4        0.21
  5       -0.90
  6        0.38
  7        0.63
  8       -0.77
  9       -0.12
 10        0.82
 11       -0.40
 12       -0.55
 13        0.73
 14        0.07
 15       -0.76
 16        0.40
 17        0.48
 18       -0.70
 19       -0.03
 20        0.70
 21       -0.41
 22       -0.43
 23        0.67
 24        0.00
 25       -0.66
 26        0.42
 27        0.39
 28       -0.65
 29        0.03
 30        0.63
 31       -0.42
 32       -0.36
 33        0.64
 34       -0.05
 35       -0.60
 36        0.43
 37        0.32
 38       -0.64
 39        0.08
 40        0.58
 41       -0.45
 42       -0.28
 43        0.62
 44       -0.10
 45       -0.55
 46        0.45
 47        0.25
 48       -0.61
 49        0.14

1.3.5.12 Autocorrelation

http://www.itl.nist.gov/div898/handbook/eda/section3/eda35c.htm (2 of 4) [5/1/2006 9:57:45 AM]
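A table of this kind can be produced with the usual sample autocorrelation estimate, $r_k = \sum_{i=1}^{N-k}(Y_i-\bar{Y})(Y_{i+k}-\bar{Y}) \,/\, \sum_{i=1}^{N}(Y_i-\bar{Y})^2$. The Python sketch below assumes that definition, since this excerpt does not print the formula. LEW.DAT is not reproduced here, so the sketch uses a hypothetical sinusoidal series instead; the function name `autocorrelation` is illustrative only and is not a Dataplot command.

```python
import math

def autocorrelation(y, max_lag):
    """Sample autocorrelation coefficients r_0 .. r_max_lag (assumed standard estimate)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[i] * dev[i + k] for i in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]

# Hypothetical stand-in for a strongly periodic data set (period of 8 samples).
series = [math.sin(2 * math.pi * i / 8) for i in range(200)]
acf = autocorrelation(series, 8)
for lag, r in enumerate(acf):
    print(f"{lag:3d} {r:6.2f}")
```

For a random series the coefficients at non-zero lags should all be close to zero; a slowly decaying, alternating pattern such as the one in the LEW.DAT output points to a non-random, roughly sinusoidal process.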

Questions

The autocorrelation function can be used to answer the following questions:

1. Was this sample data set generated from a random process?

2. Would a non-linear or time series model be a more appropriate model for these data than a simple constant plus error model?

Importance

Randomness is one of the key assumptions in determining if a univariate statistical process is in control. If the assumptions of constant location and scale, randomness, and fixed distribution are reasonable, then the univariate process can be modeled as:

$$Y_i = A_0 + E_i$$

where $E_i$ is an error term.

If the randomness assumption is not valid, then a different model needs to be used. This will typically be either a time series model or a non-linear model (with time as the independent variable).

Related Techniques

Autocorrelation Plot
Run Sequence Plot
Lag Plot
Runs Test

Case Study

The heat flow meter data demonstrate the use of autocorrelation in determining if the data are from a random process.

The beam deflection data demonstrate the use of autocorrelation in developing a non-linear sinusoidal model.

Software

The autocorrelation capability is available in most general purpose statistical software programs, including Dataplot.


run of length r is r consecutive heads or r consecutive tails. To use the Dataplot RUNS command, you could code a sequence of the N = 10 coin tosses HHHHTTHTHH as

1 2 3 4 3 2 3 2 3 4

that is, a heads is coded as an increasing value and a tails is coded as a decreasing value.

Another alternative is to code values above the median as positive and values below the median as negative. There are other formulations as well. All of them can be converted to the Dataplot formulation. Just remember that it ultimately reduces to 2 choices. To use the Dataplot runs test, simply code one choice as an increasing value and the other as a decreasing value, as in the heads/tails example above. If you are using other statistical software, you need to check the conventions used by that program.
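The coding of a two-choice sequence as increasing and decreasing values can be sketched in Python as follows. The helper name `code_two_choices` and the starting level of 0 are assumptions made for illustration; they are not part of Dataplot.

```python
def code_two_choices(outcomes, up_symbol):
    """Code a two-valued sequence as steps up (up_symbol) or steps down (anything else)."""
    value = 0  # assumed starting level, so that the first head is coded as 1
    coded = []
    for outcome in outcomes:
        value += 1 if outcome == up_symbol else -1
        coded.append(value)
    return coded

coded_tosses = code_two_choices("HHHHTTHTHH", up_symbol="H")
print(coded_tosses)  # [1, 2, 3, 4, 3, 2, 3, 2, 3, 4]
```

The same helper would apply to the median formulation by first labelling each value as above or below the median and then passing those labels in place of the heads/tails string.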

Sample Output

Dataplot generated the following runs test output using the LEW.DAT data set:

RUNS UP

STATISTIC = NUMBER OF RUNS UP OF LENGTH EXACTLY I

 I     STAT   EXP(STAT)   SD(STAT)        Z
 1     18.0     41.7083     6.4900    -3.65
 2     40.0     18.2167     3.3444     6.51
 3      2.0      5.2125     2.0355    -1.58
 4      0.0      1.1302     1.0286    -1.10
 5      0.0      0.1986     0.4424    -0.45
 6      0.0      0.0294     0.1714    -0.17
 7      0.0      0.0038     0.0615    -0.06
 8      0.0      0.0004     0.0207    -0.02
 9      0.0      0.0000     0.0066    -0.01
10      0.0      0.0000     0.0020     0.00

STATISTIC = NUMBER OF RUNS UP OF LENGTH I OR MORE

 I     STAT   EXP(STAT)   SD(STAT)        Z
 1     60.0     66.5000     4.1972    -1.55
 2     42.0     24.7917     2.8083     6.13
 3      2.0      6.5750     2.1639    -2.11
 4      0.0      1.3625     1.1186    -1.22
 5      0.0      0.2323     0.4777    -0.49
 6      0.0      0.0337     0.1833    -0.18
 7      0.0      0.0043     0.0652    -0.07
 8      0.0      0.0005     0.0218    -0.02
 9      0.0      0.0000     0.0069    -0.01
10      0.0      0.0000     0.0021     0.00

1.3.5.13 Runs Test for Detecting Non-randomness

http://www.itl.nist.gov/div898/handbook/eda/section3/eda35d.htm (2 of 5) [5/1/2006 9:57:45 AM]

RUNS DOWN

STATISTIC = NUMBER OF RUNS DOWN OF LENGTH EXACTLY I

 I     STAT   EXP(STAT)   SD(STAT)        Z
 1     25.0     41.7083     6.4900    -2.57
 2     35.0     18.2167     3.3444     5.02
 3      0.0      5.2125     2.0355    -2.56
 4      0.0      1.1302     1.0286    -1.10
 5      0.0      0.1986     0.4424    -0.45
 6      0.0      0.0294     0.1714    -0.17
 7      0.0      0.0038     0.0615    -0.06
 8      0.0      0.0004     0.0207    -0.02
 9      0.0      0.0000     0.0066    -0.01
10      0.0      0.0000     0.0020     0.00

STATISTIC = NUMBER OF RUNS DOWN OF LENGTH I OR MORE

 I     STAT   EXP(STAT)   SD(STAT)        Z
 1     60.0     66.5000     4.1972    -1.55
 2     35.0     24.7917     2.8083     3.63
 3      0.0      6.5750     2.1639    -3.04
 4      0.0      1.3625     1.1186    -1.22
 5      0.0      0.2323     0.4777    -0.49
 6      0.0      0.0337     0.1833    -0.18
 7      0.0      0.0043     0.0652    -0.07
 8      0.0      0.0005     0.0218    -0.02
 9      0.0      0.0000     0.0069    -0.01
10      0.0      0.0000     0.0021     0.00

RUNS TOTAL = RUNS UP + RUNS DOWN

STATISTIC = NUMBER OF RUNS TOTAL OF LENGTH EXACTLY I

 I     STAT   EXP(STAT)   SD(STAT)        Z
 1     43.0     83.4167     9.1783    -4.40
 2     75.0     36.4333     4.7298     8.15
 3      2.0     10.4250     2.8786    -2.93
 4      0.0      2.2603     1.4547    -1.55
 5      0.0      0.3973     0.6257    -0.63
 6      0.0      0.0589     0.2424    -0.24
 7      0.0      0.0076     0.0869    -0.09
 8      0.0      0.0009     0.0293    -0.03
 9      0.0      0.0001     0.0093    -0.01
10      0.0      0.0000     0.0028     0.00

STATISTIC = NUMBER OF RUNS TOTAL OF LENGTH I OR MORE

 I     STAT   EXP(STAT)   SD(STAT)        Z
 1    120.0    133.0000     5.9358    -2.19
 2     77.0     49.5833     3.9716     6.90
 3      2.0     13.1500     3.0602    -3.64
 4      0.0      2.7250     1.5820    -1.72
 5      0.0      0.4647     0.6756    -0.69
 6      0.0      0.0674     0.2592    -0.26
 7      0.0      0.0085     0.0923    -0.09
 8      0.0      0.0010     0.0309    -0.03
 9      0.0      0.0001     0.0098    -0.01
10      0.0      0.0000     0.0030     0.00

LENGTH OF THE LONGEST RUN UP         = 3
LENGTH OF THE LONGEST RUN DOWN       = 2
LENGTH OF THE LONGEST RUN UP OR DOWN = 3

NUMBER OF POSITIVE DIFFERENCES = 104
NUMBER OF NEGATIVE DIFFERENCES = 95
NUMBER OF ZERO DIFFERENCES     = 0
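The STAT columns above are counts of runs in the signs of successive differences. A minimal Python sketch of that counting follows, applied to the coded coin-toss sequence because LEW.DAT is not reproduced here. The function names are hypothetical, and skipping zero differences is an assumption of this sketch, since the excerpt does not say how Dataplot treats ties.

```python
from itertools import groupby

def runs_of_differences(y):
    """Return (direction, length) for each run of successive differences in y."""
    signs = []
    for previous, current in zip(y, y[1:]):
        if current > previous:
            signs.append("up")
        elif current < previous:
            signs.append("down")
        # Assumption of this sketch: zero differences are simply skipped.
    return [(direction, len(list(group))) for direction, group in groupby(signs)]

def count_runs(runs, direction, length, or_more=False):
    """Count runs in one direction of exactly `length`, or of `length` or more."""
    return sum(
        1
        for d, n in runs
        if d == direction and (n >= length if or_more else n == length)
    )

coded = [1, 2, 3, 4, 3, 2, 3, 2, 3, 4]  # HHHHTTHTHH from the coin-toss example
runs = runs_of_differences(coded)
print(runs)  # [('up', 3), ('down', 2), ('up', 1), ('down', 1), ('up', 2)]
print(count_runs(runs, "up", 1))                # 1 run up of length exactly 1
print(count_runs(runs, "up", 2, or_more=True))  # 2 runs up of length 2 or more
print(max(n for _, n in runs))                  # longest run up or down: 3
```

As a consistency check on the printed output, the runs up of exact length 1, 2 and 3 account for 18 + 2(40) + 3(2) = 104 positive differences, and the runs down for 25 + 2(35) = 95 negative differences.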


Interpretation of Sample Output

Scanning the last column labeled "Z", we note that most of the z-scores for run lengths 1, 2, and 3 have an absolute value greater than 1.96, the two-sided 5% critical value of the standard normal distribution. This is strong evidence that these data are in fact not random.

Output from other statistical software may look somewhat different from the above output.
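The Z column is consistent with Z = (STAT - EXP(STAT)) / SD(STAT), which can be checked against the printed rows. The Python sketch below does this for the first three rows of the "RUNS TOTAL, LENGTH EXACTLY I" table; the relation is inferred from the printed numbers rather than stated in this excerpt, and the names used are illustrative.

```python
def z_score(stat, expected, sd):
    """Standardize an observed run count against its expected value and standard deviation."""
    return (stat - expected) / sd

# (I, STAT, EXP(STAT), SD(STAT)) copied from the RUNS TOTAL, LENGTH EXACTLY I table.
rows = [
    (1, 43.0, 83.4167, 9.1783),
    (2, 75.0, 36.4333, 4.7298),
    (3, 2.0, 10.4250, 2.8786),
]
CRITICAL = 1.96  # two-sided 5% point of the standard normal distribution

results = []
for length, stat, expected, sd in rows:
    z = z_score(stat, expected, sd)
    results.append((length, round(z, 2), abs(z) > CRITICAL))
    print(f"I={length}  Z={z:6.2f}  significant at 5%: {abs(z) > CRITICAL}")
```

All three rows exceed 1.96 in absolute value, which matches the conclusion that the LEW.DAT data are not random.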

Question

The runs test can be used to answer the following question:

Were these sample data generated from a random process?

Importance

Randomness is one of the key assumptions in determining if a univariate statistical process is in control. If the assumptions of constant location and scale, randomness, and fixed distribution are reasonable, then the univariate process can be modeled as:

$$Y_i = A_0 + E_i$$

where $E_i$ is an error term.

If the randomness assumption is not valid, then a different model needs to be used. This will typically be either a time series model or a non-linear model (with time as the independent variable).

Related Techniques

Autocorrelation
Run Sequence Plot
Lag Plot

Case Study

Heat flow meter data

Software

Most general purpose statistical software programs, including Dataplot, support a runs test.


Significance Level: $\alpha$

Critical Region: The critical values for the Anderson-Darling test are dependent on the specific distribution that is being tested. Tabulated values and formulas have been published (Stephens, 1974, 1976, 1977, 1979) for a few specific distributions (normal, lognormal, exponential, Weibull, logistic, extreme value type 1). The test is a one-sided test and the hypothesis that the distribution is of a specific form is rejected if the test statistic, A, is greater than the critical value.

Note that for a given distribution, the Anderson-Darling statistic may be multiplied by a constant (which usually depends on the sample size, n). These constants are given in the various papers by Stephens. In the sample output below, this is the "adjusted Anderson-Darling" statistic. This is what should be compared against the critical values. Also, be aware that different constants (and therefore critical values) have been published. You just need to be aware of what constant was used for a given set of critical values (the needed constant is typically given with the critical values).
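The definition of the test statistic falls on a page that is not part of this excerpt. As a rough illustration only, the Python sketch below assumes the standard form $A^2 = -N - \frac{1}{N}\sum_{i=1}^{N}(2i-1)\left[\ln F(Y_{(i)}) + \ln\left(1 - F(Y_{(N+1-i)})\right)\right]$ with $F$ a normal CDF whose mean and standard deviation are estimated from the data. The function name, the clamping constant `eps`, and the two deterministic test samples are all assumptions of the sketch, not Dataplot behaviour.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling_normal(y):
    """Unadjusted Anderson-Darling statistic for normality (assumed standard formula)."""
    n = len(y)
    m, s = mean(y), stdev(y)
    std_normal = NormalDist()
    eps = 1e-12  # assumption: clamp the CDF away from 0 and 1 to avoid log(0)
    cdf = [
        min(max(std_normal.cdf((v - m) / s), eps), 1.0 - eps)
        for v in sorted(y)
    ]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    return -n - total / n

n = 1000
# Hypothetical deterministic samples: normal quantiles and an evenly spaced (uniform) grid.
normal_like = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
uniform_like = [(i + 0.5) / n for i in range(n)]

a2_normal = anderson_darling_normal(normal_like)
a2_uniform = anderson_darling_normal(uniform_like)
print(f"normal-like sample:  {a2_normal:.4f}")
print(f"uniform-like sample: {a2_uniform:.4f}")
```

The normal-like sample should give a value well below the tabulated critical values, while the uniform-like sample should give a value far above them. Any sample-size adjustment would still need to be applied before comparing with a published table.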

Sample Output

Dataplot generated the following output for the Anderson-Darling test. 1,000 random numbers were generated for a normal, double exponential, Cauchy, and lognormal distribution. In all four cases, the Anderson-Darling test was applied to test for a normal distribution. When the data were generated using a normal distribution, the test statistic was small and the hypothesis was not rejected. When the data were generated using the double exponential, Cauchy, and lognormal distributions, the statistics were significant, and the hypothesis of an underlying normal distribution was rejected at significance levels of 0.10, 0.05, and 0.01.

The normal random numbers were stored in the variable Y1, the double exponential random numbers were stored in the variable Y2, the Cauchy random numbers were stored in the variable Y3, and the lognormal random numbers were stored in the variable Y4.

***************************************
** anderson darling normal test y1 **
***************************************

ANDERSON-DARLING 1-SAMPLE TEST
THAT THE DATA CAME FROM A NORMAL DISTRIBUTION

1. STATISTICS:
      NUMBER OF OBSERVATIONS                = 1000
      MEAN                                  = 0.4359940E-02
      STANDARD DEVIATION                    = 1.001816

      ANDERSON-DARLING TEST STATISTIC VALUE = 0.2565918
      ADJUSTED TEST STATISTIC VALUE         = 0.2576117

2. CRITICAL VALUES:
      90   % POINT = 0.6560000
      95   % POINT = 0.7870000
      97.5 % POINT = 0.9180000
      99   % POINT = 1.092000

3. CONCLUSION (AT THE 5% LEVEL):
      THE DATA DO COME FROM A NORMAL DISTRIBUTION.

1.3.5.14 Anderson-Darling Test

http://www.itl.nist.gov/div898/handbook/eda/section3/eda35e.htm (2 of 5) [5/1/2006 9:57:46 AM]

***************************************
** anderson darling normal test y2 **
***************************************

ANDERSON-DARLING 1-SAMPLE TEST
THAT THE DATA CAME FROM A NORMAL DISTRIBUTION

1. STATISTICS:
      NUMBER OF OBSERVATIONS                = 1000
      MEAN                                  = 0.2034888E-01
      STANDARD DEVIATION                    = 1.321627

      ANDERSON-DARLING TEST STATISTIC VALUE = 5.826050
      ADJUSTED TEST STATISTIC VALUE         = 5.849208

2. CRITICAL VALUES:
      90   % POINT = 0.6560000
      95   % POINT = 0.7870000
      97.5 % POINT = 0.9180000
      99   % POINT = 1.092000

3. CONCLUSION (AT THE 5% LEVEL):
      THE DATA DO NOT COME FROM A NORMAL DISTRIBUTION.

***************************************
** anderson darling normal test y3 **
***************************************

ANDERSON-DARLING 1-SAMPLE TEST
THAT THE DATA CAME FROM A NORMAL DISTRIBUTION

1. STATISTICS:
      NUMBER OF OBSERVATIONS                = 1000
      MEAN                                  = 1.503854
      STANDARD DEVIATION                    = 35.13059

      ANDERSON-DARLING TEST STATISTIC VALUE = 287.6429
      ADJUSTED TEST STATISTIC VALUE         = 288.7863

2. CRITICAL VALUES:
      90   % POINT = 0.6560000
      95   % POINT = 0.7870000
      97.5 % POINT = 0.9180000
      99   % POINT = 1.092000

3. CONCLUSION (AT THE 5% LEVEL):
      THE DATA DO NOT COME FROM A NORMAL DISTRIBUTION.

***************************************
** anderson darling normal test y4 **
***************************************

ANDERSON-DARLING 1-SAMPLE TEST
THAT THE DATA CAME FROM A NORMAL DISTRIBUTION

1. STATISTICS:
      NUMBER OF OBSERVATIONS                = 1000
      MEAN                                  = 1.518372
      STANDARD DEVIATION                    = 1.719969

      ANDERSON-DARLING TEST STATISTIC VALUE = 83.06335
      ADJUSTED TEST STATISTIC VALUE         = 83.39352

2. CRITICAL VALUES:
      90   % POINT = 0.6560000
      95   % POINT = 0.7870000
      97.5 % POINT = 0.9180000
      99   % POINT = 1.092000

3. CONCLUSION (AT THE 5% LEVEL):
      THE DATA DO NOT COME FROM A NORMAL DISTRIBUTION.
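In all four runs the printed adjusted value equals the raw statistic multiplied by 1 + 4/n - 25/n² (to the printed precision). The Python sketch below assumes that factor, which is inferred from the numbers above rather than quoted from the text, and applies the one-sided decision rule with the printed critical values. The function and variable names are hypothetical.

```python
def adjusted_statistic(a, n):
    """Sample-size adjustment inferred from the printed output: a * (1 + 4/n - 25/n^2)."""
    return a * (1.0 + 4.0 / n - 25.0 / n ** 2)

# Critical values from section 2 of the output, keyed by significance level alpha.
CRITICAL_VALUES = {0.10: 0.656, 0.05: 0.787, 0.025: 0.918, 0.01: 1.092}

def reject_normality(a, n, alpha=0.05):
    """One-sided test: reject when the adjusted statistic exceeds the critical value."""
    return adjusted_statistic(a, n) > CRITICAL_VALUES[alpha]

# (raw statistic, printed adjusted statistic) for Y1..Y4, each with n = 1000.
printed = {
    "y1": (0.2565918, 0.2576117),
    "y2": (5.826050, 5.849208),
    "y3": (287.6429, 288.7863),
    "y4": (83.06335, 83.39352),
}
for name, (raw, adjusted) in printed.items():
    computed = adjusted_statistic(raw, 1000)
    print(name, f"{computed:.7g}", adjusted, "reject" if reject_normality(raw, 1000) else "do not reject")
```

Only Y1, the normally distributed sample, stays below the 5% critical value of 0.787, in agreement with the four printed conclusions.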

Interpretation of the Sample Output

The output is divided into three sections.

1. The first section prints the number of observations and estimates for the location and scale parameters.

2. The second section prints the upper critical value for the Anderson-Darling test statistic distribution corresponding to various significance levels. The value in the first column, the confidence level of the test, is equivalent to 100(1-$\alpha$). We reject the null hypothesis at that significance level if the value of the Anderson-Darling test statistic printed in section one is greater than the critical value printed in the last column.

3. The third section prints the conclusion for a 95% test. For a different significance level, the appropriate conclusion can be drawn from the table printed in section two. For example, for $\alpha$ = 0.10, we look at the row for 90% confidence and compare the critical value 0.656 to the


Posted: 06/08/2014, 11:20
