correlation coefficient plot and the probability plot are useful tools for determining a good distributional model for the data.
Software
The skewness and kurtosis coefficients are available in most general purpose statistical software programs, including Dataplot.
1.3.5.11 Measures of Skewness and Kurtosis
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm (4 of 4) [5/1/2006 9:57:21 AM]
Sample Output
Dataplot generated the following autocorrelation output using the LEW.DAT data set:
THE LAG-ONE AUTOCORRELATION COEFFICIENT OF THE
200 OBSERVATIONS = -0.3073048E+00
THE COMPUTED VALUE OF THE CONSTANT A = -0.30730480E+00
lag autocorrelation
0 1.00
1 -0.31
2 -0.74
3 0.77
4 0.21
5 -0.90
6 0.38
7 0.63
8 -0.77
9 -0.12
10 0.82
11 -0.40
12 -0.55
13 0.73
14 0.07
15 -0.76
16 0.40
17 0.48
18 -0.70
19 -0.03
20 0.70
21 -0.41
22 -0.43
23 0.67
24 0.00
25 -0.66
26 0.42
27 0.39
28 -0.65
29 0.03
30 0.63
31 -0.42
32 -0.36
33 0.64
34 -0.05
35 -0.60
36 0.43
37 0.32
38 -0.64
39 0.08
40 0.58
41 -0.45
42 -0.28
43 0.62
44 -0.10
45 -0.55
46 0.45
47 0.25
48 -0.61
49 0.14
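A table like the one above can be reproduced with a few lines of code. The following is a minimal sketch, assuming the usual sample estimate of the lag-k autocorrelation (lagged products of deviations from the mean, divided by the sum of squared deviations). The function name `autocorrelation` and the alternating test series are illustrative only; they are not Dataplot code and not the LEW.DAT data.

```python
def autocorrelation(y, max_lag):
    """Return [r_0, r_1, ..., r_max_lag] for the series y.

    Assumed estimator: sum of lagged products of deviations from the
    mean, divided by the sum of squared deviations (so r_0 is 1).
    """
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[i] * dev[i + k] for i in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


# Hypothetical stand-in data: a strictly alternating series of 20 points.
series = [1.0, -1.0] * 10
r = autocorrelation(series, 2)
for lag, value in enumerate(r):
    print(f"{lag:3d} {value:6.2f}")
```

A series that alternates in sign gives a strongly negative lag-1 value and a strongly positive lag-2 value, the same kind of oscillating pattern that appears in the LEW.DAT output.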
Questions
The autocorrelation function can be used to answer the following questions:
1. Was this sample data set generated from a random process?
2. Would a non-linear or time series model be a more appropriate model for these data than a simple constant plus error model?
Importance
Randomness is one of the key assumptions in determining if a univariate statistical process is in control. If the assumptions of constant location and scale, randomness, and fixed distribution are reasonable, then the univariate process can be modeled as:
$$Y_i = A_0 + E_i$$
where $E_i$ is an error term.
If the randomness assumption is not valid, then a different model needs to be used. This will typically be either a time series model or a non-linear model (with time as the independent variable).
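One rough way to judge whether the randomness assumption behind a constant plus error model is tenable is to compare the lag-one autocorrelation with an approximate 95% band for an uncorrelated series. The sketch below assumes the common large-sample band of plus or minus 1.96 divided by the square root of N. That band is a general approximation and is not stated in this section, and the function name `outside_white_noise_band` is hypothetical.

```python
import math


def outside_white_noise_band(r, n, z=1.96):
    """True if |r| exceeds the approximate band z / sqrt(n).

    Assumption: for an uncorrelated series of length n, sample
    autocorrelations are roughly normal with standard error 1/sqrt(n).
    """
    return abs(r) > z / math.sqrt(n)


# Values taken from the LEW.DAT sample output: r_1 = -0.3073048, N = 200.
band = 1.96 / math.sqrt(200)
flag = outside_white_noise_band(-0.3073048, 200)
print(f"band = +/-{band:.4f}, outside band: {flag}")
```

For the LEW.DAT values the lag-one coefficient lies well outside the band, which suggests that a time series or non-linear model is worth considering.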
Related Techniques
Autocorrelation Plot
Run Sequence Plot
Lag Plot
Runs Test
Case Study
The heat flow meter data demonstrate the use of autocorrelation in determining if the data are from a random process.
The beam deflection data demonstrate the use of autocorrelation in developing a non-linear sinusoidal model.
Software
The autocorrelation capability is available in most general purpose statistical software programs, including Dataplot.
1.3.5.12 Autocorrelation
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35c.htm (3 of 4) [5/1/2006 9:57:45 AM]
run of length r is r consecutive heads or r consecutive tails. To use the Dataplot RUNS command, you could code a sequence of the N = 10 coin tosses HHHHTTHTHH as
1 2 3 4 3 2 3 2 3 4
that is, a heads is coded as an increasing value and a tails is coded as a decreasing value.
Another alternative is to code values above the median as positive and values below the median as negative. There are other formulations as well. All of them can be converted to the Dataplot formulation. Just remember that it ultimately reduces to two choices. To use the Dataplot runs test, simply code one choice as an increasing value and the other as a decreasing value, as in the heads/tails example above. If you are using other statistical software, you need to check the conventions used by that program.
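The coding described above amounts to a running sum of +1 and -1 steps. The following sketch shows one way to produce it. The helper names `code_binary` and `code_about_median` are hypothetical, and dropping values exactly equal to the median is an assumption made here for illustration, not a Dataplot convention.

```python
from itertools import accumulate
from statistics import median


def code_binary(outcomes, up):
    """Code a two-choice sequence as rising/falling values.

    Each outcome equal to `up` raises the running value by 1;
    any other outcome lowers it by 1.
    """
    steps = [1 if o == up else -1 for o in outcomes]
    return list(accumulate(steps))


def code_about_median(values):
    """Code values above the median as increases, below as decreases.

    Assumption: values exactly equal to the median are dropped.
    """
    m = median(values)
    signs = ["+" if v > m else "-" for v in values if v != m]
    return code_binary(signs, up="+")


coded = code_binary("HHHHTTHTHH", up="H")
print(coded)
```

The coin-toss sequence HHHHTTHTHH yields the values 1 2 3 4 3 2 3 2 3 4 given in the text.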
Sample Output
Dataplot generated the following runs test output using the LEW.DAT data set:
RUNS UP
STATISTIC = NUMBER OF RUNS UP
OF LENGTH EXACTLY I
I STAT EXP(STAT) SD(STAT) Z
1 18.0 41.7083 6.4900 -3.65
2 40.0 18.2167 3.3444 6.51
3 2.0 5.2125 2.0355 -1.58
4 0.0 1.1302 1.0286 -1.10
5 0.0 0.1986 0.4424 -0.45
6 0.0 0.0294 0.1714 -0.17
7 0.0 0.0038 0.0615 -0.06
8 0.0 0.0004 0.0207 -0.02
9 0.0 0.0000 0.0066 -0.01
10 0.0 0.0000 0.0020 0.00
STATISTIC = NUMBER OF RUNS UP
OF LENGTH I OR MORE
I STAT EXP(STAT) SD(STAT) Z
1 60.0 66.5000 4.1972 -1.55
2 42.0 24.7917 2.8083 6.13
3 2.0 6.5750 2.1639 -2.11
4 0.0 1.3625 1.1186 -1.22
5 0.0 0.2323 0.4777 -0.49
6 0.0 0.0337 0.1833 -0.18
7 0.0 0.0043 0.0652 -0.07
8 0.0 0.0005 0.0218 -0.02
9 0.0 0.0000 0.0069 -0.01
10 0.0 0.0000 0.0021 0.00
RUNS DOWN
STATISTIC = NUMBER OF RUNS DOWN
OF LENGTH EXACTLY I
I STAT EXP(STAT) SD(STAT) Z
1 25.0 41.7083 6.4900 -2.57
2 35.0 18.2167 3.3444 5.02
3 0.0 5.2125 2.0355 -2.56
4 0.0 1.1302 1.0286 -1.10
5 0.0 0.1986 0.4424 -0.45
6 0.0 0.0294 0.1714 -0.17
7 0.0 0.0038 0.0615 -0.06
8 0.0 0.0004 0.0207 -0.02
9 0.0 0.0000 0.0066 -0.01
10 0.0 0.0000 0.0020 0.00
STATISTIC = NUMBER OF RUNS DOWN
OF LENGTH I OR MORE
I STAT EXP(STAT) SD(STAT) Z
1 60.0 66.5000 4.1972 -1.55
2 35.0 24.7917 2.8083 3.63
3 0.0 6.5750 2.1639 -3.04
4 0.0 1.3625 1.1186 -1.22
5 0.0 0.2323 0.4777 -0.49
6 0.0 0.0337 0.1833 -0.18
7 0.0 0.0043 0.0652 -0.07
8 0.0 0.0005 0.0218 -0.02
9 0.0 0.0000 0.0069 -0.01
10 0.0 0.0000 0.0021 0.00
RUNS TOTAL = RUNS UP + RUNS DOWN
STATISTIC = NUMBER OF RUNS TOTAL
OF LENGTH EXACTLY I
I STAT EXP(STAT) SD(STAT) Z
1 43.0 83.4167 9.1783 -4.40
2 75.0 36.4333 4.7298 8.15
3 2.0 10.4250 2.8786 -2.93
4 0.0 2.2603 1.4547 -1.55
5 0.0 0.3973 0.6257 -0.63
6 0.0 0.0589 0.2424 -0.24
7 0.0 0.0076 0.0869 -0.09
8 0.0 0.0009 0.0293 -0.03
9 0.0 0.0001 0.0093 -0.01
10 0.0 0.0000 0.0028 0.00
STATISTIC = NUMBER OF RUNS TOTAL
OF LENGTH I OR MORE
I STAT EXP(STAT) SD(STAT) Z
1 120.0 133.0000 5.9358 -2.19
2 77.0 49.5833 3.9716 6.90
3 2.0 13.1500 3.0602 -3.64
4 0.0 2.7250 1.5820 -1.72
5 0.0 0.4647 0.6756 -0.69
6 0.0 0.0674 0.2592 -0.26
7 0.0 0.0085 0.0923 -0.09
8 0.0 0.0010 0.0309 -0.03
9 0.0 0.0001 0.0098 -0.01
10 0.0 0.0000 0.0030 0.00
LENGTH OF THE LONGEST RUN UP = 3
LENGTH OF THE LONGEST RUN DOWN = 2
LENGTH OF THE LONGEST RUN UP OR DOWN = 3
NUMBER OF POSITIVE DIFFERENCES = 104
NUMBER OF NEGATIVE DIFFERENCES = 95
NUMBER OF ZERO DIFFERENCES = 0
Interpretation of Sample Output
Scanning the last column labeled "Z", we note that most of the z-scores for run lengths 1, 2, and 3 have an absolute value greater than 1.96. This is strong evidence that these data are in fact not random.
Output from other statistical software may look somewhat different from the above output.
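To make the z-scores concrete, the sketch below counts runs up and down in a sequence and computes the z-score for the total number of runs. It assumes the standard results for a random sequence of N observations: an expected total of (2N - 1)/3 runs with variance (16N - 29)/90. For N = 200 these give 133.0000 and 5.9358, the values printed in the first row of the RUNS TOTAL "length I or more" table. The function names are hypothetical, and zero differences are simply skipped here, which is an assumption.

```python
import math


def run_lengths(y):
    """Return (direction, length) pairs for consecutive runs up/down.

    direction is +1 for a run up and -1 for a run down.
    Assumption: zero differences (ties) are skipped.
    """
    signs = [1 if b > a else -1 for a, b in zip(y, y[1:]) if b != a]
    runs = []
    for s in signs:
        if runs and runs[-1][0] == s:
            runs[-1][1] += 1      # extend the current run
        else:
            runs.append([s, 1])   # start a new run
    return [(s, length) for s, length in runs]


def total_runs_z(n_obs, n_runs):
    """z-score of the total number of runs up and down."""
    expected = (2 * n_obs - 1) / 3
    sd = math.sqrt((16 * n_obs - 29) / 90)
    return (n_runs - expected) / sd


runs = run_lengths([1, 2, 3, 4, 3, 2, 3, 2, 3, 4])
z = total_runs_z(200, 120)  # N and total runs from the sample output
print(runs)
print(round(z, 2))
```

With 120 total runs among 200 observations the z-score is about -2.19, matching the printed output, and its absolute value exceeds 1.96.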
Question
The runs test can be used to answer the following question:
● Were these sample data generated from a random process?
Importance
Randomness is one of the key assumptions in determining if a univariate statistical process is in control. If the assumptions of constant location and scale, randomness, and fixed distribution are reasonable, then the univariate process can be modeled as:
$$Y_i = A_0 + E_i$$
where $E_i$ is an error term.
If the randomness assumption is not valid, then a different model needs to be used. This will typically be either a time series model or a non-linear model (with time as the independent variable).
Related Techniques
Autocorrelation
Run Sequence Plot
Lag Plot
Case Study
Heat flow meter data
Software
Most general purpose statistical software programs, including Dataplot, support a runs test.
1.3.5.13 Runs Test for Detecting Non-randomness
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35d.htm (5 of 5) [5/1/2006 9:57:45 AM]
Significance Level: α
Critical Region:
The critical values for the Anderson-Darling test are dependent on the specific distribution that is being tested. Tabulated values and formulas have been published (Stephens, 1974, 1976, 1977, 1979) for a few specific distributions (normal, lognormal, exponential, Weibull, logistic, extreme value type 1). The test is a one-sided test and the hypothesis that the distribution is of a specific form is rejected if the test statistic, A, is greater than the critical value.
Note that for a given distribution, the Anderson-Darling statistic may be multiplied by a constant (which usually depends on the sample size, n). These constants are given in the various papers by Stephens. In the sample output below, this is the "adjusted Anderson-Darling" statistic. This is what should be compared against the critical values. Also, be aware that different constants (and therefore critical values) have been published. You just need to be aware of what constant was used for a given set of critical values (the needed constant is typically given with the critical values).
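As an illustration of such a constant, the sketch below applies the factor 1 + 4/n - 25/n^2 to the raw statistic. This factor is an inference: it reproduces the adjusted values printed in the sample output for n = 1000, but the text does not state which constant Dataplot applies, so treat it as an assumption. The function name `adjusted_ad` is hypothetical.

```python
import math


def adjusted_ad(a2, n):
    """Multiply the raw Anderson-Darling statistic by a sample-size factor.

    Assumption: the factor 1 + 4/n - 25/n**2, inferred from the
    printed sample output rather than stated in the text.
    """
    return a2 * (1 + 4 / n - 25 / n**2)


# (raw statistic, printed adjusted statistic) pairs from the sample output
printed = [
    (0.2565918, 0.2576117),
    (5.826050, 5.849208),
    (287.6429, 288.7863),
    (83.06335, 83.39352),
]
for raw, adjusted in printed:
    print(raw, adjusted_ad(raw, 1000), adjusted)
```

For n = 1000 the factor is 1.003975, so the adjustment is small, but it matters when a statistic lies close to a critical value.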
Sample Output
Dataplot generated the following output for the Anderson-Darling test. 1,000 random numbers were generated for each of a normal, double exponential, Cauchy, and lognormal distribution. In all four cases, the Anderson-Darling test was applied to test for a normal distribution. When the data were generated using a normal distribution, the test statistic was small and the hypothesis of normality was accepted. When the data were generated using the double exponential, Cauchy, and lognormal distributions, the statistics were significant, and the hypothesis of an underlying normal distribution was rejected at significance levels of 0.10, 0.05, and 0.01.
The normal random numbers were stored in the variable Y1, the double exponential random numbers were stored in the variable Y2, the Cauchy random numbers were stored in the variable Y3, and the lognormal random numbers were stored in the variable Y4.
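The experiment described above can be imitated without Dataplot. The following sketch computes the Anderson-Darling statistic for normality using the standard formula A^2 = -n - (1/n) * sum over i of (2i - 1)[ln F(Y_i) + ln(1 - F(Y_(n+1-i)))], where the Y_i are the ordered data and F is the normal distribution fitted from the sample mean and standard deviation. Several choices here are assumptions made for illustration: the function name, the adjustment factor 1 + 4/n - 25/n^2, and the use of deterministic quantile-based samples in place of random numbers so that the result is repeatable.

```python
import math
from statistics import NormalDist, fmean, stdev


def anderson_darling_normal(y):
    """Return (A2, adjusted A2) for a test of normality.

    Mean and standard deviation are estimated from the data.
    Assumption: adjustment factor 1 + 4/n - 25/n**2.
    """
    n = len(y)
    fitted = NormalDist(fmean(y), stdev(y))
    f = [fitted.cdf(v) for v in sorted(y)]
    tiny = 1e-300  # guard against log(0) for extreme outliers
    s = sum(
        (2 * i - 1)
        * (math.log(max(f[i - 1], tiny)) + math.log(max(1 - f[n - i], tiny)))
        for i in range(1, n + 1)
    )
    a2 = -n - s / n
    return a2, a2 * (1 + 4 / n - 25 / n**2)


# Deterministic stand-ins for random samples (illustrative only):
n = 200
normal_like = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
lognormal_like = [math.exp(z) for z in normal_like]

a2_norm, adj_norm = anderson_darling_normal(normal_like)
a2_logn, adj_logn = anderson_darling_normal(lognormal_like)
print(f"normal-like:    adjusted A2 = {adj_norm:.4f}")
print(f"lognormal-like: adjusted A2 = {adj_logn:.4f}")
```

The normal-like sample gives a statistic far below the printed critical values, while the lognormal-like sample gives one far above them, mirroring the pattern in the Dataplot output.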
***************************************
** anderson darling normal test y1 **
***************************************
ANDERSON-DARLING 1-SAMPLE TEST THAT THE DATA CAME FROM A NORMAL DISTRIBUTION
1 STATISTICS:
NUMBER OF OBSERVATIONS = 1000
MEAN = 0.4359940E-02
STANDARD DEVIATION = 1.001816
ANDERSON-DARLING TEST STATISTIC VALUE = 0.2565918
ADJUSTED TEST STATISTIC VALUE = 0.2576117
2 CRITICAL VALUES:
90 % POINT = 0.6560000
95 % POINT = 0.7870000
97.5 % POINT = 0.9180000
99 % POINT = 1.092000
3 CONCLUSION (AT THE 5% LEVEL):
THE DATA DO COME FROM A NORMAL DISTRIBUTION.
***************************************
** anderson darling normal test y2 **
***************************************
ANDERSON-DARLING 1-SAMPLE TEST THAT THE DATA CAME FROM A NORMAL DISTRIBUTION
1 STATISTICS:
NUMBER OF OBSERVATIONS = 1000
MEAN = 0.2034888E-01
STANDARD DEVIATION = 1.321627
ANDERSON-DARLING TEST STATISTIC VALUE = 5.826050
ADJUSTED TEST STATISTIC VALUE = 5.849208
2 CRITICAL VALUES:
90 % POINT = 0.6560000
95 % POINT = 0.7870000
97.5 % POINT = 0.9180000
99 % POINT = 1.092000
3 CONCLUSION (AT THE 5% LEVEL):
THE DATA DO NOT COME FROM A NORMAL DISTRIBUTION.
***************************************
** anderson darling normal test y3 **
***************************************
ANDERSON-DARLING 1-SAMPLE TEST THAT THE DATA CAME FROM A NORMAL DISTRIBUTION
1 STATISTICS:
NUMBER OF OBSERVATIONS = 1000
MEAN = 1.503854
STANDARD DEVIATION = 35.13059
ANDERSON-DARLING TEST STATISTIC VALUE = 287.6429
ADJUSTED TEST STATISTIC VALUE = 288.7863
2 CRITICAL VALUES:
90 % POINT = 0.6560000
95 % POINT = 0.7870000
97.5 % POINT = 0.9180000
99 % POINT = 1.092000
3 CONCLUSION (AT THE 5% LEVEL):
THE DATA DO NOT COME FROM A NORMAL DISTRIBUTION.
***************************************
** anderson darling normal test y4 **
***************************************
ANDERSON-DARLING 1-SAMPLE TEST THAT THE DATA CAME FROM A NORMAL DISTRIBUTION
1 STATISTICS:
NUMBER OF OBSERVATIONS = 1000
MEAN = 1.518372
STANDARD DEVIATION = 1.719969
ANDERSON-DARLING TEST STATISTIC VALUE = 83.06335
ADJUSTED TEST STATISTIC VALUE = 83.39352
2 CRITICAL VALUES:
90 % POINT = 0.6560000
95 % POINT = 0.7870000
97.5 % POINT = 0.9180000
99 % POINT = 1.092000
3 CONCLUSION (AT THE 5% LEVEL):
THE DATA DO NOT COME FROM A NORMAL DISTRIBUTION.
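The conclusions printed in the four outputs follow from a one-sided comparison of the adjusted statistic with each critical value. The sketch below makes that rule explicit. The critical values are copied from the output above, while the dictionary name `CRITICAL` and the function `reject_normality` are hypothetical.

```python
# Critical values from the sample output, keyed by significance level:
# the 90 % point corresponds to alpha = 0.10, and so on.
CRITICAL = {0.10: 0.656, 0.05: 0.787, 0.025: 0.918, 0.01: 1.092}


def reject_normality(adjusted_stat, critical=CRITICAL):
    """Map each significance level to True if normality is rejected.

    The test is one-sided: reject when the adjusted statistic
    is greater than the critical value.
    """
    return {alpha: adjusted_stat > cv for alpha, cv in critical.items()}


decisions = {
    "y1": reject_normality(0.2576117),
    "y2": reject_normality(5.849208),
    "y3": reject_normality(288.7863),
    "y4": reject_normality(83.39352),
}
for name, result in decisions.items():
    print(name, result)
```

For Y1 no level leads to rejection, and for Y2, Y3, and Y4 every level does, in agreement with the printed conclusions.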
Interpretation of the Sample Output
The output is divided into three sections.
1. The first section prints the number of observations and estimates for the location and scale parameters.
2. The second section prints the upper critical value for the Anderson-Darling test statistic distribution corresponding to various significance levels. The value in the first column, the confidence level of the test, is equivalent to 100(1 - α). We reject the null hypothesis at that significance level if the value of the adjusted Anderson-Darling test statistic printed in section one is greater than the critical value printed in the last column.
3. The third section prints the conclusion for a 95% test. For a different significance level, the appropriate conclusion can be drawn from the table printed in section two. For example, for α = 0.10, we look at the row for 90% confidence and compare the critical value 0.656 to the
1.3.5.14 Anderson-Darling Test
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35e.htm (4 of 5) [5/1/2006 9:57:46 AM]