28.0486 28.0427 28.0548 28.0616 28.0298 28.0726 28.0695 28.0629 28.0503 28.0493 28.0537 28.0613 28.0643 28.0678 28.0564 28.0703 28.0647 28.0579 28.0630 28.0716 28.0586 28.0607 28.0601 28.0611 28.0606 28.0611 28.0066 28.0412 28.0558 28.0590 28.0750 28.0483 28.0599 28.0490 28.0499 28.0565 28.0612 28.0634 28.0627 28.0519 28.0551 28.0696 28.0581 28.0568 28.0572 28.0529
1.4.2.7.1 Background and Data
http://www.itl.nist.gov/div898/handbook/eda/section4/eda4271.htm (15 of 23) [5/1/2006 9:58:55 AM]
28.0421 28.0432 28.0211 28.0363 28.0436 28.0619 28.0573 28.0499 28.0340 28.0474 28.0534 28.0589 28.0466 28.0448 28.0576 28.0558 28.0522 28.0480 28.0444 28.0429 28.0624 28.0610 28.0461 28.0564 28.0734 28.0565 28.0503 28.0581 28.0519 28.0625 28.0583 28.0645 28.0642 28.0535 28.0510 28.0542 28.0677 28.0416 28.0676 28.0596 28.0635 28.0558 28.0623 28.0718 28.0585 28.0552
28.0684 28.0646 28.0590 28.0465 28.0594 28.0303 28.0533 28.0561 28.0585 28.0497 28.0582 28.0507 28.0562 28.0715 28.0468 28.0411 28.0587 28.0456 28.0705 28.0534 28.0558 28.0536 28.0552 28.0461 28.0598 28.0598 28.0650 28.0423 28.0442 28.0449 28.0660 28.0506 28.0655 28.0512 28.0407 28.0475 28.0411 28.0512 28.1036 28.0641 28.0572 28.0700 28.0577 28.0637 28.0534 28.0461
28.0701 28.0631 28.0575 28.0444 28.0592 28.0684 28.0593 28.0677 28.0512 28.0644 28.0660 28.0542 28.0768 28.0515 28.0579 28.0538 28.0526 28.0833 28.0637 28.0529 28.0535 28.0561 28.0736 28.0635 28.0600 28.0520 28.0695 28.0608 28.0608 28.0590 28.0290 28.0939 28.0618 28.0551 28.0757 28.0698 28.0717 28.0529 28.0644 28.0613 28.0759 28.0745 28.0736 28.0611 28.0732 28.0782
28.0682 28.0756 28.0857 28.0739 28.0840 28.0862 28.0724 28.0727 28.0752 28.0732 28.0703 28.0849 28.0795 28.0902 28.0874 28.0971 28.0638 28.0877 28.0751 28.0904 28.0971 28.0661 28.0711 28.0754 28.0516 28.0961 28.0689 28.1110 28.1062 28.0726 28.1141 28.0913 28.0982 28.0703 28.0654 28.0760 28.0727 28.0850 28.0877 28.0967 28.1185 28.0945 28.0834 28.0764 28.1129 28.0797
28.0707 28.1008 28.0971 28.0826 28.0857 28.0984 28.0869 28.0795 28.0875 28.1184 28.0746 28.0816 28.0879 28.0888 28.0924 28.0979 28.0702 28.0847 28.0917 28.0834 28.0823 28.0917 28.0779 28.0852 28.0863 28.0942 28.0801 28.0817 28.0922 28.0914 28.0868 28.0832 28.0881 28.0910 28.0886 28.0961 28.0857 28.0859 28.1086 28.0838 28.0921 28.0945 28.0839 28.0877 28.0803 28.0928
28.0885 28.0940 28.0856 28.0849 28.0955 28.0955 28.0846 28.0871 28.0872 28.0917 28.0931 28.0865 28.0900 28.0915 28.0963 28.0917 28.0950 28.0898 28.0902 28.0867 28.0843 28.0939 28.0902 28.0911 28.0909 28.0949 28.0867 28.0932 28.0891 28.0932 28.0887 28.0925 28.0928 28.0883 28.0946 28.0977 28.0914 28.0959 28.0926 28.0923 28.0950 28.1006 28.0924 28.0963 28.0893 28.0956
28.0980 28.0928 28.0951 28.0958 28.0912 28.0990 28.0915 28.0957 28.0976 28.0888 28.0928 28.0910 28.0902 28.0950 28.0995 28.0965 28.0972 28.0963 28.0946 28.0942 28.0998 28.0911 28.1043 28.1002 28.0991 28.0959 28.0996 28.0926 28.1002 28.0961 28.0983 28.0997 28.0959 28.0988 28.1029 28.0989 28.1000 28.0944 28.0979 28.1005 28.1012 28.1013 28.0999 28.0991 28.1059 28.0961
28.0981 28.1045 28.1047 28.1042 28.1146 28.1113 28.1051 28.1065 28.1065 28.0985 28.1000 28.1066 28.1041 28.0954 28.1090
1.4.2.7.2 Graphical Output and Interpretation
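Goal The goal of this analysis is threefold:
1. Determine if the univariate model $Y_i = C + E_i$ is appropriate and valid.
2. Determine if the typical underlying assumptions for an "in control" measurement process are valid, namely that the data are random and come from a fixed distribution with no drift in location or scale.
3. Determine if the confidence interval $\bar{Y} \pm 2s/\sqrt{N}$ is appropriate and valid, where s is the standard deviation of the original data.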
1.4.2.7.2 Graphical Output and Interpretation
http://www.itl.nist.gov/div898/handbook/eda/section4/eda4272.htm (1 of 4) [5/1/2006 9:58:56 AM]
4-Plot of Data
Interpretation The assumptions are addressed by the graphics shown above:
The run sequence plot (upper left) indicates significant shifts in both location and variation. Specifically, the location is increasing with time. The variability seems greater in the first and last thirds of the data than it does in the middle third.
The distributional plots, the histogram (lower left) and the normal probability plot (lower right), are not interpreted since the randomness assumption is so clearly violated.
However, discussions with the scientist revealed the following:
The drift with respect to location was expected.
The data in the first and last thirds were collected in winter, while the more stable middle third was collected in the summer. The seasonal effect was determined to be caused by the amount of humidity affecting the measurement equipment. In this case, the solution was to modify the test equipment to be less sensitive to environmental factors.
Simple graphical techniques can be quite effective in revealing unexpected results in the data. When this occurs, it is important to investigate whether the unexpected result is due to problems in the experiment and data collection, or is in fact indicative of an unexpected underlying structure in the data. This determination cannot be made on the basis of statistics alone. The role of the graphical and statistical analysis is to detect problems or unexpected results in the data. Resolving the issues requires the knowledge of the scientist or engineer.
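The visual impression that the middle third of the run is quieter than the outer thirds can be checked with a quick calculation. The following is a minimal sketch, assuming the series is held in a plain Python list; the function name `spread_by_thirds` and the synthetic values are illustrative assumptions, not part of the case study.

```python
import statistics

def spread_by_thirds(values):
    """Return the sample standard deviation of each consecutive third."""
    n = len(values)
    cut1, cut2 = n // 3, 2 * n // 3
    thirds = [values[:cut1], values[cut1:cut2], values[cut2:]]
    return [statistics.stdev(part) for part in thirds]

# Synthetic illustration only (NOT the resistor data):
# a noisy first third, a quiet middle third, and a noisy last third.
synthetic = (
    [28.0 + 0.02 * (-1) ** i for i in range(30)]
    + [28.0 + 0.005 * (-1) ** i for i in range(30)]
    + [28.0 + 0.02 * (-1) ** i for i in range(30)]
)
first_sd, middle_sd, last_sd = spread_by_thirds(synthetic)
print(first_sd, middle_sd, last_sd)
```

Note that a trend within a segment also inflates that segment's standard deviation, so for drifting data such as these the comparison is only a rough guide.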
Individual Plots
Although it is generally unnecessary, the plots can be generated individually to give more detail. Since the lag plot indicates significant non-randomness, we omit the distributional plots.
Lag Plot
1.4.2.7.3 Quantitative Output and Interpretation
The autocorrelation coefficient of 0.972 is evidence of significant non-randomness.
Location One way to quantify a change in location over time is to fit a straight line to the data set using the index variable X = 1, 2, ..., N, with N denoting the number of observations. If there is no significant drift in the location, the slope parameter estimate should not differ significantly from zero. For this data set, Dataplot generates the following output:
    LEAST SQUARES MULTILINEAR FIT
    SAMPLE SIZE N = 1000
    NUMBER OF VARIABLES = 1
    NO REPLICATION CASE

        PARAMETER ESTIMATES     (APPROX. ST. DEV.)   T VALUE
    1   A0        27.9114       (0.1209E-02)         0.2309E+05
    2   A1   X    0.209670E-03  (0.2092E-05)         100.2

    RESIDUAL STANDARD DEVIATION = 0.1909796E-01
    RESIDUAL DEGREES OF FREEDOM = 998

    COEF AND SD(COEF) WRITTEN OUT TO FILE DPST1F.DAT
    SD(PRED),95LOWER,95UPPER,99LOWER,99UPPER WRITTEN OUT TO FILE DPST2F.DAT
    REGRESSION DIAGNOSTICS WRITTEN OUT TO FILE DPST3F.DAT
    PARAMETER VARIANCE-COVARIANCE MATRIX AND INVERSE OF X-TRANSPOSE X MATRIX WRITTEN OUT TO FILE DPST4F.DAT

The slope parameter, A1, has a t value of 100, which is statistically significant. The value of the slope parameter estimate is 0.00021. Although this number is nearly zero, we need to take into account that the original scale of the data is from about 27.8 to 28.2. Over the 1,000 observations, a slope of 0.00021 per observation corresponds to a total rise of about 0.21, which is roughly half of that range. In this case, we conclude that there is a drift in location.
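The straight-line fit used to test for drift can be reproduced with ordinary least squares formulas. The sketch below is an assumption-laden illustration rather than Dataplot's implementation: the function name `linear_drift_fit` and the synthetic series are hypothetical, and only the Python standard library is used.

```python
import math

def linear_drift_fit(y):
    """Least squares fit of y = a0 + a1*x with x = 1..N.

    Returns (a0, a1, t_slope), where t_slope is the slope estimate
    divided by its approximate standard deviation.
    """
    n = len(y)
    x = range(1, n + 1)
    x_mean = (n + 1) / 2
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = y_mean - a1 * x_mean
    sse = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))
    resid_sd = math.sqrt(sse / (n - 2))      # residual standard deviation
    se_slope = resid_sd / math.sqrt(sxx)     # standard deviation of the slope
    return a0, a1, a1 / se_slope

# Synthetic illustration only (NOT the resistor data):
# a small upward drift plus alternating noise.
synthetic = [28.0 + 0.0002 * i + 0.01 * (-1) ** i for i in range(1, 101)]
a0, a1, t_slope = linear_drift_fit(synthetic)
print(a0, a1, t_slope)
```

As a consistency check on the formula for the slope's standard deviation: with N = 1000 and the residual standard deviation of 0.0191 reported above, it gives about 0.0000021, in line with the value Dataplot prints.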
1.4.2.7.3 Quantitative Output and Interpretation
http://www.itl.nist.gov/div898/handbook/eda/section4/eda4273.htm (2 of 7) [5/1/2006 9:58:57 AM]
Variation One simple way to detect a change in variation is with a Bartlett test after dividing the data set into several equal-sized intervals. However, the Bartlett test is not robust to non-normality. Since the normality assumption is questionable for these data, we use the alternative Levene test. In particular, we use the Levene test based on the median rather than the mean. The choice of the number of intervals is somewhat arbitrary, although values of 4 or 8 are reasonable. Dataplot generated the following output for the Levene test.
    LEVENE F-TEST FOR SHIFT IN VARIATION
    (ASSUMPTION: NORMALITY)

    1. STATISTICS
       NUMBER OF OBSERVATIONS  = 1000
       NUMBER OF GROUPS        = 4
       LEVENE F TEST STATISTIC = 140.8509

       FOR LEVENE TEST STATISTIC
       100.0000 % Point: 140.8509

    3. CONCLUSION (AT THE 5% LEVEL):
       THERE IS A SHIFT IN VARIATION
       THUS: NOT HOMOGENEOUS WITH RESPECT TO VARIATION
In this case, since the Levene test statistic value of 140.9 is greater than the 5%
significance level critical value of 2.6, we conclude that there is significant evidence of nonconstant variation.
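The median-based Levene statistic is simple enough to compute by hand. The sketch below is a minimal illustration, assuming the data are split into equal-sized consecutive groups; the helper names `split_equal` and `levene_median` and the synthetic values are hypothetical, and the resulting statistic would still need to be compared with an F critical value (k - 1 and N - k degrees of freedom) from a table or a statistics library.

```python
import statistics

def split_equal(values, k):
    """Split values into k consecutive groups of equal size.

    Any leftover observations at the end are dropped.
    """
    size = len(values) // k
    return [values[i * size:(i + 1) * size] for i in range(k)]

def levene_median(groups):
    """Levene statistic based on absolute deviations from group medians."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z = []
    for g in groups:
        med = statistics.median(g)
        z.append([abs(v - med) for v in g])
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Synthetic illustration only (NOT the resistor data):
# four groups of 11 values with scales 0.01, 0.001, 0.001, 0.01.
synthetic = []
for scale in (0.01, 0.001, 0.001, 0.01):
    synthetic += [28.0 + scale * p for p in range(-5, 6)]
w = levene_median(split_equal(synthetic, 4))
print(w)
```

Using the median rather than the mean as the group center is what makes this variant less sensitive to non-normal data.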
Randomness
There are many ways in which data can be non-random. However, most common forms of non-randomness can be detected with a few simple tests. The lag plot in the 4-plot in the previous section is a simple graphical technique.
One check is an autocorrelation plot that shows the autocorrelations for various lags. Confidence bands can be plotted at the 95% and 99% confidence levels. Points outside this band indicate statistically significant values (lag 0 is always 1). Dataplot generated the following autocorrelation plot.
The lag 1 autocorrelation, which is generally the one of greatest interest, is 0.97. The critical values at the 5% significance level are -0.062 and 0.062 (approximately plus or minus 1.96 divided by the square root of N, with N = 1000). This indicates that the lag 1 autocorrelation is statistically significant, so there is strong evidence of non-randomness.
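The lag 1 autocorrelation and its approximate 5% critical values can be computed directly. This is a minimal sketch, assuming the usual sample autocorrelation definition and the large-sample white-noise band of 1.96 divided by the square root of N; the function names and the synthetic ramp are illustrative assumptions.

```python
import math

def autocorrelation(y, lag=1):
    """Sample autocorrelation of y at the given lag."""
    n = len(y)
    mean = sum(y) / n
    denom = sum((v - mean) ** 2 for v in y)
    numer = sum((y[i] - mean) * (y[i + lag] - mean) for i in range(n - lag))
    return numer / denom

def critical_band(n, z=1.96):
    """Approximate two-sided 5% band for a random series of length n."""
    return z / math.sqrt(n)

# Synthetic illustration only (NOT the resistor data): a steadily drifting series.
synthetic = [28.0 + 0.0002 * i for i in range(1000)]
r1 = autocorrelation(synthetic, lag=1)
band = critical_band(len(synthetic))
print(r1, band)
```

For N = 1000 the band evaluates to about 0.062, matching the critical values quoted above; a drifting series produces a lag 1 autocorrelation far outside it.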
A common test for randomness is the runs test.
    RUNS UP
    STATISTIC = NUMBER OF RUNS UP

    RUNS TOTAL = RUNS UP + RUNS DOWN
    STATISTIC = NUMBER OF RUNS TOTAL

    NUMBER OF POSITIVE DIFFERENCES = 505
    NUMBER OF NEGATIVE DIFFERENCES = 469
    NUMBER OF ZERO DIFFERENCES     = 25
Values in the column labeled "Z" greater than 1.96 or less than -1.96 are statistically significant at the 5% level. Due to the number of values that are larger than the 1.96 cut-off, we conclude that the data are not random. However, in this case the evidence from the runs test is not nearly as strong as it is from the autocorrelation plot.
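A simplified version of the runs test looks only at the total number of runs up and down, rather than the counts by run length that Dataplot tabulates. The sketch below is an illustration under stated assumptions: zero differences are dropped, the standard large-sample mean (2n - 1)/3 and variance (16n - 29)/90 for the total number of runs are used, and the function name `runs_up_down` is hypothetical.

```python
import math

def runs_up_down(y):
    """Return (number of runs up and down, Z score under randomness).

    Ties (zero differences) are dropped, which is one of several
    possible conventions.
    """
    signs = [1 if b > a else -1 for a, b in zip(y, y[1:]) if b != a]
    if not signs:
        raise ValueError("need at least one non-zero difference")
    runs = 1 + sum(1 for s, t in zip(signs, signs[1:]) if s != t)
    n = len(signs) + 1                      # effective number of observations
    expected = (2 * n - 1) / 3
    variance = (16 * n - 29) / 90
    return runs, (runs - expected) / math.sqrt(variance)

# Synthetic illustration only (NOT the resistor data).
steady_rise = [28.0 + 0.001 * i for i in range(100)]
runs_rise, z_rise = runs_up_down(steady_rise)
print(runs_rise, z_rise)
```

A Z score below -1.96 points to too few runs (trends or long excursions), while one above 1.96 points to too many (rapid oscillation).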
Distributional Analysis
Since we rejected the randomness assumption, the distributional tests are not meaningful. Therefore, these quantitative tests are omitted. Since the Grubbs' test for outliers also assumes the approximate normality of the data, we omit Grubbs' test as well.
Univariate Report
It is sometimes useful and convenient to summarize the above results in a report.

    Analysis for resistor case study

    1: Sample Size                          = 1000

    2: Location
       Mean                                 = 28.01635
       Standard Deviation of Mean           = 0.002008
       95% Confidence Interval for Mean     = (28.0124,28.02029)
       Drift with respect to location?      = NO

    3: Variation
       Standard Deviation                   = 0.063495
       95% Confidence Interval for SD       = (0.060829,0.066407)
       Change in variation?
       (based on Levene's test on quarters
       of the data)                         = YES

    4: Randomness
       Autocorrelation                      = 0.972158
       Data Are Random?
       (as measured by autocorrelation)     = NO

    5: Distribution
       Distributional test omitted due to
       non-randomness of the data

    6: Statistical Control
       (i.e., no drift in location or scale,
       data are random, distribution is
       fixed)
       Data Set is in Statistical Control?  = NO

    7: Outliers?
       (Grubbs' test omitted due to
       non-randomness of the data)
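The location figures in the report can be cross-checked from the mean, standard deviation, and sample size alone. The following sketch uses a normal approximation (a z quantile in place of the t quantile, which is an assumption that matters little at N = 1000); the function name `normal_mean_interval` is hypothetical.

```python
import math
import statistics

def normal_mean_interval(mean, sd, n, level=0.95):
    """Return (sd of the mean, lower limit, upper limit) for the mean.

    Uses a normal quantile, which is adequate for large n.
    """
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    sd_mean = sd / math.sqrt(n)
    return sd_mean, mean - z * sd_mean, mean + z * sd_mean

# Values taken from the univariate report above.
sd_mean, low, high = normal_mean_interval(28.01635, 0.063495, 1000)
print(sd_mean, low, high)
```

The arithmetic reproduces the reported interval, but the interval itself rests on the randomness and fixed-location assumptions that this case study shows to be violated, so it should not be relied on here.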
1.4.2.7.4 Work This Example Yourself
Click on the links below to start Dataplot and run this case study yourself. Each step may use results from previous steps, so please be patient. Wait until the software verifies that the current step is complete before clicking on the next step.
NOTE: This case study has 1,000 points. For better performance, it is highly recommended that you check the "No Update" box on the Spreadsheet window for this case study. This will suppress subsequent updating of the Spreadsheet window.
1. Invoke Dataplot and read data.
   1. Read in the data.
      Result: You have read 1 column of numbers into Dataplot, variable Y.
2. 4-plot of the data.
      Result: There are shifts in location and variation and the data are not random.
3. Generate the individual plots.
   1. Generate a run sequence plot.
      Result: The run sequence plot indicates that there are shifts of location and variation.
   2. Generate a lag plot.
      Result: The lag plot shows a strong linear pattern, which indicates significant non-randomness.
4. Generate summary statistics, quantitative analysis, and print a univariate report.
   1. Generate a table of summary statistics.
      Result: The summary statistics table displays 25+ statistics.
   2. Generate the sample mean, a confidence interval for the population mean, and compute a linear fit to detect drift in location.
      Result: The mean is 28.0163 and a 95% confidence interval is (28.0124,28.02029). The linear fit indicates drift in location since the slope parameter estimate is statistically significant.
   3. Generate the sample standard deviation, a confidence interval for the population standard deviation, and detect drift in variation by dividing the data into quarters and computing Levene's test for equal standard deviations.
      Result: The standard deviation is 0.0635 with a 95% confidence interval of (0.060829,0.066407). Levene's test indicates significant change in variation.
   4. Check for randomness by generating an autocorrelation plot and a runs test.
      Result: The lag 1 autocorrelation is 0.97. From the autocorrelation plot, this is outside the 95% confidence interval bands, indicating significant non-randomness.
   5. Print a univariate report (this assumes steps 2 through 4 have already been run).
      Result: The results are summarized in a convenient report.
1.4.2.7.4 Work This Example Yourself
http://www.itl.nist.gov/div898/handbook/eda/section4/eda4274.htm (3 of 3) [5/1/2006 9:58:57 AM]