STATISTIC = NUMBER OF RUNS DOWN
NUMBER OF NEGATIVE DIFFERENCES = 241
NUMBER OF ZERO DIFFERENCES = 0
1.4.2.5.2 Test Underlying Assumptions
http://www.itl.nist.gov/div898/handbook/eda/section4/eda4252.htm (8 of 9) [5/1/2006 9:58:51 AM]
Values in the column labeled "Z" greater than 1.96 or less than -1.96 are statistically significant at the 5% level. Numerous values in this column are much larger than +/-1.96, so we conclude that the data are not random.
Distributional Assumptions
Since the quantitative tests show that the assumptions of constant scale and randomness are not met, the distributional measures will not be meaningful. Therefore these quantitative tests are omitted.
1 Exploratory Data Analysis
1.4 EDA Case Studies
To obtain a good fit, sinusoidal models require good starting values for C, the
amplitude, and the frequency.
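Good Starting Value for C
A good starting value for C can be obtained by calculating the mean of the data. If the data show a trend, i.e., the assumption of constant location is violated, we can replace C with a linear or quadratic least squares fit. That is, the model becomes
$$Y_i = (B_0 + B_1 T_i) + \alpha \sin(2\pi\omega T_i + \phi) + E_i$$
or
$$Y_i = (B_0 + B_1 T_i + B_2 T_i^2) + \alpha \sin(2\pi\omega T_i + \phi) + E_i$$
where $\alpha$ is the amplitude, $\omega$ is the frequency, and $\phi$ is the phase.
Since our data did not have any meaningful change of location, we can fit the simpler model with C equal to the mean. From the summary output on the previous page, the mean is -177.44.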
We could generate the demodulation phase plot for 0.3 and then use trial and error to obtain a better estimate for the frequency. To simplify this, we generate 16 of these plots on a single page, starting with a frequency of 0.28, increasing in increments of 0.0025, and stopping at 0.3175.
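The frequency grid described above can be sketched in a few lines of Python. This is only an illustration: the row-by-row 4 x 4 page layout and the names `freqs` and `grid_position` are assumptions, not part of the Dataplot macro.

```python
# Sketch: the 16 candidate frequencies for the demodulation phase plots,
# laid out row by row on a 4 x 4 page (the layout is an assumption).
START, STEP, COUNT, COLS = 0.28, 0.0025, 16, 4

freqs = [round(START + k * STEP, 4) for k in range(COUNT)]

def grid_position(freq):
    """Return the 1-based (row, column) of a frequency on the page."""
    k = freqs.index(freq)
    return k // COLS + 1, k % COLS + 1

print(freqs[0], freqs[-1])
print(grid_position(0.3025))
```

Under this layout, the frequency 0.3025 is the tenth plot, which falls in the third row, second column.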
Interpretation
The plots start with lines sloping from left to right but gradually change to a right-to-left slope. The relatively flat slope occurs for frequency 0.3025 (third row, second column). The complex demodulation phase plot restricts the range from $\pi/2$ to $-\pi/2$. This is why the plot appears to show some breaks.
That is, we replace the amplitude with a function of time. A linear fit is specified in the model above, but this can be replaced with a more elaborate function if needed.
1.4.2.5.3 Develop a Better Model
Demodulation Amplitude Plot
The complex demodulation amplitude plot for this data shows that:
The amplitude is fixed at approximately 390.
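As a rough illustration of what a complex demodulation amplitude (and phase) estimate computes, the following sketch demodulates a synthetic sinusoid. It is not Dataplot's implementation: the moving-average low-pass filter, the window length, the function name `demodulate`, and the synthetic series (frequency 0.25, phase 1.0) are all assumptions chosen so that the result is easy to check.

```python
import cmath
import math

def demodulate(y, freq, window):
    """Complex demodulation of y at `freq` with a moving-average low-pass filter.

    Returns a list of (amplitude, phase) pairs, one per full window.
    """
    # Shift the component at `freq` down to zero frequency.
    shifted = [v * cmath.exp(-2j * math.pi * freq * t) for t, v in enumerate(y)]
    out = []
    for start in range(len(shifted) - window + 1):
        z = sum(shifted[start:start + window]) / window
        out.append((2 * abs(z), cmath.phase(z)))
    return out

# Synthetic series (not the case-study data): constant amplitude 390.
C, AMP, FREQ, PHASE = -177.44, 390.0, 0.25, 1.0
y = [C + AMP * math.sin(2 * math.pi * FREQ * t + PHASE) for t in range(200)]

result = demodulate(y, FREQ, window=8)
amps = [a for a, _ in result]
print(round(min(amps), 6), round(max(amps), 6))
```

When the demodulation frequency matches the true frequency, the estimated amplitude is constant and the phase is flat, which is the pattern sought in the phase plots. A mismatched frequency would show up as a sloping phase.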
Fit Output Using starting estimates of 0.3025 for the frequency, 390 for the amplitude, and
-177.44 for C, Dataplot generated the following output for the fit.
LEAST SQUARES NON-LINEAR FIT
SAMPLE SIZE N = 200
MODEL Y =C + AMP*SIN(2*3.14159*FREQ*T + PHASE)
NO REPLICATION CASE
ITERATION CONVERGENCE RESIDUAL * PARAMETER
NUMBER MEASURE STANDARD * ESTIMATES
4 0.96108E-01 0.15585E+03 *-0.17879E+03-0.36177E+03
2 AMP -361.766 ( 26.19 ) -13.81
3 FREQ 0.302596 (0.1510E-03) 2005
4 PHASE 1.46536 (0.4909E-01) 29.85
RESIDUAL STANDARD DEVIATION = 155.8484
RESIDUAL DEGREES OF FREEDOM = 196
Model From the fit output, our proposed model is:
$$\hat{Y}_i = -178.79 - 361.766 \sin(2\pi(0.302596)\,T_i + 1.46536)$$
We will evaluate the adequacy of this model in the next section.
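A minimal sketch of how the fitted model could be evaluated and turned into residuals for the validation step. The parameter values are read from the Dataplot fit output (C from the final iteration row); the function names `predict` and `residuals` are illustrative only.

```python
import math

# Estimates read from the fit output; C is taken from the last iteration row.
C, AMP, FREQ, PHASE = -178.79, -361.766, 0.302596, 1.46536

def predict(t):
    """Fitted value of the sinusoidal model at time index t."""
    return C + AMP * math.sin(2 * math.pi * FREQ * t + PHASE)

def residuals(t_values, y_values):
    """Observed minus fitted values, the input to a residual 4-plot."""
    return [y - predict(t) for t, y in zip(t_values, y_values)]

print(round(predict(0), 2))
```

Note that a negative amplitude estimate is equivalent to a positive amplitude with the phase shifted by pi, so the sign of AMP is not a problem in itself.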
Interpretation
The assumptions are addressed by the graphics shown above:
The run sequence plot (upper left) indicates that the data do not have any significant shifts in location. There do seem to be some shifts in scale. A start-up effect was detected previously by the complex demodulation amplitude plot. There also appear to be a few outliers.
The histogram (lower left) and the normal probability plot (lower right) do not show any serious non-normality in the residuals. However, the bend in the left portion of the normal probability plot gives some cause for concern.
Dataplot generated the following fit output after removing 3 outliers.
LEAST SQUARES NON-LINEAR FIT
SAMPLE SIZE N = 197
MODEL Y =C + AMP*SIN(2*3.14159*FREQ*T + PHASE)
NO REPLICATION CASE
ITERATION CONVERGENCE RESIDUAL * PARAMETER
NUMBER MEASURE STANDARD * ESTIMATES
2 AMP -361.759 ( 25.45 ) -14.22
3 FREQ 0.302597 (0.1457E-03) 2077.
4 PHASE 1.46533 (0.4715E-01) 31.08
RESIDUAL STANDARD DEVIATION = 148.3398
RESIDUAL DEGREES OF FREEDOM = 193
1.4.2.5.4 Validate New Model
New Fit to Edited Data
The original fit, with a residual standard deviation of 155.84, was:
The new fit, with a residual standard deviation of 148.34, is:
There is minimal change in the parameter estimates and about a 5% reduction in the residual standard deviation. In this case, removing the outliers has a modest benefit in terms of reducing the variability of the model.
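The "about 5%" figure can be checked directly from the two residual standard deviations reported in the fit outputs. The variable names below are illustrative.

```python
sd_original = 155.8484   # residual standard deviation, N = 200
sd_edited = 148.3398     # residual standard deviation, N = 197 (3 outliers removed)

reduction_pct = 100 * (sd_original - sd_edited) / sd_original
print(f"{reduction_pct:.1f}%")
```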
1.4.2.5.5 Work This Example Yourself

Click on the links below to start Dataplot and run this case study yourself. Each step may use results from previous steps, so please be patient. Wait until the software verifies that the current step is complete before clicking on the next step.

The links in this column will connect you with more detailed information about each analysis step from the case study description.

1 Invoke Dataplot and read data
  1 Read in the data

  2 Generate a run sequence plot
  3 Generate a lag plot
  4 Generate an autocorrelation plot
  5 Generate a spectral plot
  6 Generate a table of summary statistics
  7 Generate a linear fit to detect drift in location
  8 Detect drift in variation by dividing the data into quarters and computing Levene's test statistic for equal standard deviations
  9 Check for randomness by generating a runs test

  Results:
  1 Based on the 4-plot, there are no obvious shifts in location and scale, but the data are not random
  2 Based on the run sequence plot, there are no obvious shifts in location and scale
  3 Based on the lag plot, the data are not random
  4 The autocorrelation plot shows significant autocorrelation at lag 1
  5 The spectral plot shows a single dominant low frequency peak
  6 The summary statistics table displays 25+ statistics
  7 The linear fit indicates no drift in location since the slope parameter is not statistically significant
  8 Levene's test indicates no significant drift in variation
  9 The runs test indicates significant non-randomness

3 Fit the non-linear model

  Results:
  1 Complex demodulation phase plot indicates a starting frequency

4 Validate fit
  1 Generate a 4-plot of the residuals from the fit
  2 Generate a nonlinear fit with outliers removed
  3 Generate a 4-plot of the residuals from the fit with the outliers removed

  Results:
  1 The 4-plot indicates that the assumptions of constant location and scale are valid. The lag plot indicates that the data are random. The histogram and normal probability plot indicate that the normality assumption for the residuals is not seriously violated, although there is a bend in the probability plot that warrants attention
  2 The fit after removing 3 outliers shows some marginal improvement in the model (a 5% reduction in the residual standard deviation)
  3 The 4-plot of the model fit after the 3 outliers are removed shows marginal improvement in satisfying the model assumptions
1.4.2.6 Filter Transmittance
1.4.2.6.1 Background and Data
Generation
This data set was collected by NIST chemist Radu Mavrodineanu in the 1970s from an automatic data acquisition system for a filter transmittance experiment. The response variable is transmittance. The motivation for studying this data set is to show how the underlying autocorrelation structure in a relatively small data set helped the scientist detect problems with his automatic data acquisition system. This file can be read by Dataplot with the following commands:
SKIP 25
READ MAVRO.DAT Y
Resulting Data
The following are the data used for this case study.
2.00180 2.00170 2.00180 2.00190 2.00180 2.00170 2.00150 2.00140 2.00150 2.00150
2.00170 2.00180 2.00180 2.00190 2.00190 2.00210 2.00200 2.00160 2.00140 2.00130
2.00130 2.00150 2.00150 2.00160 2.00150 2.00140 2.00130 2.00140 2.00150 2.00140
2.00150 2.00160 2.00150 2.00160 2.00190 2.00200 2.00200 2.00210 2.00220 2.00230
2.00240 2.00250 2.00270 2.00260 2.00260 2.00260 2.00270 2.00260 2.00250 2.00240
1.4.2.6.2 Graphical Output and
Interpretation
Determine if the univariate model
$$Y_i = C + E_i$$
is appropriate and valid.
Determine if the confidence interval
$$\bar{Y} \pm 2s/\sqrt{N}$$
is appropriate and valid, where s is the standard deviation of the original data.
4-Plot of Data
Interpretation The assumptions are addressed by the graphics shown above:
The run sequence plot (upper left) indicates a significant shift in location around x=35.
measurement. The solution was to rerun the experiment allowing more time between samples.
Simple graphical techniques can be quite effective in revealing unexpected results in the data. When this occurs, it is important to investigate whether the unexpected result is due to problems in the experiment and data collection or is indicative of unexpected underlying structure in the data. This determination cannot be made on the basis of statistics alone. The role of the graphical and statistical analysis is to detect problems or unexpected results in the data. Resolving the issues requires the knowledge of the scientist or engineer.
Individual Plots
Although it is generally unnecessary, the plots can be generated individually to give more detail. Since the lag plot indicates significant non-randomness, we omit the distributional plots.
Location
One way to quantify a change in location over time is to fit a straight line to the data set using the index variable X = 1, 2, ..., N, with N denoting the number of observations. If there is no significant drift in the location, the slope parameter should be zero. For this data set, Dataplot generates the following output:

LEAST SQUARES MULTILINEAR FIT
SAMPLE SIZE N = 50
NUMBER OF VARIABLES = 1
NO REPLICATION CASE

   PARAMETER ESTIMATES     (APPROX. ST. DEV.)   T VALUE
1  A0       2.00138          (0.9695E-04)       0.2064E+05
2  A1  X    0.184685E-04     (0.3309E-05)       5.582

RESIDUAL STANDARD DEVIATION = 0.3376404E-03
RESIDUAL DEGREES OF FREEDOM = 48

The slope parameter, A1, has a t value of 5.6, which is statistically significant. The value of the slope parameter is 0.0000185. Although this number is nearly zero, we need to take into account that the original scale of the data is from about 2.0012 to 2.0028. In this case, we conclude that there is a drift in location, although by a relatively minor amount.
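The straight-line fit can be reproduced with ordinary least squares in plain Python. This sketch uses the 50 readings from the data listing; to keep it compact, each reading is stored as its offset from 2.0 in units of 0.0001 (so 2.00180 becomes 18), which is a convenience of this example and not part of the original analysis.

```python
import math

# Transmittance readings from the data listing; 2.00180 is stored as 18
# (units of 1e-4 above 2.0) to keep the listing compact.
OFFSETS = [18, 17, 18, 19, 18, 17, 15, 14, 15, 15,
           17, 18, 18, 19, 19, 21, 20, 16, 14, 13,
           13, 15, 15, 16, 15, 14, 13, 14, 15, 14,
           15, 16, 15, 16, 19, 20, 20, 21, 22, 23,
           24, 25, 27, 26, 26, 26, 27, 26, 25, 24]
y = [2.0 + k * 1e-4 for k in OFFSETS]
n = len(y)
x = list(range(1, n + 1))   # index variable X = 1, 2, ..., N

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual standard deviation with N - 2 degrees of freedom.
sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
resid_sd = math.sqrt(sse / (n - 2))
t_slope = slope / (resid_sd / math.sqrt(sxx))

print(f"A0 = {intercept:.5f}, A1 = {slope:.6e}, t = {t_slope:.3f}")
```

The estimates should agree with the Dataplot output to within rounding; small differences in the last digits are plausible given different arithmetic precision.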
Variation
One simple way to detect a change in variation is with a Bartlett test after dividing the data set into several equal sized intervals. However, the Bartlett test is not robust for non-normality. Since the normality assumption is questionable for these data, we use the alternative Levene test. In particular, we use the Levene test based on the median rather than the mean. The choice of the number of intervals is somewhat arbitrary, although values of 4 or 8 are reasonable. Dataplot generated the following output for the Levene test.

LEVENE F-TEST FOR SHIFT IN VARIATION
(ASSUMPTION: NORMALITY)

1. STATISTICS
   NUMBER OF OBSERVATIONS = 50
   NUMBER OF GROUPS = 4
   LEVENE F TEST STATISTIC = 0.9714893

FOR LEVENE TEST STATISTIC
   99 % POINT = 4.238307
   99.9 % POINT = 6.424733

   58.56597 % Point: 0.9714893

3. CONCLUSION (AT THE 5% LEVEL):
   THERE IS NO SHIFT IN VARIATION.
   THUS: HOMOGENEOUS WITH RESPECT TO VARIATION.
In this case, since the Levene test statistic value of 0.971 is less than the critical value of 2.806 at the 5% level, we conclude that there is no evidence of a change in variation.
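For readers who want to see the mechanics, here is a minimal sketch of the median-based Levene statistic (often called the Brown-Forsythe variant). The function names and the rule used to split a series into consecutive groups are assumptions of this example. Dataplot's own grouping rule is not stated here, so applying this sketch to the case-study data may not reproduce 0.9714893 exactly.

```python
from statistics import mean, median

def levene_median(groups):
    """Median-based Levene (Brown-Forsythe) F statistic for a list of groups."""
    # Absolute deviations of each observation from its group median.
    z = [[abs(v - median(g)) for v in g] for g in groups]
    n_total = sum(len(g) for g in z)
    k = len(z)
    grand = sum(sum(g) for g in z) / n_total
    # One-way ANOVA F statistic computed on the absolute deviations.
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in z)
    within = sum((v - mean(g)) ** 2 for g in z for v in g)
    return (between / (k - 1)) / (within / (n_total - k))

def split_into_groups(values, k):
    """Split values into k consecutive groups of nearly equal size (assumed rule)."""
    n = len(values)
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [values[bounds[i]:bounds[i + 1]] for i in range(k)]

# Toy input small enough to verify by hand; the statistic works out to 0.8.
print(round(levene_median([[1, 2, 3], [2, 4, 6]]), 6))
```

The resulting statistic is compared against an F critical value with (k - 1, N - k) degrees of freedom, which is how the 2.806 cut-off quoted in the text is used.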
Randomness
There are many ways in which data can be non-random. However, most common forms of non-randomness can be detected with a few simple tests. The lag plot in the 4-plot in the previous section is a simple graphical technique.
One check is an autocorrelation plot that shows the autocorrelations for various lags. Confidence bands can be plotted at the 95% and 99% confidence levels. Points outside this band indicate statistically significant values (lag 0 is always 1). Dataplot generated the following autocorrelation plot.
The lag 1 autocorrelation, which is generally the one of most interest, is 0.93. The critical values at the 5% level are -0.277 and 0.277. This indicates that the lag 1 autocorrelation is statistically significant, so there is strong evidence of non-randomness.
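The lag 1 autocorrelation and its approximate 5% critical values can be sketched as follows, again using the 50 readings stored as offsets from 2.0 in units of 0.0001. The estimator shown (lagged cross-products divided by the full sum of squares) is the common textbook form and is assumed here. The band of 1.96 divided by the square root of N is the usual white-noise approximation.

```python
import math

# Transmittance readings; 2.00180 is stored as 18 (units of 1e-4 above 2.0).
OFFSETS = [18, 17, 18, 19, 18, 17, 15, 14, 15, 15,
           17, 18, 18, 19, 19, 21, 20, 16, 14, 13,
           13, 15, 15, 16, 15, 14, 13, 14, 15, 14,
           15, 16, 15, 16, 19, 20, 20, 21, 22, 23,
           24, 25, 27, 26, 26, 26, 27, 26, 25, 24]
y = [2.0 + k * 1e-4 for k in OFFSETS]

def autocorrelation(values, lag):
    """Sample autocorrelation at the given lag."""
    n = len(values)
    m = sum(values) / n
    denom = sum((v - m) ** 2 for v in values)
    numer = sum((values[i] - m) * (values[i + lag] - m) for i in range(n - lag))
    return numer / denom

r1 = autocorrelation(y, 1)
bound = 1.96 / math.sqrt(len(y))   # approximate 95% band for white noise
print(f"r1 = {r1:.3f}, 95% band = +/-{bound:.3f}")
```

The computed value should be close to the reported 0.93 and far outside the band of roughly 0.277, consistent with the conclusion of strong non-randomness.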
A common test for randomness is the runs test.
RUNS UP STATISTIC = NUMBER OF RUNS UP
OF LENGTH EXACTLY I
I STAT EXP(STAT) SD(STAT) Z
1.4.2.6.3 Quantitative Output and Interpretation
NUMBER OF NEGATIVE DIFFERENCES = 18
NUMBER OF ZERO DIFFERENCES = 8
Values in the column labeled "Z" greater than 1.96 or less than -1.96 are statistically significant at the 5% level. Due to the number of values that are much larger than the 1.96 cut-off, we conclude that the data are not random.
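The difference counts in the runs output can be checked from the data listing. This sketch only tallies the signs of successive differences; it does not reproduce the full runs table, whose expected values and standard deviations come from Dataplot. Readings are stored as integer offsets from 2.0 in units of 0.0001 so that the comparisons are exact.

```python
# Transmittance readings; 2.00180 is stored as 18 (units of 1e-4 above 2.0).
OFFSETS = [18, 17, 18, 19, 18, 17, 15, 14, 15, 15,
           17, 18, 18, 19, 19, 21, 20, 16, 14, 13,
           13, 15, 15, 16, 15, 14, 13, 14, 15, 14,
           15, 16, 15, 16, 19, 20, 20, 21, 22, 23,
           24, 25, 27, 26, 26, 26, 27, 26, 25, 24]

# Successive differences Y(i+1) - Y(i).
diffs = [b - a for a, b in zip(OFFSETS, OFFSETS[1:])]

positive = sum(d > 0 for d in diffs)
negative = sum(d < 0 for d in diffs)
zero = sum(d == 0 for d in diffs)

print(positive, negative, zero)
```

The negative and zero counts match the 18 and 8 reported in the output above.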
Distributional Analysis
Since we rejected the randomness assumption, the distributional tests are not meaningful. Therefore, these quantitative tests are omitted. We also omit Grubbs' outlier test since it also assumes the data are approximately normally distributed.