4 Process Modeling
4.4 Data Analysis for Process Modeling
4.4.4 How can I tell if a model fits my data?
4.4.4.2 How can I detect non-constant variation across the data?
Scatter Plots Allow Comparison of Random Variation Across Data
Similar to their use in checking the sufficiency of the functional form of the model, scatter plots of the residuals are also used to check the assumption of constant standard deviation of random errors. Scatter plots of the residuals versus the explanatory variables and versus the predicted values from the model allow comparison of the amount of random variation in different parts of the data. For example, the plot below shows residuals from a straight-line fit to the Pressure/Temperature data. In this plot the range of the residuals looks essentially constant across the levels of the predictor variable, temperature. The scatter in the residuals at temperatures between 20 and 30 degrees is similar to the scatter in the residuals between 40 and 50 degrees and between 55 and 70 degrees. This suggests that the standard deviation of the random errors is the same for the responses observed at each temperature.
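The visual comparison of scatter in different temperature ranges can be backed up with a rough numerical check. The sketch below is only an illustration under stated assumptions: the data are simulated stand-ins (not the handbook's Pressure/Temperature measurements), the line coefficients and the error standard deviation of 4 are invented, and the helper names fit_line and band_sd are hypothetical. It fits a straight line by least squares and compares the residual standard deviation in the three temperature bands mentioned above.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def band_sd(x, resid, lo, hi):
    """Sample standard deviation of the residuals whose x lies in [lo, hi]."""
    return statistics.stdev([r for xi, r in zip(x, resid) if lo <= xi <= hi])

# Simulated stand-in data (assumption): constant error standard deviation of 4.
rng = random.Random(0)
temperature = [rng.uniform(20, 70) for _ in range(300)]
pressure = [8.0 + 3.9 * t + rng.gauss(0, 4.0) for t in temperature]

a, b = fit_line(temperature, pressure)
residuals = [p - (a + b * t) for t, p in zip(temperature, pressure)]

# Compare the scatter in the three temperature ranges discussed in the text.
bands = [(20, 30), (40, 50), (55, 70)]
sds = [band_sd(temperature, residuals, lo, hi) for lo, hi in bands]
for (lo, hi), sd in zip(bands, sds):
    print(f"{lo}-{hi} degrees: residual SD = {sd:.2f}")
```

Similar values in all three bands are consistent with a constant standard deviation. The numbers supplement the residual plot rather than replace it, since a plot can reveal patterns that a handful of band summaries would miss.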
[Figure: Residuals from Pressure / Temperature Example]
http://www.itl.nist.gov/div898/handbook/pmd/section4/pmd442.htm (1 of 6) [5/1/2006 10:22:13 AM]
Modification of Example
To illustrate how the residuals from the Pressure/Temperature data would look if the standard deviation were not constant across the different temperature levels, a modified version of the data was simulated. In the modified version, the standard deviation increases with increasing values of pressure. Situations like this, in which the standard deviation increases with increasing values of the response, are among the most common ways that non-constant random variation occurs in physical science and engineering applications. A plot of the data is shown below. Comparison of these two versions of the data is interesting because in the original units of the data they don't look strikingly different.
[Figure: Pressure Data with Non-Constant Residual Standard Deviation]
Residuals Indicate Non-Constant Standard Deviation
The residual plot from a straight-line fit to the modified data, however, highlights the non-constant standard deviation in the data. The horn-shaped residual plot, starting with residuals close together around 20 degrees and spreading out more widely as the temperature (and the pressure) increases, is a typical plot indicating that the assumptions of the analysis are not satisfied with this model. Other residual plot shapes besides the horn shape could indicate non-constant standard deviation as well. For example, if the response variable for a data set peaked in the middle of the range of the predictors and was small for extreme values of the predictors, the residuals plotted versus the predictors would look like two horns with the bells facing one another. In a case like this, however, a plot of the residuals versus the predicted values would exhibit the single horn shape.
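The horn shape can be reproduced with a small simulation. The following sketch rests on assumptions: the data are simulated (not the handbook's modified data set), the error standard deviation is taken to be 5% of the mean response, and fit_line is a hypothetical helper. It orders the residuals by predicted value and compares the spread in the lowest and highest thirds, which is the numerical counterpart of the narrow and wide ends of the horn.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Simulated stand-in data (assumption): error SD grows with the mean response.
rng = random.Random(1)
temperature = [rng.uniform(20, 70) for _ in range(300)]
pressure = []
for t in temperature:
    mean = 8.0 + 3.9 * t
    pressure.append(mean + rng.gauss(0, 0.05 * mean))

a, b = fit_line(temperature, pressure)
predicted = [a + b * t for t in temperature]
residuals = [p - f for p, f in zip(pressure, predicted)]

# Order residuals by predicted value, then compare the lowest and highest thirds.
ordered = [r for _, r in sorted(zip(predicted, residuals))]
third = len(ordered) // 3
sd_low = statistics.stdev(ordered[:third])
sd_high = statistics.stdev(ordered[-third:])
spread_ratio = sd_high / sd_low
print(f"SD (low predictions)  = {sd_low:.2f}")
print(f"SD (high predictions) = {sd_high:.2f}")
print(f"ratio high/low        = {spread_ratio:.2f}")
```

A ratio well above 1 corresponds to residuals that fan out as the predicted value increases. Ordering by predicted value rather than by a single predictor follows the remark above that the plot against predicted values shows a single horn even when plots against the predictors do not.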
[Figure: Residuals from Modified Pressure Data]
Residual Plots Comparing Variability Apply to Most Methods
The use of residual plots to check the assumption of constant standard deviation works in the same way for most modeling methods. It is not limited to least squares regression even though that is almost always the context in which it is explained. The plot below shows the residuals from a LOESS fit to the data from the Thermocouple Calibration example. The even spread of the residuals across the range of the data does not indicate any changes in the standard deviation, leading us to the conclusion that this assumption is not unreasonable for these data.
[Figure: Residuals from LOESS Fit to Thermocouple Calibration Data]
Correct Function Needed to Check for Constant Standard Deviation
One potential pitfall in using residual plots to check for constant standard deviation across the data is that the functional part of the model must adequately describe the systematic variation in the data. If that is not the case, then the typical horn shape observed in the residuals could be due to an artifact of the function fit to the data rather than to non-constant variation. For example, in the Polymer Relaxation example it was hypothesized that both time and temperature are related to the response variable, torque. However, if a single stretched exponential model in time was the initial model used for the process, the residual plots could be misinterpreted fairly easily, leading to the false conclusion that the standard deviation is not constant across the data. When the functional part of the model does not fit the data well, the residuals do not reflect purely random variations in the process. Instead, they reflect the remaining structure in the data not accounted for by the function. Because the residuals are not random, they cannot be used to answer questions about the random part of the model. This also emphasizes the importance of plotting the data before fitting the initial model, even if a theoretical model for the data is available. Looking at the data before fitting the initial model, at least in this case, would likely forestall this potential problem.
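A deliberately simplified sketch can show how a wrong functional form leaves structure in the residuals even when the random errors have constant standard deviation. Everything here is an assumption made for illustration: the data are simulated from a plain exponential decay (not the handbook's stretched exponential or its Polymer Relaxation data), the wrong model is a straight line, and fit_line and band_mean are hypothetical helpers.

```python
import math
import random
import statistics

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Simulated stand-in data (assumption): exponential decay, constant error SD of 0.2.
rng = random.Random(4)
times = [5.0 * i / 299 for i in range(300)]
response = [10.0 * math.exp(-t) + rng.gauss(0, 0.2) for t in times]

# Deliberately fit the wrong functional form: a straight line.
a, b = fit_line(times, response)
residuals = [y - (a + b * t) for t, y in zip(times, response)]

def band_mean(lo, hi):
    """Mean residual for observations with lo <= time <= hi."""
    return statistics.fmean([r for t, r in zip(times, residuals) if lo <= t <= hi])

early, middle, late = band_mean(0.0, 1.0), band_mean(1.5, 3.5), band_mean(4.0, 5.0)
print(f"mean residual, early times : {early:+.2f}")
print(f"mean residual, middle times: {middle:+.2f}")
print(f"mean residual, late times  : {late:+.2f}")
```

If the function were adequate, the mean residual in every band would be near zero relative to the noise. Band means that are clearly positive at the ends and negative in the middle indicate leftover systematic structure, so the spread of these residuals says little about the random part of the model.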
[Figure: Polymer Relaxation Data Modeled as a Single Stretched Exponential]
[Figure: Residuals from Single Stretched Exponential Model]
Getting Back on Course After a Bad Start
Fortunately, even if the initial model were incorrect, and the residual plot above was made, there are clues in this plot that indicate that the horn shape (pointing left this time) is not caused by non-constant standard deviation. The cluster of residuals at time zero that have a residual torque near one indicates that the functional part of the model does not fit the data. In addition, even when the residuals occur with equal frequency above and below zero, the spacing of the residuals at each time does not really look random. The spacing is too regular to represent random measurement errors. At measurement times near the low end of the scale, the spacing of the points increases as the residuals decrease, and at the upper end of the scale the spacing decreases as the residuals decrease. The patterns in the spacing of the residuals also point to the fact that the functional form of the model is not correct and needs to be corrected before drawing conclusions about the distribution of the residuals.
No Drift Indicated
Taken as a whole, this plot essentially shows that there is only random scatter in the relationship between the observed pressures and order in which the data were collected, rather than any systematic relationship. Although there appears to be a slight trend in the residuals when plotted in run order, the trend is small when measured against short-term random variation in the data, indicating that it is probably not a real effect. The presence of this apparent trend does emphasize, however, that practice and judgment are needed to correctly interpret these plots. Although residual plots are a very useful tool, if critical judgment is not used in their interpretation, you can see things that aren't there or miss things that are. One hint that the slight slope visible in the data is not worrisome in this case is the fact that the residuals overlap zero across all runs. If the process were drifting significantly, it is likely that there would be some parts of the run sequence in which the residuals would not overlap zero. If there is still some doubt about the slight trend visible in the data after using this graphical procedure, a term describing the drift can be added to the model and tested numerically to see if it has a significant impact on the results.
Modification of Example
To illustrate how the residuals from the Pressure/Temperature data would look if there were drift in the process, a modified version of the data was simulated. A small drift of 0.3 units/measurement was added to the process. A plot of the data is shown below. In this run sequence plot a clear, strong trend is visible and there are portions of the run order where the residuals do not overlap zero. Because the structure is so evident in this case, it is easy to conclude that some sort of drift is present. Then, of course, its cause needs to be determined so that appropriate steps can be taken to eliminate the drift from the process or to account for it in the model.
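When a trend in run order is doubtful, the numerical check mentioned earlier can be approximated as follows. This is a simplified sketch, not the handbook's procedure: instead of refitting the full model with an added drift term, it regresses a stand-in residual series on run order and reports the slope and its t statistic. The series are simulated, the noise standard deviation of 4 and the run length of 60 are assumptions, and slope_t_statistic is a hypothetical helper. Only the drift rate of 0.3 units per measurement comes from the text.

```python
import math
import random
import statistics

def slope_t_statistic(y):
    """Slope of y against run order 1..n and its t statistic (simple linear regression)."""
    n = len(y)
    x = range(1, n + 1)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / std_err

# Stand-in residual series (assumption): independent noise with SD 4, 60 runs.
rng = random.Random(2)
no_drift = [rng.gauss(0, 4.0) for _ in range(60)]
# Same noise with a drift of 0.3 units per measurement added, as in the text.
with_drift = [e + 0.3 * i for i, e in enumerate(no_drift, start=1)]

slope_clean, t_clean = slope_t_statistic(no_drift)
slope_drift, t_drift = slope_t_statistic(with_drift)
print(f"no drift  : slope = {slope_clean:+.3f}, t = {t_clean:+.2f}")
print(f"with drift: slope = {slope_drift:+.3f}, t = {t_drift:+.2f}")
```

A t statistic that is small in magnitude (roughly within plus or minus 2 for a series of this length) is consistent with no drift, while a large one supports adding a drift term to the model. Regressing residuals on run order is only a quick screen; refitting the model with the extra term is the more direct test.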
4.4.4.3 How can I tell if there was drift in the measurement process?
http://www.itl.nist.gov/div898/handbook/pmd/section4/pmd443.htm (2 of 4) [5/1/2006 10:22:14 AM]
[Figure: Run Sequence Plot for Pressure / Temperature Data with Drift]
As in the case when the standard deviation was not constant across the data set, comparison of these two versions of the data is interesting because the drift is not apparent in either data set when viewed in the scale of the data. This highlights the need for graphical residual analysis when developing process models.
Applicable to Most Regression Methods
The run sequence plot, like most types of residual plots, can be used to check for drift in many regression methods. It is not limited to least squares fitting or one particular type of model. The run sequence plot below shows the residuals from the fit of the nonlinear model to the data from the Polymer Relaxation example. The even spread of the residuals across the range of the data indicates that there is no apparent drift in this process.
[Figure: Run Sequence Plot for Polymer Relaxation Data]
[Figure: Lag Plot: Thermocouple Calibration Example]
4.4.4.4 How can I assess whether the random errors are independent from one to the next?
http://www.itl.nist.gov/div898/handbook/pmd/section4/pmd444.htm (2 of 4) [5/1/2006 10:22:14 AM]
[Figure: Lag Plot: Polymer Relaxation Example]
Next Steps
Some of the different patterns that might be found in the residuals when the errors are not independent are illustrated in the general discussion of the lag plot. If the residuals are not random, then time series methods might be required to fully model the data. Some time series basics are given in Section 4 of the chapter on Process Monitoring. Before jumping to conclusions about the need for time series methods, however, be sure that a run order plot does not show any trends, or other structure, in the data. If there is a trend in the run order plot, whether caused by drift or by the use of the wrong functional form, the source of the structure shown in the run order plot will also induce structure in the lag plot. Structure induced in the lag plot in this way does not necessarily indicate dependence in successive random errors. The lag plot can only be interpreted clearly after accounting for any structure in the run order plot.
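The warning about trends inducing structure in the lag plot can be illustrated numerically. The sketch below is an illustration under assumptions: the series are simulated, the drift rate of 0.05 per run is invented, and lag1_correlation and detrend are hypothetical helpers. It computes the correlation between each residual and its predecessor, which summarizes what a lag plot displays, for independent errors, for the same errors with a drift added, and for the drifting series after a straight-line trend in run order is removed.

```python
import random
import statistics

def lag1_correlation(e):
    """Pearson correlation between e[i-1] and e[i], the pairs shown in a lag plot."""
    x, y = e[:-1], e[1:]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def detrend(e):
    """Remove a least-squares straight line in run order from the series."""
    idx = range(len(e))
    mi, me = statistics.fmean(idx), statistics.fmean(e)
    slope = (sum((i - mi) * (v - me) for i, v in zip(idx, e))
             / sum((i - mi) ** 2 for i in idx))
    return [v - me - slope * (i - mi) for i, v in zip(idx, e)]

# Simulated stand-in series (assumption): independent errors, then the same with drift.
rng = random.Random(3)
independent = [rng.gauss(0, 1.0) for _ in range(200)]
drifting = [v + 0.05 * i for i, v in enumerate(independent)]

r_independent = lag1_correlation(independent)
r_drifting = lag1_correlation(drifting)
r_detrended = lag1_correlation(detrend(drifting))
print(f"independent errors     : r = {r_independent:+.2f}")
print(f"same errors plus drift : r = {r_drifting:+.2f}")
print(f"after removing trend   : r = {r_detrended:+.2f}")
```

The drifting series shows a strong lag-1 correlation even though its random errors are independent by construction, and the correlation drops back toward zero once the trend is removed. This is why the run order plot should be checked before the lag plot is interpreted.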
[Figure: Normal Probability Plot: Thermocouple Calibration Example]
4.4.4.5 How can I test whether or not the random errors are distributed normally?
http://www.itl.nist.gov/div898/handbook/pmd/section4/pmd445.htm (2 of 7) [5/1/2006 10:22:15 AM]
[Figure: Normal Probability Plot: Polymer Relaxation Example]
Discussion and Examples
If the random errors from one of these processes were not normally distributed, then significant curvature might have been visible in the relationship between the residuals and the quantiles from the standard normal distribution, or there would have been residuals at the upper and/or lower ends of the line that clearly did not fit the linear relationship followed by the bulk of the data. Examples of some typical cases obtained with non-normal random errors are illustrated in the general discussion of the normal probability plot.
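The straightness of a normal probability plot can be summarized by the correlation between the ordered residuals and the corresponding standard normal quantiles. The sketch below is a minimal illustration with assumptions stated plainly: the residuals are simulated, the plotting positions use Blom's approximation (other conventions exist and may differ from the one used to draw the handbook's plots), and probability_plot_correlation is a hypothetical helper.

```python
import random
import statistics

def probability_plot_correlation(residuals):
    """Correlation between sorted residuals and standard normal quantiles."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = statistics.NormalDist()
    # Blom's plotting positions (an assumed choice; other conventions exist).
    quantiles = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mq, mr = statistics.fmean(quantiles), statistics.fmean(ordered)
    sqr = sum((q - mq) * (r - mr) for q, r in zip(quantiles, ordered))
    sqq = sum((q - mq) ** 2 for q in quantiles)
    srr = sum((r - mr) ** 2 for r in ordered)
    return sqr / (sqq * srr) ** 0.5

# Simulated stand-in residuals (assumption): one normal sample, one skewed sample.
rng = random.Random(6)
normal_resid = [rng.gauss(0, 1.0) for _ in range(200)]
skewed_resid = [rng.expovariate(1.0) - 1.0 for _ in range(200)]

r_normal = probability_plot_correlation(normal_resid)
r_skewed = probability_plot_correlation(skewed_resid)
print(f"normal errors : r = {r_normal:.3f}")
print(f"skewed errors : r = {r_skewed:.3f}")
```

A correlation very close to 1 corresponds to points that follow a straight line, while curvature or stray points at the ends of the plot pull the correlation down. The coefficient is a convenient summary, but the plot itself shows where and how the residuals depart from the line.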
Histogram
The normal probability plot helps us determine whether it is reasonable to assume that the random errors in a statistical process are drawn from a normal distribution. An advantage of the normal probability plot is that the human eye is very sensitive to deviations from a straight line that might indicate that the errors come from a non-normal distribution. However, when the normal probability plot suggests that the normality assumption may not be reasonable, it does not give us a very good idea of what the distribution does look like. A histogram of the residuals from the fit, on the other hand, can provide a clearer picture of the shape of the distribution. Because the histogram provides more general distributional information than the normal probability plot does, however, deviations from normality will be harder to discern with a histogram than with the more specifically oriented normal probability plot.
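A histogram of residuals needs nothing more than counts in equal-width bins. The sketch below is an illustration only: the residuals are simulated stand-ins, the choice of nine bins is arbitrary, and histogram_counts is a hypothetical helper. It prints a rough text histogram so the overall shape can be inspected.

```python
import random

def histogram_counts(values, n_bins):
    """Equal-width bin edges and counts spanning the range of the values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        k = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[k] += 1
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, counts

# Simulated stand-in residuals (assumption): 500 normal errors with SD 1.
rng = random.Random(5)
residuals = [rng.gauss(0, 1.0) for _ in range(500)]

edges, counts = histogram_counts(residuals, 9)
for left, c in zip(edges, counts):
    # One '#' per five residuals keeps the bars short enough to read.
    print(f"{left:6.2f} | {'#' * round(c / 5)}")
```

A roughly bell-shaped profile, with the tallest bars near the middle and short bars in the tails, is consistent with normal errors. The appearance depends on the number of bins, so it is worth trying a few bin counts before drawing conclusions.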
Examples
Histograms for the three examples used to illustrate the normal probability plot are shown below. The histograms are all more-or-less bell-shaped, confirming the conclusions from the normal probability plots. Additional examples can be found in the gallery of graphical techniques.
[Figure: Histogram: Temperature / Pressure Example]
[Figure: Histogram: Thermocouple Calibration Example]