Engineering Statistics Handbook Episode 5 Part 11 docx



4 Process Modeling

4.4 Data Analysis for Process Modeling

4.4.4 How can I tell if a model fits my data?

4.4.4.2 How can I detect non-constant variation across the data?

Scatter Plots Allow Comparison of Random Variation Across Data

Similar to their use in checking the sufficiency of the functional form of the model, scatter plots of the residuals are also used to check the assumption of constant standard deviation of random errors. Scatter plots of the residuals versus the explanatory variables and versus the predicted values from the model allow comparison of the amount of random variation in different parts of the data. For example, the plot below shows residuals from a straight-line fit to the Pressure/Temperature data. In this plot the range of the residuals looks essentially constant across the levels of the predictor variable, temperature. The scatter in the residuals at temperatures between 20 and 30 degrees is similar to the scatter in the residuals between 40 and 50 degrees and between 55 and 70 degrees. This suggests that the standard deviation of the random errors is the same for the responses observed at each temperature.
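The visual comparison described above can be backed by a rough numeric check. The sketch below is a minimal illustration, not part of the handbook: it fits a straight line by least squares and reports the standard deviation of the residuals within each temperature range mentioned in the text. The data are simulated stand-ins, not the actual Pressure/Temperature measurements; the coefficients 8.0 and 3.9, the error standard deviation of 4, and the helper names `fit_line` and `residual_sd_by_range` are all assumptions.

```python
import random
import statistics


def fit_line(x, y):
    """Least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def residual_sd_by_range(x, resid, ranges):
    """Sample SD of the residuals whose x lies in each inclusive (lo, hi) range."""
    result = {}
    for lo, hi in ranges:
        group = [r for xi, r in zip(x, resid) if lo <= xi <= hi]
        result[(lo, hi)] = statistics.stdev(group)  # needs at least two points per range
    return result


# Simulated stand-in data: hypothetical line with a constant error SD of 4.
rng = random.Random(0)
temperature = [rng.uniform(20, 70) for _ in range(300)]
pressure = [8.0 + 3.9 * t + rng.gauss(0, 4.0) for t in temperature]

a, b = fit_line(temperature, pressure)
residuals = [p - (a + b * t) for t, p in zip(temperature, pressure)]
spread = residual_sd_by_range(temperature, residuals, [(20, 30), (40, 50), (55, 70)])
for (lo, hi), sd in spread.items():
    print(f"{lo}-{hi} degrees: residual SD = {sd:.2f}")
```

Similar standard deviations across the ranges are consistent with the constant-spread impression given by the residual plot; the numbers supplement the plot and do not replace it.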

Residuals from Pressure / Temperature Example


http://www.itl.nist.gov/div898/handbook/pmd/section4/pmd442.htm (1 of 6) [5/1/2006 10:22:13 AM]


Modification of Example

To illustrate how the residuals from the Pressure/Temperature data would look if the standard deviation were not constant across the different temperature levels, a modified version of the data was simulated. In the modified version, the standard deviation increases with increasing values of pressure. Situations like this, in which the standard deviation increases with increasing values of the response, are among the most common ways that non-constant random variation occurs in physical science and engineering applications. A plot of the data is shown below. Comparison of these two versions of the data is interesting because in the original units of the data they don't look strikingly different.
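A modified data set of the kind described above can be simulated in a few lines. The sketch below is only an illustration under stated assumptions: the straight-line coefficients, the 5% proportionality between the error standard deviation and the mean response, and the function name `simulate_pressure` are hypothetical and are not taken from the handbook's simulation.

```python
import random
import statistics


def simulate_pressure(temps, sd_fraction, seed=1):
    """Simulate pressures whose error SD equals sd_fraction times the mean response."""
    rng = random.Random(seed)
    pressures = []
    for t in temps:
        mean = 8.0 + 3.9 * t  # hypothetical straight-line relationship
        pressures.append(mean + rng.gauss(0.0, sd_fraction * mean))
    return pressures


# Repeated measurements at a low and a high temperature show the growing spread.
low = simulate_pressure([20.0] * 500, sd_fraction=0.05)
high = simulate_pressure([70.0] * 500, sd_fraction=0.05)
print(f"SD at 20 degrees: {statistics.stdev(low):.1f}")
print(f"SD at 70 degrees: {statistics.stdev(high):.1f}")
```

Because the error standard deviation is tied to the mean response, the spread grows with pressure while the underlying straight-line relationship is unchanged.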

Pressure Data with Non-Constant Residual Standard Deviation

Residuals Indicate Non-Constant Standard Deviation

The residual plot from a straight-line fit to the modified data, however, highlights the non-constant standard deviation in the data. The horn-shaped residual plot, starting with residuals close together around 20 degrees and spreading out more widely as the temperature (and the pressure) increases, is a typical plot indicating that the assumptions of the analysis are not satisfied with this model. Other residual plot shapes besides the horn shape could indicate non-constant standard deviation as well. For example, if the response variable for a data set peaked in the middle of the range of the predictors and was small for extreme values of the predictors, the residuals plotted versus the predictors would look like two horns with the bells facing one another. In a case like this, however, a plot of the residuals versus the predicted values would exhibit the single horn shape.
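One hedged way to put a number on the horn shape is to compare the spread of the residuals at the low and high ends of the fitted values. The function `spread_ratio` below is a hypothetical helper, not a handbook procedure, and the two residual patterns are constructed by hand purely for illustration. A ratio near 1 is consistent with constant standard deviation; a ratio well above 1 matches a horn that opens toward larger fitted values.

```python
import statistics


def spread_ratio(fitted, resid):
    """SD of residuals in the top third of fitted values over the SD in the bottom third."""
    pairs = sorted(zip(fitted, resid))  # order the residuals by fitted value
    k = len(pairs) // 3
    low = [r for _, r in pairs[:k]]
    high = [r for _, r in pairs[-k:]]
    return statistics.stdev(high) / statistics.stdev(low)


fitted = list(range(1, 31))
# Hand-built residuals: alternating sign with constant size, then with size growing with the fit.
constant = [(-1) ** i * 1.0 for i in range(30)]
horn = [(-1) ** i * 0.1 * f for i, f in enumerate(fitted)]

print(f"constant spread ratio: {spread_ratio(fitted, constant):.2f}")
print(f"horn-shaped spread ratio: {spread_ratio(fitted, horn):.2f}")
```

Ordering by fitted value rather than by a predictor mirrors the remark above that a two-horn pattern against the predictors collapses to a single horn against the predicted values.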


Residuals from Modified Pressure Data

Residual Plots Comparing Variability Apply to Most Methods

The use of residual plots to check the assumption of constant standard deviation works in the same way for most modeling methods. It is not limited to least squares regression, even though that is almost always the context in which it is explained. The plot below shows the residuals from a LOESS fit to the data from the Thermocouple Calibration example. The even spread of the residuals across the range of the data does not indicate any changes in the standard deviation, leading us to the conclusion that this assumption is not unreasonable for these data.
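Because the check depends only on residuals, it carries over to any fitting method that produces fitted values. As an illustration, the sketch below implements a simplified LOESS-style smoother (tricube-weighted local straight-line fits, without the robustness iterations of full LOESS) and computes the residuals. The function names, the `frac` parameter, and the sample data are assumptions made for this example; they do not reproduce the handbook's Thermocouple Calibration analysis.

```python
import math


def tricube(u):
    """Tricube weight for a scaled distance u >= 0."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear_fit(x, y, frac=0.5):
    """Fitted value at each x from a tricube-weighted local line (assumes distinct x values)."""
    n = len(x)
    k = min(n, max(3, math.ceil(frac * n)))  # number of points in each neighbourhood
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1]  # distance to the k-th nearest point
        w = [tricube(abs(xi - x0) / h) for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Arbitrary sample: a smooth curve plus a small alternating disturbance.
x = [float(i) for i in range(20)]
y = [math.sin(xi / 3.0) + 0.05 * (-1) ** i for i, xi in enumerate(x)]
fitted = local_linear_fit(x, y, frac=0.3)
residuals = [yi - fi for yi, fi in zip(y, fitted)]
print([round(r, 3) for r in residuals])
```

The residuals from this smoother can be plotted against the predictor or the fitted values and read exactly as for a least squares fit.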

Residuals from LOESS Fit to Thermocouple Calibration Data


Correct Function Needed to Check for Constant Standard Deviation

One potential pitfall in using residual plots to check for constant standard deviation across the data is that the functional part of the model must adequately describe the systematic variation in the data. If that is not the case, then the typical horn shape observed in the residuals could be an artifact of the function fit to the data rather than a sign of non-constant variation. For example, in the Polymer Relaxation example it was hypothesized that both time and temperature are related to the response variable, torque. However, if a single stretched exponential model in time was the initial model used for the process, the residual plots could be misinterpreted fairly easily, leading to the false conclusion that the standard deviation is not constant across the data. When the functional part of the model does not fit the data well, the residuals do not reflect purely random variations in the process. Instead, they reflect the remaining structure in the data not accounted for by the function. Because the residuals are not random, they cannot be used to answer questions about the random part of the model. This also emphasizes the importance of plotting the data before fitting the initial model, even if a theoretical model for the data is available. Looking at the data before fitting the initial model, at least in this case, would likely forestall this potential problem.
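The point that a poor functional fit leaves structure in the residuals can be illustrated with a deliberately wrong model. In the sketch below, which is an illustration and not the handbook's Polymer Relaxation analysis, error-free data from a curved relationship (y equal to x squared, chosen arbitrarily) are fit with a straight line. The helper `count_sign_changes` is a hypothetical name. Random residuals would be expected to change sign often when ordered by the predictor, whereas residuals that reflect leftover structure change sign only a few times.

```python
import statistics


def fit_line(x, y):
    """Least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def count_sign_changes(resid):
    """Number of times consecutive nonzero residuals switch sign."""
    signs = [r > 0 for r in resid if r != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


x = list(range(21))
y = [xi ** 2 for xi in x]  # curved data with no random error at all
a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print("sign changes:", count_sign_changes(residuals))
```

Every residual here comes from the misfit of the function, since the data contain no random error, so any apparent pattern in their spread says nothing about the random part of the model.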


Polymer Relaxation Data Modeled as a Single Stretched Exponential

Residuals from Single Stretched Exponential Model


Getting Back on Course After a Bad Start

Fortunately, even if the initial model were incorrect and the residual plot above were made, there are clues in this plot that indicate that the horn shape (pointing left this time) is not caused by non-constant standard deviation. The cluster of residuals at time zero that have a residual torque near one indicates that the functional part of the model does not fit the data. In addition, even when the residuals occur with equal frequency above and below zero, the spacing of the residuals at each time does not really look random. The spacing is too regular to represent random measurement errors. At measurement times near the low end of the scale, the spacing of the points increases as the residuals decrease, and at the upper end of the scale the spacing decreases as the residuals decrease. The patterns in the spacing of the residuals also point to the fact that the functional form of the model is not correct and needs to be corrected before drawing conclusions about the distribution of the residuals.


No Drift Indicated

Taken as a whole, this plot essentially shows that there is only random scatter in the relationship between the observed pressures and the order in which the data were collected, rather than any systematic relationship. Although there appears to be a slight trend in the residuals when plotted in run order, the trend is small when measured against short-term random variation in the data, indicating that it is probably not a real effect. The presence of this apparent trend does emphasize, however, that practice and judgment are needed to correctly interpret these plots. Although residual plots are a very useful tool, if critical judgment is not used in their interpretation, you can see things that aren't there or miss things that are. One hint that the slight slope visible in the data is not worrisome in this case is the fact that the residuals overlap zero across all runs. If the process were drifting significantly, it is likely that there would be some parts of the run sequence in which the residuals would not overlap zero. If there is still some doubt about the slight trend visible in the data after using this graphical procedure, a term describing the drift can be added to the model and tested numerically to see if it has a significant impact on the results.
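The numerical check mentioned above can be approximated by regressing the residuals on run order and examining the t statistic of the slope. This is a simplification: a full treatment would add the drift term to the model itself and refit, and comparing the absolute t value with roughly 2 is an assumption of this sketch, not a handbook recommendation. The function name `drift_t_statistic` and the hand-built residual sequences are hypothetical.

```python
import math


def drift_t_statistic(resid):
    """Slope of residuals versus run order (0, 1, 2, ...) and the slope's t statistic."""
    n = len(resid)
    mo = (n - 1) / 2.0            # mean of the run-order index
    mr = sum(resid) / n
    sxx = sum((i - mo) ** 2 for i in range(n))
    sxy = sum((i - mo) * (r - mr) for i, r in enumerate(resid))
    slope = sxy / sxx
    sse = sum((r - mr - slope * (i - mo)) ** 2 for i, r in enumerate(resid))
    se = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope (zero for a perfect line)
    return slope, slope / se


no_drift = [(-1.0) ** i for i in range(20)]               # alternating +1, -1
with_drift = [0.3 * i + (-1.0) ** i for i in range(20)]   # same pattern plus 0.3 per run

for name, seq in (("no drift", no_drift), ("with drift", with_drift)):
    slope, t = drift_t_statistic(seq)
    print(f"{name}: slope = {slope:.3f}, t = {t:.2f}")
```

A t value small in magnitude supports reading a slight visible slope as noise; a large one suggests the drift term deserves a place in the model.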

Modification of Example

To illustrate how the residuals from the Pressure/Temperature data would look if there were drift in the process, a modified version of the data was simulated. A small drift of 0.3 units/measurement was added to the process. A plot of the data is shown below. In this run sequence plot a clear, strong trend is visible and there are portions of the run order where the residuals do not overlap zero. Because the structure is so evident in this case, it is easy to conclude that some sort of drift is present. Then, of course, its cause needs to be determined so that appropriate steps can be taken to eliminate the drift from the process or to account for it in the model.
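The observation that drifting residuals stop overlapping zero over parts of the run sequence can be checked mechanically. The sketch below is illustrative only: the helper `one_sided_blocks`, the block size of 5, and the hand-built residuals (an alternating +1/-1 pattern, with and without a centered drift of 0.3 units per measurement) are assumptions, not the handbook's simulated data.

```python
def one_sided_blocks(resid, block_size):
    """Indices of consecutive run-order blocks whose residuals all share one sign."""
    flagged = []
    for start in range(0, len(resid), block_size):
        block = resid[start:start + block_size]
        if all(r > 0 for r in block) or all(r < 0 for r in block):
            flagged.append(start // block_size)
    return flagged


no_drift = [(-1.0) ** i for i in range(20)]
drift = [0.3 * (i - 9.5) + (-1.0) ** i for i in range(20)]

print("no drift:", one_sided_blocks(no_drift, 5))
print("with drift:", one_sided_blocks(drift, 5))
```

Blocks flagged at the start and end of the run order, with opposite signs, are the numeric counterpart of the strong trend visible in the run sequence plot.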

4.4.4.3 How can I tell if there was drift in the measurement process?


Run Sequence Plot for Pressure / Temperature Data with Drift

As in the case when the standard deviation was not constant across the data set, comparison of these two versions of the data is interesting because the drift is not apparent in either data set when viewed in the scale of the data. This highlights the need for graphical residual analysis when developing process models.

Applicable to Most Regression Methods

The run sequence plot, like most types of residual plots, can be used to check for drift in many regression methods. It is not limited to least squares fitting or one particular type of model. The run sequence plot below shows the residuals from the fit of the nonlinear model to the data from the Polymer Relaxation example. The even spread of the residuals across the range of the data indicates that there is no apparent drift in this process.


Run Sequence Plot for Polymer Relaxation Data


Lag Plot: Thermocouple Calibration Example

4.4.4.4 How can I assess whether the random errors are independent from one to the next?


Lag Plot: Polymer Relaxation Example


Next Steps

Some of the different patterns that might be found in the residuals when the errors are not independent are illustrated in the general discussion of the lag plot. If the residuals are not random, then time series methods might be required to fully model the data. Some time series basics are given in Section 4 of the chapter on Process Monitoring. Before jumping to conclusions about the need for time series methods, however, be sure that a run order plot does not show any trends, or other structure, in the data. If there is a trend in the run order plot, whether caused by drift or by the use of the wrong functional form, the source of the structure shown in the run order plot will also induce structure in the lag plot. Structure induced in the lag plot in this way does not necessarily indicate dependence in successive random errors. The lag plot can only be interpreted clearly after accounting for any structure in the run order plot.
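A lag plot pairs each residual with the one before it, so its structure can be summarized by the correlation between those pairs. The sketch below uses hypothetical helper names (`lag_pairs`, `pearson`) and hand-built sequences. It also illustrates the caution above: a pure trend in run order produces a strong lag correlation even though that structure comes from the trend rather than from dependence between successive random errors.

```python
import math


def lag_pairs(resid, lag=1):
    """Pairs (e[i - lag], e[i]) as plotted in a lag plot; lag must be at least 1."""
    return list(zip(resid[:-lag], resid[lag:]))


def pearson(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / math.sqrt(sxx * syy)


alternating = [(-1.0) ** i for i in range(20)]  # each value opposes the previous one
trend = [0.3 * i for i in range(20)]            # drift only, no random error

print(f"alternating: lag-1 correlation = {pearson(lag_pairs(alternating)):.2f}")
print(f"trend only:  lag-1 correlation = {pearson(lag_pairs(trend)):.2f}")
```

Removing the trend first, for example by accounting for drift in the model, is what allows the remaining lag correlation to be read as evidence about the random errors.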


Probability Plot: Thermocouple Calibration Example

4.4.4.5 How can I test whether or not the random errors are distributed normally?


Probability Plot: Polymer Relaxation Example


Discussion and Examples

If the random errors from one of these processes were not normally distributed, then significant curvature may have been visible in the relationship between the residuals and the quantiles from the standard normal distribution, or there would be residuals at the upper and/or lower ends of the line that clearly did not fit the linear relationship followed by the bulk of the data. Examples of some typical cases obtained with non-normal random errors are illustrated in the general discussion of the normal probability plot.
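The straightness of a normal probability plot can be summarized by the correlation between the ordered residuals and the corresponding standard normal quantiles. The sketch below is a minimal illustration: it uses the plotting positions (i - 0.375) / (n + 0.25), which is one common convention and an assumption here (the handbook's own plots may use a different one), and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def normal_quantiles(n):
    """Standard normal quantiles at plotting positions (i - 0.375) / (n + 0.25)."""
    std = NormalDist()
    return [std.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def probability_plot_correlation(resid):
    """Correlation between sorted residuals and normal quantiles (1.0 means a straight line)."""
    y = sorted(resid)
    q = normal_quantiles(len(y))
    mq, my = sum(q) / len(q), sum(y) / len(y)
    sqq = sum((a - mq) ** 2 for a in q)
    syy = sum((b - my) ** 2 for b in y)
    sqy = sum((a - mq) * (b - my) for a, b in zip(q, y))
    return sqy / math.sqrt(sqq * syy)


q = normal_quantiles(50)
normal_like = [2.0 * v + 1.0 for v in q]  # residuals with an exactly normal shape
skewed = [math.exp(v) for v in q]         # strongly right-skewed residuals

print(f"normal-shaped: {probability_plot_correlation(normal_like):.3f}")
print(f"right-skewed:  {probability_plot_correlation(skewed):.3f}")
```

The curvature that the eye picks up in the plot shows up here as a correlation noticeably below 1, although the single number does not say what kind of departure from normality is present.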

Histogram

The normal probability plot helps us determine whether or not it is reasonable to assume that the random errors in a statistical process are drawn from a normal distribution. An advantage of the normal probability plot is that the human eye is very sensitive to deviations from a straight line that might indicate that the errors come from a non-normal distribution. However, when the normal probability plot suggests that the normality assumption may not be reasonable, it does not give us a very good idea what the distribution does look like. A histogram of the residuals from the fit, on the other hand, can provide a clearer picture of the shape of the distribution. The fact that the histogram provides more general distributional information than does the normal probability plot suggests that it will be harder to discern deviations from normality with a histogram than with the more specifically-oriented normal probability plot.
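A histogram of residuals needs only bin counts, which makes it easy to sketch without a plotting library. The function `histogram_counts` and the equal-width binning rule below are assumptions made for illustration; the sample values are arbitrary and are not residuals from any handbook example.

```python
def histogram_counts(values, n_bins):
    """Counts of values in n_bins equal-width bins spanning min(values) to max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 when all values are equal
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    return counts


sample = [-2.1, -1.2, -0.8, -0.5, -0.3, -0.1, 0.0, 0.2, 0.4, 0.6, 0.9, 1.3, 2.0]
for i, c in enumerate(histogram_counts(sample, 5)):
    print(f"bin {i}: " + "*" * c)
```

The number of bins is a judgment call: too few bins hide the shape of the distribution, and too many leave each bin with only a handful of residuals.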

Examples

Histograms for the three examples used to illustrate the normal probability plot are shown below. The histograms are all more-or-less bell-shaped, confirming the conclusions from the normal probability plots. Additional examples can be found in the gallery of graphical techniques.


Histogram: Temperature / Pressure Example

Histogram: Thermocouple Calibration Example
