4 Process Modeling
4.6 Case Studies in Process Modeling
4.6.1 Load Cell Calibration
4.6.1.9 Interpretation of Numerical Output
to reject the hypothesis that the quadratic model is correct.
Dataplot Output
LEAST SQUARES POLYNOMIAL FIT
SAMPLE SIZE N = 40
DEGREE = 2
REPLICATION CASE
REPLICATION STANDARD DEVIATION = 0.2147264895D-03
REPLICATION DEGREES OF FREEDOM = 20
NUMBER OF DISTINCT SUBSETS = 20

PARAMETER ESTIMATES (APPROX ST DEV.) T VALUE
All of the parameters are significantly different from zero, as indicated by the associated t statistics. The 97.5% cut-off for the t distribution with 37 degrees of freedom is 2.026. Since all of the t values are well above this cut-off, we can safely conclude that none of the parameters is equal to zero.
4.6.1.9 Interpretation of Numerical Output - Model #2
4.6.1.10 Use of the Model for Calibration
Using the Model
Now that a good model has been found for these data, it can be used to estimate load values for new measurements of deflection. For example, suppose a new deflection value of 1.239722 is observed. The regression function can be solved for load to determine an estimated load value without having to observe it directly. The plot below illustrates the calibration process.
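Solving the regression function for load can be sketched as follows. This is a minimal illustration, assuming a quadratic calibration curve of the form deflection = b0 + b1*load + b2*load^2. The coefficient values are not shown in this excerpt; those used below are assumed for illustration and should be replaced by the estimates from your own fit. The function name `solve_load` and its range arguments are hypothetical.

```python
import math

# Illustrative coefficients (assumed, not taken from the output shown here):
# deflection = B0 + B1*load + B2*load**2
B0, B1, B2 = 0.673618e-03, 0.732059e-06, -0.316081e-14


def solve_load(deflection, b0=B0, b1=B1, b2=B2, lo=150_000.0, hi=3_000_000.0):
    """Return the roots of b0 + b1*L + b2*L**2 = deflection that lie in [lo, hi]."""
    a, b, c = b2, b1, b0 - deflection
    disc = b * b - 4.0 * a * c
    if disc < 0:
        raise ValueError("no real load corresponds to this deflection")
    # Numerically stable quadratic formula: avoids cancellation when a is tiny.
    q = -0.5 * (b + math.copysign(math.sqrt(disc), b))
    roots = (q / a, c / q)
    # Keep only the root(s) inside the physically plausible load range.
    return [r for r in roots if lo <= r <= hi]


if __name__ == "__main__":
    print(solve_load(1.239722))
```

Restricting the answer to the range of loads covered by the calibration data is what discards the second, physically implausible root of the quadratic.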
solutions. As we saw from the plot on the previous page, however, there is really no confusion over which root of the quadratic function is the correct load. Essentially, the load value must be between 150,000 and 3,000,000 for this problem. The other root of the regression equation and the new deflection value correspond to a load of over 229,899,600. Looking at the data at hand, it is safe to assume that a load of 229,899,600 would yield a deflection much greater than 1.24.
+/- What?
The final step in the calibration process, after determining the estimated load associated with the observed deflection, is to compute an uncertainty or confidence interval for the load. A single-use 95% confidence interval for the load is obtained by inverting the formulas for the upper and lower bounds of a 95% prediction interval for a new deflection value. These inequalities are usually solved numerically, just as the calibration equation was, to find the end points of the confidence interval. For some models, including this one, the solution could actually be obtained algebraically, but it is easier to let the computer do the work using a generic algorithm.
The three terms on the right-hand side of each inequality are the regression function, a t-distribution multiplier, and the standard deviation of a new measurement from the process. Regression software often provides convenient methods for computing these quantities for arbitrary values of the predictor variables, which can make computation of the confidence interval end points easier. Although this interval is not symmetric mathematically, the asymmetry is very small, so for all practical purposes the interval can be written as the estimated load plus or minus a single half-width, if desired.
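The numerical inversion of the prediction bounds can be sketched as below. This is a simplified illustration under stated assumptions, not the handbook's own computation: the quadratic coefficients are assumed values, the standard deviation of a new measurement is treated as a constant (in practice it varies with load), and the helper names `bisect` and `calibration_interval` are hypothetical. The multiplier 2.026 is the 97.5% cut-off for 37 degrees of freedom quoted earlier.

```python
# Illustrative values (assumptions, not taken from the output shown here).
B0, B1, B2 = 0.673618e-03, 0.732059e-06, -0.316081e-14
T_MULT = 2.026    # 97.5% point of the t distribution with 37 degrees of freedom
SD_NEW = 0.00021  # assumed constant standard deviation of a new deflection


def f(load):
    """Assumed quadratic regression function: predicted deflection at a load."""
    return B0 + B1 * load + B2 * load * load


def bisect(g, lo, hi, tol=1e-6):
    """Find a root of g in [lo, hi] by bisection; g must change sign on it."""
    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_lo * g_mid <= 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)


def calibration_interval(deflection, lo=150_000.0, hi=3_000_000.0):
    """Return (lower, estimate, upper) loads for an observed deflection."""
    estimate = bisect(lambda L: f(L) - deflection, lo, hi)
    # Lower end point: load whose upper prediction bound equals the deflection.
    lower = bisect(lambda L: f(L) + T_MULT * SD_NEW - deflection, lo, hi)
    # Upper end point: load whose lower prediction bound equals the deflection.
    upper = bisect(lambda L: f(L) - T_MULT * SD_NEW - deflection, lo, hi)
    return lower, estimate, upper


if __name__ == "__main__":
    lower, estimate, upper = calibration_interval(1.239722)
    print(lower, estimate, upper)
    print("asymmetry:", (upper - estimate) - (estimate - lower))
```

Printing the difference between the two half-widths gives a direct check on how small the asymmetry is for a given fit.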
4.6.1.11 Work This Example Yourself
downloaded and installed it. Output from each analysis step below will be displayed in one or more of the Dataplot windows. The four main windows are the Output window, the Graphics window, the Command History window and the Data Sheet window. Across the top of the main windows there are menus for executing Dataplot commands. Across the bottom is a command entry window where commands can be typed in.
Data Analysis Steps: Click on the links below to start Dataplot and run this case study yourself. Each step may use results from previous steps, so please be patient. Wait until the software verifies that the current step is complete before clicking on the next step.
Results and Conclusions: The links in this column will connect you with more detailed information about each analysis step from the case study description.
1 Get set up and started
Data analysis steps:
1 Read in the data.
Results and conclusions:
1 You have read 2 columns of numbers into Dataplot, variables Deflection and Load.
2 Fit and validate initial model
Data analysis steps:
1 Plot deflection vs. load.
2 Fit a straight-line model to the data.
3 Plot the predicted values from the model and the data on the same plot.
4 Plot the residuals vs. load.
7 Refer to the numerical output from the fit.
Results and conclusions:
1 Based on the plot, a straight-line model should describe the data well.
2 The straight-line fit was carried out. Before trying to interpret the numerical output, do a graphical residual analysis.
3 The superposition of the predicted and observed values suggests the model is ok.
4 The residuals are not random, indicating that a straight line is not adequate.
3 Fit and validate refined model
Data analysis steps:
1 Refer to the plot of the residuals vs. load.
2 Fit a quadratic model to the data.
3 Plot the predicted values from the model and the data on the same plot.
4 Plot the residuals vs. load.
5 Plot the residuals vs. the predicted values.
6 Do a 4-plot of the residuals.
7 Refer to the numerical output from the fit.
Results and conclusions:
1 The structure in the plot indicates a quadratic model would better describe the data.
2 The quadratic fit was carried out. Remember to do the graphical residual analysis before trying to interpret the numerical output.
3 The superposition of the predicted and observed values again suggests the model is ok.
4 The residuals appear random, suggesting the quadratic model is ok.
5 The plot of the residuals vs. the predicted values also suggests the quadratic model is ok.
6 None of these plots indicates a problem with the model.
7 The small lack-of-fit F statistic (<1) confirms that the quadratic model fits the data.
4 Use the model to make a calibrated measurement
Data analysis steps:
3 Compute the uncertainty of the load estimate.
Results and conclusions:
1 The new deflection is associated with an unobserved and unknown load.
2 Solving the calibration equation yields the load value without having to observe it.
3 Computing a confidence interval for the load value lets us judge the range of plausible load values, since we know measurement noise affects the process.
4.6.2.1 Background and Data
The data were analyzed to calibrate the bias of the field measurements relative to the laboratory measurements. In this analysis, the field measurement is the response variable and the laboratory measurement is the predictor variable.
These data were provided by Harry Berger, who was at the time a scientist for the Office of the Director of the Institute of Materials Research (now the Materials Science and Engineering Laboratory) of NIST. These data were used for a study conducted for the Materials Transportation Bureau of the U.S. Department of Transportation.
Resulting Data

Field Defect Size   Lab Defect Size   Batch
-----------------   ---------------   -----
15                  12.9              6
45                  49.0              6
As with any regression problem, it is always a good idea to plot the raw data first. The following is a scatter plot of the raw data.
This scatter plot shows that a straight-line fit is a good initial candidate model for these data.
Plot by Batch
These data were collected in six distinct batches. The first step in the analysis is to determine if there is a batch effect.
In this case, the scientist was not inherently interested in the batch. That is, batch is a nuisance factor and, if reasonable, we would like to analyze the data as if they came from a single batch. However, we need to know that this is, in fact, a reasonable assumption to make.
Plot
We first generate a conditional plot where we condition on the batch.
This conditional plot shows a scatter plot for each of the six batches on a single page. Each of these plots shows a similar pattern.
The linear correlation plot (upper left), which shows the correlation between field and lab defect sizes versus the batch, indicates that batch six has a somewhat stronger linear relationship between the measurements than the other batches do. This is also reflected in the significantly lower residual standard deviation for batch six shown in the residual standard deviation plot (lower right), which shows the residual standard deviation versus batch. The slopes all lie within a range of 0.6 to 0.9 in the linear slope plot (lower left) and the intercepts all lie between 2 and 8 in the linear intercept plot (upper right).
Treat BATCH as Homogeneous
These summary plots, in conjunction with the conditional plot above, show that treating the data as a single batch is a reasonable assumption to make. None of the batches behaves badly compared to the others and none of the batches requires a significantly different fit from the others.
These two plots provide a good pair. The plot of the fit statistics allows quick and convenient comparisons of the overall fits. However, the conditional plot can reveal details that may be hidden in the summary plots. For example, we can more readily determine the existence of clusters of points and outliers, curvature in the data, and other similar features.
Based on these plots we will ignore the BATCH variable for the remaining analysis.
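The four per-batch summary statistics discussed above (correlation, intercept, slope, and residual standard deviation) can be computed with a short routine such as the following. This is a minimal sketch, assuming each record is a (field, lab, batch) triple with field as the response and lab as the predictor; the function names and the demonstration rows are hypothetical and are not the case-study data.

```python
import math
from collections import defaultdict


def line_fit_stats(x, y):
    """Straight-line least squares summary for one group (needs len(x) >= 3)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return {
        "correlation": sxy / math.sqrt(sxx * syy),
        "intercept": intercept,
        "slope": slope,
        "resid_sd": math.sqrt(sse / (n - 2)),
    }


def stats_by_batch(rows):
    """rows: iterable of (field, lab, batch); returns {batch: summary dict}."""
    groups = defaultdict(lambda: ([], []))
    for field, lab, batch in rows:
        groups[batch][0].append(lab)    # predictor
        groups[batch][1].append(field)  # response
    return {b: line_fit_stats(x, y) for b, (x, y) in sorted(groups.items())}


if __name__ == "__main__":
    # Made-up rows for demonstration only.
    demo = [(12, 10, 1), (19, 20, 1), (27, 30, 1), (33, 40, 1),
            (11, 10, 2), (21, 20, 2), (24, 30, 2), (35, 40, 2)]
    for batch, stats in stats_by_batch(demo).items():
        print(batch, stats)
```

Comparing these numbers across batches mirrors the summary plots: a batch whose slope, intercept, or residual standard deviation stands well apart from the rest would argue against pooling.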
4.6.2 Alaska Pipeline
4.6.2.3 Initial Linear Fit
Linear Fit Output
Based on the initial plot of the data, we first fit a straight-line model to the data.
The following fit output was generated by Dataplot (it has been edited slightly for display).
LEAST SQUARES MULTILINEAR FIT
SAMPLE SIZE N = 107
NUMBER OF VARIABLES = 1
REPLICATION CASE
REPLICATION STANDARD DEVIATION = 0.6112687111D+01
REPLICATION DEGREES OF FREEDOM = 29
NUMBER OF DISTINCT SUBSETS = 78

    PARAMETER ESTIMATES   (APPROX ST DEV.)   T VALUE
1   A0        4.99368     (1.126)            4.4
2   A1  LAB   0.731111    (0.2455E-01)       30

RESIDUAL STANDARD DEVIATION = 6.0809240341
RESIDUAL DEGREES OF FREEDOM = 105
REPLICATION STANDARD DEVIATION = 6.1126871109
REPLICATION DEGREES OF FREEDOM = 29
LACK OF FIT F RATIO = 0.9857 = THE 46.3056% POINT OF THE
F DISTRIBUTION WITH 76 AND 29 DEGREES OF FREEDOM
The intercept parameter is estimated to be 4.99 and the slope parameter is estimated to be 0.73. Both parameters are statistically significant.
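The t values and the lack-of-fit F ratio in the output can be reproduced from the other printed quantities. The sketch below assumes the usual definitions: a t value is the estimate divided by its approximate standard deviation, and the lack-of-fit F ratio is the lack-of-fit mean square divided by the replication (pure error) mean square. The function names are hypothetical; the input numbers are those printed in the fit output.

```python
def t_value(estimate, std_dev):
    """t statistic for testing whether a parameter is zero."""
    return estimate / std_dev


def lack_of_fit_f(resid_sd, resid_df, rep_sd, rep_df):
    """Return (F ratio, lack-of-fit df, replication df)."""
    sse = resid_sd ** 2 * resid_df   # residual sum of squares
    ss_pe = rep_sd ** 2 * rep_df     # pure-error (replication) sum of squares
    lof_df = resid_df - rep_df       # lack-of-fit degrees of freedom
    ms_lof = (sse - ss_pe) / lof_df  # lack-of-fit mean square
    return ms_lof / rep_sd ** 2, lof_df, rep_df


if __name__ == "__main__":
    print(t_value(4.99368, 1.126))          # intercept A0
    print(t_value(0.731111, 0.2455e-01))    # slope A1
    print(lack_of_fit_f(6.0809240341, 105, 6.1126871109, 29))
```

An F ratio near or below 1 means the scatter about the fitted line is no larger than the scatter among replicate measurements, which is why the output gives no evidence of lack of fit for the straight-line model on this basis alone.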
6-Plot for Model
In order to see this more clearly, we will generate full-size plots of the predicted values with the data and the residuals against the independent variable.
This plot shows more clearly that the assumption of homogeneous variances for the errors may be violated.
This plot also shows more clearly that the assumption of homogeneous variances is violated. Violations of this assumption, and of the assumption of constant location, are typically easiest to see on this plot.
4.6.2.4 Transformations to Improve Fit and Equalize Variances
Transformations
In regression modeling, we often apply transformations to achieve the following two goals:
to satisfy the homogeneity of variances assumption for the errors
to linearize the fit as much as possible
In examining these plots, we are looking for the plot that shows the most constant variability across the horizontal range of the plot.
This plot indicates that the ln transformation is a good candidate for achieving the most homogeneous variances.
This plot shows that the ln transformation of the predictor variable is a good candidate.
Box-Cox Linearity Plot
The previous step can be approached more formally by the use of the Box-Cox linearity plot. The value on the x axis corresponding to the maximum correlation value on the y axis indicates the power transformation that yields the most linear fit.
This plot indicates that a value of -0.1 achieves the most linear fit.
In practice, for ease of interpretation, we often prefer to use a common transformation, such as the ln or square root, rather than the value that yields the mathematical maximum. However, the Box-Cox linearity plot still indicates whether our choice is a reasonable one. That is, we might sacrifice a small amount of linearity in the fit to have a simpler model.
In this case, a value of 0.0 would indicate a ln transformation. Although the optimal value from the plot is -0.1, the plot indicates that any value between -0.2 and 0.2 will yield fairly similar results. For that reason, we choose to stick with the common ln transformation.
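The quantity plotted in a Box-Cox linearity plot can be sketched in a few lines. This is an illustration under stated assumptions rather than Dataplot's implementation: it assumes the power transformation is applied to the predictor, uses the standard Box-Cox form (x^lambda - 1)/lambda with ln(x) at lambda = 0, and searches a simple grid. The function names and the synthetic data are hypothetical.

```python
import math


def box_cox(x, lam):
    """Box-Cox power transformation of a positive value."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def box_cox_linearity(x, y, lambdas):
    """Correlation between y and the transformed x for each lambda."""
    return [(lam, pearson([box_cox(xi, lam) for xi in x], y)) for lam in lambdas]


def best_lambda(x, y, lambdas):
    """Lambda on the grid giving the maximum correlation."""
    return max(box_cox_linearity(x, y, lambdas), key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    # Synthetic data that are exactly linear in ln(x), for demonstration only.
    x = [1.0, 2.0, 5.0, 10.0, 20.0, 40.0, 80.0]
    y = [2.0 + 3.0 * math.log(xi) for xi in x]
    grid = [i / 10 for i in range(-20, 21)]
    print(best_lambda(x, y, grid))
```

Inspecting the full list returned by `box_cox_linearity`, rather than only its maximum, shows how flat the correlation curve is near the optimum, which is the basis for preferring a nearby common transformation.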
ln-ln Fit
Based on the above plots, we choose to fit a ln-ln model. Dataplot generated the following output for this model (it is edited slightly for display).
LEAST SQUARES MULTILINEAR FIT
SAMPLE SIZE N = 107
NUMBER OF VARIABLES = 1
REPLICATION CASE
REPLICATION STANDARD DEVIATION = 0.1369758099D+00
REPLICATION DEGREES OF FREEDOM = 29
NUMBER OF DISTINCT SUBSETS = 78

    PARAMETER ESTIMATES     (APPROX ST DEV.)   T VALUE
1   A0          0.281384    (0.8093E-01)
2   A1  XTEMP   0.885175    (0.2302E-01)       38

RESIDUAL STANDARD DEVIATION = 0.1682604253
RESIDUAL DEGREES OF FREEDOM = 105
REPLICATION STANDARD DEVIATION = 0.1369758099
REPLICATION DEGREES OF FREEDOM = 29
LACK OF FIT F RATIO = 1.7032 = THE 94.4923% POINT OF THE
F DISTRIBUTION WITH 76 AND 29 DEGREES OF FREEDOM
Note that although the residual standard deviation is significantly lower than it was for the original fit, we cannot compare them directly since the fits were performed on different scales.
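To relate the ln-ln fit back to the original measurement scale, the fitted equation can be exponentiated. The sketch below assumes that the response is ln(field defect size) and that XTEMP holds ln(lab defect size), which is how a ln-ln model is read here; the function names are hypothetical. The simple exponential back-transform is a simplification that ignores any retransformation bias.

```python
import math

# Parameter estimates from the ln-ln fit output.
A0 = 0.281384  # intercept on the ln scale
A1 = 0.885175  # slope on the ln scale


def predict_ln_field(lab):
    """Predicted ln(field defect size) for a given lab defect size."""
    return A0 + A1 * math.log(lab)


def predict_field(lab):
    """Predicted field defect size on the original scale (naive back-transform)."""
    return math.exp(predict_ln_field(lab))


if __name__ == "__main__":
    for lab in (10.0, 45.0, 80.0):
        print(lab, predict_field(lab))
```

Residuals computed from `predict_field` are in the original units, so a residual standard deviation calculated from them can be set beside the value from the untransformed straight-line fit.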