CHAPMAN & HALL/CRC
A CRC Press Company
Boca Raton   London   New York   Washington, D.C.
This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.
Apart from any fair dealing for the purpose of research or private study, or criticism or review, as permitted under the UK Copyright Designs and Patents Act, 1988, this publication may not be reproduced, stored or transmitted, in any form or by any means, electronic or mechanical, including photocopying, filming, and recording, or by any information storage or retrieval system, without the prior permission in writing of the publishers, or in the case of reprographic reproduction only in accordance with the terms of the licenses issued by the Copyright Licensing Agency in the UK, or in accordance with the terms of the license issued by the appropriate Reproduction Rights Organization outside the UK. The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying.
Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.
Visit the CRC Press Web site at www.crcpress.com
© 2002 by Chapman & Hall/CRC
No claim to original U.S. Government works
International Standard Book Number 1-58488-083-X
Library of Congress Card Number 84-000839
Printed in the United States of America  1 2 3 4 5 6 7 8 9 0
Printed on acid-free paper
Library of Congress Cataloging-in-Publication Data
Milliken, George A., 1943–
   Analysis of messy data / George A. Milliken, Dallas E. Johnson.
   2 v. : ill. ; 24 cm.
   Includes bibliographies and indexes.
   Contents: v. 1. Designed experiments -- v. 2. Nonreplicated experiments.
   Vol. 2 has imprint: New York : Van Nostrand Reinhold.
Chapter 2 One-Way Analysis of Covariance — One Covariate in a Completely Randomized Design Structure
2.1 The Model
2.2 Estimation
2.3 Strategy for Determining the Form of the Model
2.4 Comparing the Treatments or Regression Lines
2.4.1 Equal Slopes Model
2.4.2 Unequal Slopes Model — Covariate by Treatment Interaction
2.5 Confidence Bands about the Difference of Two Treatments
2.6 Summary of Strategies
2.7 Analysis of Covariance Computations via the SAS® System
2.7.1 Using PROC GLM and PROC MIXED
2.7.2 Using JMP®
2.8 Conclusions
References
Exercises
Chapter 3 Examples: One-Way Analysis of Covariance — One Covariate in a Completely Randomized Design Structure
3.1 Introduction
3.2 Chocolate Candy — Equal Slopes
3.2.1 Analysis Using PROC GLM
3.2.2 Analysis Using PROC MIXED
3.2.3 Analysis Using JMP®
3.3 Exercise Programs and Initial Resting Heart Rate — Unequal Slopes
3.4 Effect of Diet on Cholesterol Level: An Exception to the Basic Analysis of Covariance Strategy
3.5 Change from Base Line Analysis Using Effect of Diet on Cholesterol Level Data
3.6 Shoe Tread Design Data for Exception to the Basic Strategy
3.7 Equal Slopes within Groups of Treatments and Unequal Slopes between Groups
3.8 Unequal Slopes and Equal Intercepts — Part 1
3.9 Unequal Slopes and Equal Intercepts — Part 2
References
Exercises
Chapter 4 Multiple Covariates in a One-Way Treatment Structure in a Completely Randomized Design Structure
4.1 Introduction
4.2 The Model
4.3 Estimation
4.4 Example: Driving A Golf Ball with Different Shafts
4.5 Example: Effect of Herbicides on the Yield of Soybeans — Three Covariates
4.6 Example: Models That Are Quadratic Functions of the Covariate
4.7 Example: Comparing Response Surface Models
Reference
Exercises
Chapter 5 Two-Way Treatment Structure and Analysis of Covariance in a Completely Randomized Design Structure
5.1 Introduction
5.2 The Model
5.3 Using the SAS® System
5.3.1 Using PROC GLM and PROC MIXED
5.3.2 Using JMP®
5.4 Example: Average Daily Gains and Birth Weight — Common Slope
5.5 Example: Energy from Wood of Different Types of Trees — Some Unequal Slopes
5.6 Missing Treatment Combinations
5.7 Example: Two-Way Treatment Structure with Missing Cells
6.2 The Beta-Hat Model and Analysis
6.3 Testing Equality of Parameters
6.4 Complex Treatment Structures
6.5 Example: One-Way Treatment Structure
6.6 Example: Two-Way Treatment Structure
6.7 Summary
Exercises
Chapter 7 Variable Selection in the Analysis of Covariance Model
7.1 Introduction
7.2 Procedure for Equal Slopes
7.3 Example: One-Way Treatment Structure with Equal Slopes Model
7.4 Some Theory
7.5 When Slopes are Possibly Unequal
References
Exercises
Chapter 9 Two Treatments in a Randomized Complete Block Design Structure
9.1 Introduction
9.2 Complete Block Designs
9.3 Within Block Analysis
9.4 Between Block Analysis
9.5 Combining Within Block and Between Block Information
9.6 Determining the Form of the Model
9.7 Common Slope Model
9.8 Comparing the Treatments
9.8.1 Equal Slopes Models
9.8.2 Unequal Slopes Model
9.9 Confidence Intervals about Differences of Two Regression Lines
9.9.1 Within Block Analysis
9.9.2 Combined Within Block and Between Block Analysis
9.10 Computations for Model 9.1 Using the SAS® System
9.11 Example: Effect of Drugs on Heart Rate
9.12 Summary
References
Exercises
Chapter 10 More Than Two Treatments in a Blocked Design Structure
10.1 Introduction
10.2 RCB Design Structure — Within and Between Block Information
10.3 Incomplete Block Design Structure — Within and Between Block Information
10.4 Combining Between Block and Within Block Information
10.5 Example: Five Treatments in RCB Design Structure
10.6 Example: Balanced Incomplete Block Design Structure with Four Treatments
10.7 Example: Balanced Incomplete Block Design Structure with Four Treatments Using JMP®
10.8 Summary
References
Exercises
Chapter 11 Covariate Measured on the Block in RCB and Incomplete Block Design Structures
11.1 Introduction
11.2 The Within Block Model
11.3 The Between Block Model
11.4 Combining Within Block and Between Block Information
11.5 Common Slope Model
11.6 Adjusted Means and Comparing Treatments
11.6.1 Common Slope Model
11.6.2 Non-Parallel Lines Model
11.7 Example: Two Treatments
11.8 Example: Four Treatments in RCB
11.9 Example: Four Treatments in BIB
12.3 Estimation of the Variance Components
12.4 Changing Location of the Covariate Changes the Estimates of the Variance Components
12.5 Example: Balanced One-Way Treatment Structure
12.6 Example: Unbalanced One-Way Treatment Structure
12.7 Example: Two-Way Treatment Structure
12.8 Summary
References
Exercises
Chapter 13 Mixed Models
13.1 Introduction
13.2 The Matrix Form of the Mixed Model
13.3 Fixed Effects Treatment Structure
13.4 Estimation of Fixed Effects and Some Small Sample Size Approximations
13.5 Fixed Treatments and Locations Random
13.6 Example: Two-Way Mixed Effects Treatment Structure in a CRD
13.7 Example: Treatments are Fixed and Locations are Random with a RCB at Each Location
References
Exercises
Chapter 14 Analysis of Covariance Models with Heterogeneous Errors
14.1 Introduction
14.2 The Unequal Variance Model
14.3 Tests for Homogeneity of Variances
14.3.1 Levene’s Test for Equal Variances
14.3.2 Hartley’s F-Max Test for Equal Variances
14.3.3 Bartlett’s Test for Equal Variances
14.3.4 Likelihood Ratio Test for Equal Variances
14.4 Estimating the Parameters of the Regression Model
14.4.1 Least Squares Estimation
14.4.2 Maximum Likelihood Methods
14.5 Determining the Form of the Model
14.6 Comparing the Models
14.6.1 Comparing the Nonparallel Lines Models
14.6.2 Comparing the Parallel Lines Models
14.7 Computational Issues
14.8 Example: One-Way Treatment Structure with Unequal Variances
14.9 Example: Two-Way Treatment Structure with Unequal Variances
14.10 Example: Treatments in Multi-location Trial
15.5 Covariate is Measured on the Large Size of Experimental Unit and a Covariate is Measured on the Small Size of Experimental Unit
15.6 General Representation of the Covariate Part of the Model
15.6.1 Covariate Measured on Large Size of Experimental Unit
15.6.2 Covariate Measured on the Small Size of Experimental Units
15.6.3 Summary of General Representation
15.7 Example: Flour Milling Experiment — Covariate Measured on the Whole Plot
15.8 Example: Cookie Baking
15.9 Example: Teaching Methods with One Covariate Measured on the Large Size Experimental Unit and One Covariate Measured on the Small Size Experimental Unit
15.10 Example: Comfort Study in a Strip-Plot Design with Three Sizes of Experimental Units and Three Covariates
16.2 The Covariance Part of the Model — Selecting R
16.3 Covariance Structure of the Data
16.4 Specifying the Random and Repeated Statements for PROC MIXED of the SAS® System
16.5 Selecting an Adequate Covariance Structure
16.6 Example: Systolic Blood Pressure Study with Covariate Measured on the Large Size Experimental Unit
16.7 Example: Oxide Layer Development Experiment with Three Sizes of Experimental Units Where the Repeated Measure is at the Middle Size of Experimental Unit and the Covariate is Measured on the Small Size Experimental Unit
17.2 Experiments with A Single Covariate
17.3 Experiments with Multiple Covariates
17.4 Selecting Non-null and Null Partitions
17.5 Estimating the Parameters
17.6 Example: Milling Flour Using Three Factors Each at Two Levels
17.7 Example: Baking Bread Using Four Factors Each at Two Levels
17.8 Example: Hamburger Patties with Four Factors Each at Two Levels
17.9 Example: Strength of Composite Material Coupons with Two Covariates
17.10 Example: Effectiveness of Paint on Bricks with Unequal Slopes
17.11 Summary
References
Exercises
Chapter 18 Special Applications of Analysis of Covariance
18.1 Introduction
18.2 Blocking and Analysis of Covariance
18.3 Treatments Have Different Ranges of the Covariate
18.4 Nonparametric Analysis of Covariance
18.4.1 Heart Rate Data from Exercise Programs
18.4.2 Average Daily Gain Data from a Two-Way Treatment Structure
18.5 Crossover Design with Covariates
18.6 Nonlinear Analysis of Covariance
18.7 Effect of Outliers
References
Exercises
Analysis of covariance is a statistical procedure that enables one to incorporate information about concomitant variables into the analysis of a response variable. Sometimes this is done in an attempt to reduce experimental error. Other times it is done to better understand the phenomenon being studied. The approach used in this book is that the analysis of covariance model is described as a method of comparing a series of regression models — one for each of the levels of a factor or combinations of levels of factors being studied. Since covariance models are regression models, analysts can use all of the methods of regression analysis to deal with problems such as lack of fit, outliers, etc. The strategies described in this book will enable the reader to appropriately formulate and analyze various kinds of covariance models.

When covariates are measured and incorporated into the analysis of a response variable, the main objective of analysis of covariance is to compare treatments or treatment combinations at common values of the covariates. This is particularly true when the experimental units assigned to each of the treatment combinations may have differing values of the covariates. Comparing treatments is dependent on the form of the covariance model and thus care must be taken so that mistakes are not made when drawing conclusions.
The goal of this book is to present the structure and philosophy for using the analysis of covariance by including descriptions of methodologies, illustrating the methodologies by analyzing numerous data sets, and occasionally furnishing some theory when required. Our aim is to provide data analysts with tools for analyzing data with covariates and to enable them to appropriately interpret the results. Some of the methods and techniques described in this book are not available in other books, but two issues of Biometrics (1957, Volume 13, Number 3, and 1982, Volume 38, Number 3) were dedicated to the topic of analysis of covariance. The topics presented are among those that we, as consulting statisticians, have found to be most helpful in analyzing data when covariates are available for possible inclusion in the analysis.
Readers of this book will learn how to:
• Formulate appropriate analysis of covariance models
• Simplify analysis of covariance models
• Compare levels of a factor or of levels of combinations of factors when the model involves covariates
• Construct and analyze a model with two or more factors in the treatment structure
• Analyze two-way treatment structures with missing cells
• Compare models using the beta-hat model
• Perform variable selection within the analysis of covariance model
• Analyze models with blocking in the design structure and use combined intra-block and inter-block information about the slopes of the regression models
• Use random statements in PROC MIXED to specify random coefficient regression models
• Carry out the analysis of covariance in a mixed model framework
• Incorporate unequal treatment variances into the analysis
• Specify the analysis of covariance models for split-plot, strip-plot and repeated measures designs both in terms of the regression models and the covariance structures of the repeated measures
• Incorporate covariates into the analysis of nonreplicated experiments, thus extending some of the results in Analysis of Messy Data, Volume II
The last chapter consists of a collection of examples that deal with (1) using the covariate to form blocks, (2) crossover designs, (3) nonparametric analysis of covariance, (4) using a nonlinear model for the covariate model, and (5) the process of examining mixed analysis of covariance models for possible outliers.
The approach used in this book is similar to that used in the first two volumes. Each topic is covered from a practical viewpoint, emphasizing the implementation of the methods much more than the theory behind the methods. Some theory has been presented for some of the newer methodologies. The book utilizes the procedures of the SAS® system and JMP® software packages to carry out the computations and few computing formulae are presented. Either SAS® system code or JMP® menus are presented for the analysis of the data sets in the examples. The data in the examples (except for those using chocolate chips) were generated to simulate real world applications that we have encountered in our consulting experiences.

This book is intended for everyone who analyzes data. The reader should have a knowledge of analysis of variance and regression analysis as well as basic statistical ideas including randomization, confidence intervals, and hypothesis testing. The first four chapters contain the information needed to form a basic philosophy for using the analysis of covariance with a one-way treatment structure and should be read by everyone. As one progresses through the book, the topics become more complex by going from designs with blocking to split-plot and repeated measures designs. Before reading about a particular topic in the later chapters, read the first four chapters. Knowledge of Chapters 13 and 14 from Analysis of Messy Data, Volume I: Designed Experiments would be useful for understanding the part of Chapter 5 involving missing cells. The information in Chapters 4 through 9 of Analysis of Messy Data, Volume II: Nonreplicated Experiments is useful for comprehending the topics discussed in Chapter 17.
This book is the culmination of more than 25 years of writing. The earlier editions of this manuscript were slanted toward providing an appropriate analysis of split-plot type designs by using fixed effects software such as PROC GLM of the SAS® system. With the development of mixed models software, such as PROC MIXED of the SAS® system and JMP®, the complications of the analysis of split-plot type designs disappeared and thus enabled the manuscript to be completed without including the difficult computations that are required when using fixed effects software. Over the years, several colleagues made important contributions. Discussions with Shie-Shien Yang were invaluable for the development of the variable selection process described in Chapter 7. Vicki Landcaster and Marie Loughin read some of the earlier versions and provided important feedback. Discussions with James Schwenke, Kate Ash, Brian Fergen, Kevin Chartier, Veronica Taylor, and Mike Butine were important for improving the chapters involving combining intra- and inter-block information and the strategy for the analysis of repeated measures designs. Finally, we cannot express enough our thanks to Jane Cox who typed many of the initial versions of the chapters. If it were not for Jane's skills with the word processor, the task of finishing this book would have been much more difficult.
We dedicate this volume to all who have made important contributions to our personal and professional lives. This includes our wives, Janet and Erma Jean, our children, Scott and April and Kelly and Mark, and our parents and parents-in-law who made it possible for us to pursue our careers as statisticians. We were both fortunate to study with Franklin Graybill and we thank him for making sure that we were headed in the right direction when our careers began.
Analysis of Messy Data, Volume III: Analysis of Covariance

Chapter 1 Introduction to the Analysis of Covariance
1.1 INTRODUCTION
The statistical procedure termed analysis of covariance has been used in several contexts. The most common description of analysis of covariance is to adjust the analysis for variables that could not be controlled by the experimenter. For example, if a researcher wishes to compare the effect that ten different chemical weed control treatments have on yield of a specific wheat variety, the researcher may wish to control for the differential effects of a fertility trend occurring in the field and for the number of wheat plants per plot that happen to emerge after planting. The differential effects of a fertility trend can possibly be removed by using a randomized complete block design structure, but it may not be possible to control the number of wheat plants per plot (unless the seeds are sown thickly and then the emerging plants are thinned to a given number of plants per plot). The researcher wishes to compare the treatments as if each treatment were grown on plots with the same average fertility level and as if every plot had the same number of wheat plants. The use of a randomized complete block design structure in which the blocks are constructed such that the fertility levels of plots within a block are very similar will enable the treatments to be compared by averaging over the fertility levels, but the analysis of covariance is a procedure which can compare treatment means after first adjusting for the differential number of wheat plants per plot. The adjustment procedure involves constructing a model that describes the relationship between yield and the number of wheat plants per plot for each treatment, which is in the form of a regression model. The regression models, one for each level of the treatment, are then compared at a predetermined common number of wheat plants per plot.
1.2 THE COVARIATE ADJUSTMENT PROCESS
To demonstrate the type of adjustment process that is being carried out when the analysis of covariance methodology is applied, the set of data in Table 1.1 is used in which there are two treatments and five plots per treatment in a completely randomized design structure. Treatment 1 is a chemical application to control the growth of weeds and Treatment 2 is a control without any chemicals to control the weeds. The data in Table 1.1 consist of the yield of wheat plants of a specific variety from plots of identical size along with the number of wheat plants that emerged after planting per plot. The researcher wants to compare the yields of the two treatments for the condition when there are 125 plants per plot.
Figure 1.1 is a graphical display of the plot yields for each of the treatments where the circles represent the data points for Treatment 1 and the boxes represent the data points for Treatment 2. An “X” is used to mark the means of each of the treatments.
If the researcher uses the two-sample t-test or one-way analysis of variance to compare the two treatments without taking information into account about the number of plants per plot, a t statistic of 1.02 or an F statistic of 1.05 is obtained, indicating the two treatment means are not significantly different (p = 0.3361). The results of the analysis are in Table 1.2 in which the estimated standard error of the difference of the two treatment means is 67.23.
TABLE 1.1
Yield and Plants per Plot Data for the Example in Section 1.2

        Treatment 1                      Treatment 2
Yield per plot  Plants per plot  Yield per plot  Plants per plot
The next step is to investigate the relationship between the yield per plot and the number of plants per plot. Figure 1.2 is a display of the data where the number of plants is on the horizontal axis and the yield is on the vertical axis. The circles denote the data for Treatment 1 and the boxes denote the data for Treatment 2. The two lines on the graph, denoted by Treatment 1 model and Treatment 2 model, were computed from the data by fitting the model y_ij = α_i + β x_ij + ε_ij, i = 1, 2 and j = 1,
TABLE 1.2
Analysis of Variance Table and Means for Comparing the Yields of the Two Treatments Where No Information about the Number of Plants per Plot is Used

Source    df    SS          MS          F Value    Prob F
Model      1    11833.60    11833.60    1.05       0.3361
Error      8    90408.40    11301.05
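With only two treatments, the F statistic from the one-way analysis of variance is the square of the pooled two-sample t statistic, which is why t = 1.02 and F = 1.05 lead to the same conclusion. A minimal sketch of this equivalence, using invented yields rather than the values of Table 1.1:

```python
# Hypothetical yields (NOT the Table 1.1 data): two treatments, five plots each.
from statistics import mean

y1 = [520, 480, 610, 550, 500]   # made-up yields for Treatment 1
y2 = [460, 430, 540, 505, 470]   # made-up yields for Treatment 2

n1, n2 = len(y1), len(y2)
m1, m2 = mean(y1), mean(y2)

# Pooled two-sample t statistic.
sse = sum((y - m1) ** 2 for y in y1) + sum((y - m2) ** 2 for y in y2)
mse = sse / (n1 + n2 - 2)                    # pooled error mean square
se_diff = (mse * (1 / n1 + 1 / n2)) ** 0.5   # SE of the difference of means
t = (m1 - m2) / se_diff

# One-way ANOVA F statistic for the same comparison.
grand = mean(y1 + y2)
ss_model = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
F = (ss_model / 1) / mse

assert abs(F - t ** 2) < 1e-9                # with two groups, F = t^2
```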
[Figure 1.2 legend: Treatment 1 data, Treatment 1 model, Treatment 2 data, Treatment 2 model; horizontal axis: number of plants per plot.]
2, …, 5, a model with different intercepts and common or equal slopes. The results are included in Table 1.3.
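The different-intercepts, common-slope model has a closed-form least squares fit: the slope is the pooled within-treatment Sxy/Sxx and each intercept is the treatment mean of y minus the slope times the treatment mean of x. A sketch of that fit, with invented (x, y) pairs standing in for the Table 1.1 data:

```python
# Fit y_ij = a_i + b * x_ij by least squares (common slope b, one intercept
# a_i per treatment).  The data below are invented for illustration only.
from statistics import mean

data = {  # treatment -> list of (plants per plot, yield per plot)
    1: [(110, 500), (130, 590), (120, 540), (140, 640), (125, 565)],
    2: [(105, 430), (135, 560), (115, 470), (125, 515), (120, 495)],
}

sxy = sxx = 0.0
for pts in data.values():                    # pool within-treatment sums
    xbar = mean(x for x, _ in pts)
    ybar = mean(y for _, y in pts)
    sxy += sum((x - xbar) * (y - ybar) for x, y in pts)
    sxx += sum((x - xbar) ** 2 for x, _ in pts)

b = sxy / sxx                                # pooled (common) slope estimate
a = {i: mean(y for _, y in pts) - b * mean(x for x, _ in pts)
     for i, pts in data.items()}             # one intercept per treatment

def pred(i, x):
    """Fitted regression line for treatment i evaluated at x plants."""
    return a[i] + b * x
```

The two fitted lines are parallel by construction, so comparing the treatments at a common covariate value (such as 125 plants per plot) amounts to comparing pred(1, x) with pred(2, x).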
Now analysis of covariance is used to compare the two treatments when there are 125 plants per plot. The process of the analysis of covariance is to slide or move the observations from a given treatment along the estimated regression model (parallel to the model) to intersect the vertical line at 125 plants per plot. This sliding is demonstrated in Figure 1.3 where the solid circles represent the adjusted data for Treatment 1 and the solid boxes represent the adjusted data for Treatment 2. The lines join the open circles to the solid circles and join the open boxes to the solid boxes. The lines indicate that the respective data points slid to the vertical line at which there are 125 plants per plot.
The adjusted data are computed by

ỹ_ij = α̂_i + β̂(125) + [y_ij − (α̂_i + β̂ x_ij)],  i = 1, 2 and j = 1, 2, …, 5.

The terms y_ij − (α̂_i + β̂ x_ij), i = 1, 2 and j = 1, 2, …, 5, are the residuals or deviations of the observations from the estimated regression models. The preliminary computations of the adjusted yields are in Table 1.4. These adjusted yields are the predicted yields of the plots as if each plot had 125 plants.
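The sliding computation can be written out directly. The intercepts and slope below are made-up stand-ins, not the estimates of Table 1.3; the point is only the form of the calculation:

```python
# Adjust each observation to a common covariate value x0 = 125 by moving it
# parallel to its treatment's fitted line.  a and b are hypothetical estimates.
a = {1: 1.4, 2: -49.0}   # hypothetical intercepts for the two treatments
b = 4.5                  # hypothetical common-slope estimate
x0 = 125                 # common number of plants per plot

def adjusted(trt, y, x):
    resid = y - (a[trt] + b * x)        # residual from the fitted line
    return a[trt] + b * x0 + resid      # predicted value at x0 plus residual

# The adjustment is equivalent to y - b*(x - x0): sliding parallel to the line.
y_obs, x_obs = 500.0, 110.0
assert abs(adjusted(1, y_obs, x_obs) - (y_obs - b * (x_obs - x0))) < 1e-9
```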
The next step is to compare the two treatments through the adjusted yield values by computing a two-sample t statistic or the F statistic from a one-way analysis of variance. The results of these analyses are in Table 1.5.
A problem with this analysis is that it assumes the adjusted data are not adjusted data and so there is no reduction in the degrees of freedom for error due to estimating the slope of the regression lines. Hence the final step is to recalculate the statistics
by changing the degrees of freedom for error in Table 1.5 from 8 to 7 (the cost of estimating the slope). The sum of squares error is identical for both Tables 1.3 and 1.5, but the error sum of squares from Table 1.5 is based on 8 degrees of freedom instead of 7. To account for this change in degrees of freedom in Table 1.5, the estimated standard error for comparing the two treatments needs to be multiplied by √(8/7), the t statistic needs to be multiplied by √(7/8), and the F statistic needs to be multiplied by 7/8. The recalculated statistics are presented in Table 1.6. Here the estimated standard error of the difference between the two means is 18.11, a 3.7-fold reduction over the analysis that ignores the information from the covariate. Thus,
FIGURE 1.3 Plot of the data and estimated regression models showing how to compute adjusted yield values at 125 plants per plot.
TABLE 1.4
Preliminary Computations Used in Computing Adjusted Data for Each Treatment as If All Plots Had 125 Plants per Plot
Treatment Yield Per Plot Plants Per Plot Residual Adjusted Yield
by taking into account the linear relationship between the yield of the plot and the number of plants in that plot, there is a tremendous reduction in the variability of the data. In fact, the analysis of the adjusted data shows there is a significant difference between the yields of the two treatments when adjusting for the unequal number of plants per plot (p = 0.0428), when the analysis of variance in Table 1.2 did not indicate there is a significant difference between the treatments (p = 0.3361). The final issue is that since this analysis of the adjusted data overlooks the fact that the slope has been estimated, the estimated standard error of the difference of two means is a little small as compared to the estimated standard error one gets from the analysis of covariance. The estimated standard error of the difference of the two means as computed from the analysis of covariance in Table 1.3 is 18.26 as compared to 18.11 for the analysis of the adjusted data. Thus the two analyses are not quite identical. This example shows the power of being able to use information about covariates or independent variables to make decisions about the treatments being included in
TABLE 1.5
Analysis of the Adjusted Yields (Too Many Degrees of Freedom for Error)
Source    df    SS         MS         F Value    Prob F
Model      1    5002.83    5002.83    6.98       0.0297
Error      8    5737.26    717.16
TABLE 1.6
Recalculated estimated standard error    18.11
Recalculated t-statistic                 2.47
Recalculated F-statistic                 6.10
Recalculated significance level          0.0428
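The recalculation can be checked numerically from the rounded values reported above, assuming five plots per treatment so that the naive standard error of the mean difference is √(MSE(1/5 + 1/5)):

```python
# Degrees-of-freedom correction: the adjusted-data analysis used 8 error df,
# but estimating the slope leaves only 7, so each statistic is rescaled.
k = 7 / 8                                      # correct df / naive df

F_naive = 6.98                                 # F from Table 1.5
t_naive = F_naive ** 0.5                       # two groups: t = sqrt(F)
se_naive = (717.16 * (1 / 5 + 1 / 5)) ** 0.5   # SE using the naive MSE

F_fixed = F_naive * k                          # multiply F by 7/8
t_fixed = t_naive * k ** 0.5                   # multiply t by sqrt(7/8)
se_fixed = se_naive / k ** 0.5                 # multiply SE by sqrt(8/7)

assert abs(F_fixed - 6.10) < 0.01              # matches Table 1.6
assert abs(t_fixed - 2.47) < 0.01
assert abs(se_fixed - 18.11) < 0.01
```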
the study. The analysis of covariance uses a model to adjust the data as if all the observations are from experimental units with identical values of the covariates.

A typical discussion of analysis of covariance indicates that the analyst should include the number of plants as a term in the model so that term accounts for variability in the observed yields, i.e., the variance of the model is reduced. If including the number of plants in the model reduces the variability enough, then it is used to adjust the data before the variety means are compared. It is important to remember that there is a model being assumed when the covariate or covariates are included in a model.
1.3 A GENERAL AOC MODEL AND THE BASIC PHILOSOPHY
In this text, the analysis of covariance is described in more generality than that of adjusting for variation due to uncontrollable variables. The analysis of covariance is defined as a method for comparing several regression surfaces or lines, one for each treatment or treatment combination, where a different regression surface is possibly used to describe the data for each treatment or treatment combination.
A one-way treatment structure with t treatments in a completely randomized design structure (Milliken and Johnson, 1992) is used as a basis for setting up the definitions for the analysis of covariance model. The experimental situation involves selecting N experimental units from a population of experimental units and measuring k characteristics x_1ij, x_2ij, …, x_kij on each experimental unit. The variables x_1ij, x_2ij, …, x_kij are called covariates or independent variables or concomitant variables. It is important to measure the values of the covariates before the treatments are applied to the experimental units so that the levels of the treatments do not affect the values of the covariates. At a minimum, the values of the covariates should not be affected by the applied levels of the treatments. In the chemical weed treatment experiment, the number of plants per plot occurs after applying a particular treatment on a plot, so the value of the covariate (number of plants per plot) could not be determined before the treatments were applied to the plots. If the germination rate is affected by the applied treatments, then the number of plants per plot cannot be used as a covariate in the conventional manner (see Chapter 2 for further discussion). After the set of experimental units is selected and the values of the covariates are determined (when possible), then randomly assign n_i experimental units to treatment i, where N = Σ_{i=1}^{t} n_i. One generally assigns equal numbers of experimental units to the levels of the treatment, but equal numbers of experimental units per level of the treatment are not necessary. After an experimental unit is subjected to its specific level of the treatment, then measure the response or dependent variable, which is denoted by y_ij. Thus the variables used in the discussions are summarized as:
y_ij is the dependent measure
x_1ij is the first independent variable or covariate
x_2ij is the second independent variable or covariate
  ⋮
x_kij is the kth independent variable or covariate
At this point, the experimental design is a one-way treatment structure with t treatments in a completely randomized design structure with k covariates. If there is a linear relationship between the mean of y for the ith treatment and the k covariates
or independent variables, an analysis of covariance model can be expressed as:
(1.1)
for i = 1, 2, …, t, and j = 1, 2, …, ni, and the εij ~ iid N(0, σ2), i.e., the εij areindependently identically distributed normal random variables with mean 0 andvariance σ2 The important thing to note about this model is that the mean of the
y values from a given treatment depends on the values of the x’s as well as on thetreatment applied to the experimental units
The analysis of covariance is a strategy for making decisions about the form ofthe covariance model through testing a series of hypotheses and then making treat-ment comparisons by comparing the estimated responses from the final regressionmodels Two important hypotheses that help simplify the regression models are
H01: βh1 = βh2 = … = βht = 0 vs Ha1: (not H01:), that is, all the treatments’
slopes for the hth covariate are zero, h = 1, 2, …, k, or
H02: βh1 = βh2 = … = βht vs Ha2: (not Ho2:), that is, the slopes for the hth
covariate are equal across the treatments, meaning the surfaces are parallel
in the direction of the hth covariate, h = 1, 2, …, k
The analysis of covariance model in Equation 1.1 is a combination of an analysis
of variance model and a regression model The analysis of covariance model is part
of an analysis of variance model since the intercepts and slopes are functions of thelevels of the treatments The analysis of covariance model is also part of a regressionmodel since the model for each treatment is a regression model
An experiment is designed to purchase a certain number of degrees of freedomfor error (generally without the covariates) and the experimenter is willing to sellsome of those degrees of freedom for good or effective covariates which will helpreduce the magnitude of the error variance The philosophy in this book is to selectthe simplest possible expression for the covariate part of the model before makingtreatment comparisons
This process of model building to determine the simplest adequate form of the regression models follows the principle of parsimony and helps guard against foolishly selling degrees of freedom for error to retain unnecessary covariate terms in the model. Thus the strategy for analysis of covariance begins with testing hypotheses such as H01 and H02 to make decisions about the form of the covariate or regression part of the model. Once the form of the covariate part of the model is finalized, the treatments are compared by comparing the regression surfaces at predetermined values of the covariates.
The structure of the following chapters leads one through the forest of analysis of covariance by starting with the simple model with one covariate and building through the complex process involving analysis of covariance in split-plot and
repeated measures designs. Other topics discussed are multiple covariates, experiments involving blocks, and graphical methods for comparing the models for the various treatments.
Chapter 2 discusses the simple analysis of covariance model involving a one-way treatment structure in a completely randomized design structure with one covariate, and Chapter 3 contains several examples demonstrating the strategies for situations involving one covariate. Chapter 4 presents a discussion of the analysis of covariance models involving more than one covariate, which includes polynomial regression models. Models involving two-way treatment structures, both balanced and unbalanced, are discussed in Chapter 5. A method of comparing parameters via beta-hat models is described in Chapter 6. Chapter 7 describes a method for variable selection in the analysis of covariance where many possible covariates were measured. Chapter 8 discusses methods for testing the equality of several regression models.
The next set of chapters (9 through 11) discusses analysis of covariance in the randomized complete block and incomplete block design structures. The analysis of data where the values of a characteristic are used to construct blocks is described, i.e., where the value of the covariate is the same for all experimental units in a block. In the analysis of covariance context, inter- or between-block information about the intercepts and slopes is required to extract all available information about the regression lines or surfaces from the data. Usual analysis methods extract only the intra-block information from the data. A mixed models analysis involving methods of moments and maximum likelihood estimation of the variance components provides combined estimates of the parameters and should be used for blocked experiments.
Chapter 12 describes models where the levels of the treatments are random effects (Littell et al., 1996). The models in Chapter 12 include random coefficient models. Chapter 13 provides a discussion of mixed models with covariates, and Chapter 14 presents a discussion of unequal variance models.
Chapters 15 and 16 discuss problems with applying the analysis of covariance to experiments involving repeated measures and split-plot design structures. One has to consider the size of experimental unit on which the covariate is measured. Cases are discussed where the covariate is measured on the large size of an experimental unit and when the covariate is measured on the small size of an experimental unit. Several examples of split-plot and repeated measures designs are presented. A process of selecting the simplest covariance structure for the repeated measures part of the model and the simplest covariate (regression model) part of the model is described. The analysis of covariance in the nonreplicated experiment is discussed in Chapter 17. The half-normal plot methodology (Milliken and Johnson, 1989) is used to determine the form of the covariate part of the model and to determine which effects are to be included in the intercept part of the model.
Finally, several special applications of analysis of covariance are presented in Chapter 18, including using the covariate to construct blocks, crossover designs, nonlinear models, nonparametric analysis of covariance, and a process for examining mixed models for possible outliers in the data set.
The procedures of the SAS® system (1989, 1996, and 1997) and JMP® (2000)
are used to demonstrate how to use software to carry out the analysis of covariance
computations. The topic of analysis of covariance has been the subject of two issues of Biometrics, Volume 13, Number 3, in 1957 and Volume 38, Number 3, in 1982. The collection of papers in these two issues presents discussions of widely diverse applications of analysis of covariance.
REFERENCES
Littell, R. C., Milliken, G. A., Stroup, W. W., and Wolfinger, R. D. (1996) SAS® System for Mixed Models, SAS Institute Inc., Cary, NC.
Milliken, G. A. and Johnson, D. E. (1989) Analysis of Messy Data, Volume II: Nonreplicated Experiments, Chapman & Hall, London.
Milliken, G. A. and Johnson, D. E. (1992) Analysis of Messy Data, Volume I: Designed Experiments, Chapman & Hall, London.
SAS Institute Inc. (1989) SAS/STAT® User’s Guide, Version 6, Fourth Edition, Volume 2,
One-Way Analysis of Covariance — One Covariate in a Completely Randomized Design Structure
In any case, it is a good strategy to use the analysis of variance to check to see if there are differences among the treatment covariate means (see Chapter 18). Assume that the mean of yij can be expressed as a linear function of the covariate, xij, with possibly a different linear function being required for each treatment. It is important to note that the mean of an observation from the ith treatment group depends on the value of the covariate as well as the treatment. In analysis of variance, the mean of an observation from the ith treatment group depends only on the treatment. The analysis of covariance model for a one-way treatment structure with one covariate in a completely randomized design structure is
yij = αi + βixij + εij, i = 1, 2, …, t, j = 1, 2, …, ni (2.1)
where the mean of yi for a given value of X is µyi|X = αi + βiX. For making inferences, it is assumed that εij ~ iid N(0, σ2). Model 2.1 has t intercepts (α1, …, αt), t slopes (β1, …, βt), and one variance σ2, i.e., the model represents a collection of simple linear regression models with a different model for each level of the treatment.
Before analyzing this model, make sure that the data from each treatment can in fact be described by a simple linear regression model. Various regression diagnostics should be run on the data before continuing. The equal variance assumption should also be checked (see Chapter 14). If the simple linear regression model is not adequate to describe the data for each treatment, then another model must be selected before continuing with the analysis of covariance.

The analysis of covariance is a process of comparing the regression models and then making decisions about the various parameters of the models. The process involves comparing the t slopes, comparing the distances between the regression lines (surfaces) at preselected values of X, and possibly comparing the t intercepts. The analysis of covariance computations are typically presented in summation notation with little emphasis on interpretations. In this and the following chapters, the various covariance models are expressed in terms of matrices (see Chapter 6 of Milliken and Johnson, 1992) and their interpretations are discussed. Software is used as the mode of doing the analysis of covariance computations. The matrix form of Model 2.1 is
y = Xβ + ε, with X = diag(X1, X2, …, Xt), Xi = [1ni xi], and β = (α1, β1, α2, β2, …, αt, βt)′ (2.2)
which is expressed in the form of a linear model as y = Xβ + ε. The vector y denotes the observations ordered by observation within each treatment, the 2t × 1 vector β denotes the collection of slopes and intercepts, the matrix X is the design matrix, and the vector ε represents the random errors.
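As an illustration, the block-diagonal structure of X can be built numerically. The following Python/NumPy sketch (the seed, data, and parameter values are invented for illustration; the book itself uses SAS and JMP) checks that solving the full system recovers the same (αi, βi) pairs as fitting each treatment separately:

```python
import numpy as np

rng = np.random.default_rng(1)
t, n_i = 3, 10                       # three treatments, ten units each
alphas = np.array([2.0, 4.0, 6.0])   # hypothetical intercepts
betas = np.array([0.5, 0.5, 1.5])    # hypothetical slopes

# Build y and the block-diagonal design matrix X of Equation 2.2:
# each treatment contributes a [1, x] block on the diagonal.
blocks, ys = [], []
for i in range(t):
    x = rng.uniform(0, 10, n_i)
    e = rng.normal(0, 1.0, n_i)
    ys.append(alphas[i] + betas[i] * x + e)
    blocks.append(np.column_stack([np.ones(n_i), x]))

X = np.zeros((t * n_i, 2 * t))
for i, B in enumerate(blocks):
    X[i * n_i:(i + 1) * n_i, 2 * i:2 * i + 2] = B
y = np.concatenate(ys)

# beta-hat = (X'X)^{-1} X'y stacks the (alpha_i, beta_i) pairs
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat.reshape(t, 2))   # row i holds (alpha_i hat, beta_i hat)
```

Because X is block diagonal, the full-model solution coincides with the t separate simple linear regression fits, which is the point made in Section 2.2.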
2.2 ESTIMATION
The least squares estimator of the parameter vector β is ˆβ = (X′X)–1X′y, but the least squares estimator of β can also be obtained by fitting the simple linear regression model to the data from each treatment and computing the least squares estimator of each pair of parameters (αi, βi). For data from the ith treatment, fit the model
yij = αi + βixij + εij, j = 1, 2, …, ni (2.3)
which is expressed as yi = Xiβi + εi. The least squares estimator of βi is ˆβi = (X′iXi)–1X′iyi, the same as the estimator obtained for a simple linear regression model.
The estimates of βi and αi in summation notation are

ˆβi = [Σj xijyij – ni x̄i. ȳi.] / [Σj xij2 – ni x̄i.2]

and

ˆαi = ȳi. – ˆβi x̄i.
The residual sum of squares for the ith model is

SSResi = Σj (yij – ˆαi – ˆβixij)2
There are ni – 2 degrees of freedom associated with SSResi since the ith model involves two parameters. After testing the equality of the treatment variances (see Chapter 14) and deciding there is not enough evidence to conclude the variances are unequal, the residual sum of squares for Model 2.1 can be obtained by pooling the residual sums of squares for each of the t models, i.e., sum the SSResi together to obtain
SSRes = SSRes1 + SSRes2 + … + SSRest (2.4)
The pooled residual sum of squares, SSRes, is based on the pooled degrees of freedom, computed and denoted by N – 2t = Σi (ni – 2), where N = n1 + n2 + … + nt.
The best estimate of the variance of the experimental units is ˆσ2 = SSRes/(N – 2t). The sampling distribution of (N – 2t)ˆσ2/σ2 is central chi-square with (N – 2t) degrees of freedom. The sampling distribution of the least squares estimator, ˆβ′ = (ˆα1, ˆβ1, …, ˆαt, ˆβt), is normal with mean β′ = (α1, β1, …, αt, βt) and variance–covariance matrix σ2(X′X)–1, which can be written as the block diagonal matrix σ2(X′X)–1 = diag[σ2(X′1X1)–1, σ2(X′2X2)–1, …, σ2(X′tXt)–1].
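A numerical sketch of the estimation steps in this section, using Python/NumPy with simulated data (all parameter values are hypothetical), applies the summation-notation estimators treatment by treatment and then pools the residual sums of squares into ˆσ2 = SSRes/(N – 2t):

```python
import numpy as np

rng = np.random.default_rng(7)
t, n = 3, 8
# hypothetical intercepts, slopes, and error standard deviation
alphas, betas, sigma = [1.0, 3.0, 5.0], [0.8, 0.8, 0.2], 0.5
xs, ys = [], []
for i in range(t):
    x = rng.uniform(0, 10, n)
    xs.append(x)
    ys.append(alphas[i] + betas[i] * x + rng.normal(0, sigma, n))

ss_res, N = 0.0, 0
for i in range(t):
    x, y = xs[i], ys[i]
    # summation-notation estimators for treatment i
    b_i = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x ** 2) - n * x.mean() ** 2)
    a_i = y.mean() - b_i * x.mean()
    ss_res += np.sum((y - a_i - b_i * x) ** 2)   # SSRes_i, on n_i - 2 d.f.
    N += n

sigma2_hat = ss_res / (N - 2 * t)   # pooled estimate of the error variance
print(round(float(sigma2_hat), 3))
```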
2.3 STRATEGY FOR DETERMINING THE FORM OF THE MODEL
The main objective of an analysis of covariance is to compare the t regression lines at several predetermined fixed values of the covariate, X. Depending on the values of the slopes, βi, there are various strategies one can use to compare the regression lines.
The first question that needs to be answered is, does the mean of y given X depend on the value of X? If the data have been plotted for each of the treatment groups, and there seems to be a linear relationship between the values of y and x, then the question can be subjectively answered. That question can be answered statistically by testing the hypothesis

H01: E(yij|X = x) = αi vs Ha1: E(yij|X = x) = αi + βix (2.5)
This hypothesis is equivalent to testing the hypothesis that the slopes are all zero, i.e.,

H01: β1 = β2 = … = βt = 0 vs Ha1: (not H01) (2.6)
The null hypothesis states that none of the treatments’ means depend linearly on the value of the covariate, X. The notation E(yij|X = x) denotes the mean of the distribution of y for a given value of X, X = x.
The principle of conditional error (Milliken and Johnson, 1992) or model comparison method (Draper and Smith, 1981) provides an excellent way of obtaining the desired test statistic. The model restricted by the conditions of the null hypothesis, H01, is

yij = αi + εij, i = 1, 2, …, t, j = 1, 2, …, ni (2.7)
Model 2.7 is the usual analysis of variance model for the one-way treatment structure in a completely randomized design structure. The residual sum of squares for Model 2.7 is

SSRes(H01) = Σi Σj (yij – ȳi.)2 (2.8)

which is based on d.f.SSRes(H01) = N – t degrees of freedom (where the mean of the model under H01 has t parameters, the intercepts). Using the principle of conditional error, the sum of squares due to deviations from H01, denoted by SSH01, is computed as
SSH01 = SSRes(H01) – SSRes (2.9)
which is based on d.f.SSRes(H01) – d.f.SSRes = (N – t) – (N – 2t) = t degrees of freedom. The degrees of freedom associated with SSH01 is equal to t since the hypothesis being tested is that the t slope parameters are all equal to zero, i.e., the values of t parameters are specified; thus there are t degrees of freedom associated with the sum of squares. The sampling distribution of SSH01/σ2 is a noncentral chi-square distribution with t degrees of freedom where the noncentrality parameter is zero if and only if H01 is true, i.e., all slopes are equal to zero. A statistic for testing H01 vs Ha1 is
FH01 = [SSH01/t]/ˆσ2 = [SSH01/t]/[SSRes/(N – 2t)] (2.10)
and, when H01 is true, the sampling distribution of FH01 is that of a central F distribution with t and N – 2t degrees of freedom.
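The model-comparison computation of FH01 can be sketched in Python/NumPy as follows (simulated data with hypothetical slopes; in practice the observed FH01 would be compared against an F table with t and N – 2t degrees of freedom):

```python
import numpy as np

rng = np.random.default_rng(3)
t, n = 3, 12
x = rng.uniform(0, 10, (t, n))
y = np.array([2.0 + 0.0 * x[0],        # hypothetical lines; two have
              1.0 + 0.6 * x[1],        # nonzero slopes, so H01 is false
              4.0 + 0.9 * x[2]]) + rng.normal(0, 1.0, (t, n))
N = t * n

# Full model: a separate line per treatment; pooled SSRes has N - 2t d.f.
ss_res = 0.0
for i in range(t):
    A = np.column_stack([np.ones(n), x[i]])
    r = y[i] - A @ np.linalg.lstsq(A, y[i], rcond=None)[0]
    ss_res += r @ r

# Model restricted by H01 (all slopes zero): one-way ANOVA, N - t d.f.
ss_res_h01 = sum(((y[i] - y[i].mean()) ** 2).sum() for i in range(t))

ss_h01 = ss_res_h01 - ss_res                    # Equation 2.9
f_h01 = (ss_h01 / t) / (ss_res / (N - 2 * t))   # Equation 2.10
print(round(float(f_h01), 2))
```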
If you fail to reject H01, then you can conclude that the means of the treatments do not depend linearly on the value of the covariate, X. In this case, the next step in the analysis is to use analysis of variance to make comparisons among the treatments’ means, i.e., compare the αi, i = 1, 2, …, t (as is discussed in Chapter 1, Milliken and Johnson, 1992). Recall you have already determined that the simple linear regression model adequately describes the data. Thus if the slopes are zero, then you conclude the models are of the form

yij = αi + εij, i = 1, 2, …, t, j = 1, 2, …, ni
If H01 is rejected, then you conclude that the mean of y does depend linearly on the value of the covariate X for at least one of the treatments. In this case, the next step in the analysis of covariance is to determine whether or not the means of the treatments depend on the covariate X differently (as represented by unequal slopes which provide nonparallel lines). A test for homogeneity or equality of the slopes answers that question. The appropriate null hypothesis stating the slopes are equal is

H02: β1 = β2 = … = βt vs Ha2: (not H02) (2.11)

The model restricted by the conditions of H02 is the parallel lines model yij = αi + βxij + εij (Model 2.12), whose residual sum of squares SSRes(H02) is based on N – t – 1 degrees of freedom. Using the principle of conditional error, the sum of squares due to deviations from H02 is SSH02 = SSRes(H02) – SSRes, which is based on d.f.SSRes(H02) – d.f.SSRes = t – 1 degrees of freedom. There are t – 1 degrees of freedom associated with SSH02 since t parameters (testing equality) are being compared and there are t – 1 linearly independent comparisons of the t slopes. The sampling distribution of SSH02/σ2 is noncentral chi-square with t – 1 degrees of freedom where the noncentrality parameter is zero if and only if H02 is true. The statistic used to test H02 is

FH02 = [SSH02/(t – 1)]/ˆσ2 (2.16)
which has a noncentral F sampling distribution with t – 1 and N – 2t degrees of freedom. If you fail to reject H02, then conclude that the lines are parallel (equal slopes) and proceed to compare the distances among the parallel regression lines by comparing their intercepts, the αi’s (the topic of Section 2.4). Figure 2.1 displays the relationships among the treatment means as a function of the covariate X when the lines are parallel. Since the lines are parallel, i.e., the distance between any two lines is the same for all values of X, a comparison of the intercepts is a comparison of the distances between the lines.
If you reject H02, then conclude that at least two of the regression lines have unequal slopes and hence the set of lines are not parallel. Figure 2.2 displays a possible relationship among the means of treatments as a linear function of the covariate for the nonparallel lines case. When the lines are not parallel, the distance between two lines depends on the value of X; thus the nonparallel lines case is called covariate by treatment interaction.
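The test of H02 follows the same model-comparison pattern; here is a Python/NumPy sketch with hypothetical data in which one treatment is given a deliberately different slope:

```python
import numpy as np

rng = np.random.default_rng(5)
t, n = 3, 15
x = [rng.uniform(0, 10, n) for _ in range(t)]
slopes = [0.5, 0.5, 1.5]    # hypothetical: treatment 3 has a different slope
y = [2.0 + i + slopes[i] * x[i] + rng.normal(0, 0.7, n) for i in range(t)]
N = t * n

def rss(A, b):
    """Residual sum of squares from the least squares fit of b on columns of A."""
    r = b - A @ np.linalg.lstsq(A, b, rcond=None)[0]
    return r @ r

# Full model: separate slopes and intercepts (2t parameters), SSRes on N - 2t d.f.
ss_res = sum(rss(np.column_stack([np.ones(n), x[i]]), y[i]) for i in range(t))

# Model restricted by H02: parallel lines, t intercepts plus one common slope
D = np.zeros((N, t + 1))
for i in range(t):
    D[i * n:(i + 1) * n, i] = 1.0
    D[i * n:(i + 1) * n, t] = x[i]
ss_res_h02 = rss(D, np.concatenate(y))

ss_h02 = ss_res_h02 - ss_res
f_h02 = (ss_h02 / (t - 1)) / (ss_res / (N - 2 * t))   # t-1 and N-2t d.f.
print(round(float(f_h02), 2))
```

A large FH02 here points toward the nonparallel lines model, i.e., covariate by treatment interaction.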
2.4 COMPARING THE TREATMENTS OR THE REGRESSION LINES
An appropriate method for comparing the distances among the regression lines depends on the decision you make concerning the slopes of the models. If you reject H01 in Model 2.6 and fail to reject H02 in Model 2.11, the resulting model is a set of parallel lines (equal slopes as in Figure 2.1). A property of two parallel lines is that they are the same distance apart for every value of X. Thus, the distance between any two lines can be measured by comparing the intercepts of the two lines. When
FIGURE 2.1 Graph of parallel lines models: common slopes.
the lines are parallel, contrasts between the intercepts are used to compare the treatments. When the slopes are unequal, there are two types of comparisons that are of interest, namely, comparing the distances between the various regression lines at several values of the covariate X and comparing specific parameters, such as comparing the slopes or comparing the models evaluated at the same or different values of X.
At this step in the analysis, you must remember that H02 was not rejected; thus the model used to describe the mean of y as a function of the covariate is

yij = αi + βxij + εij, i = 1, 2, …, t, j = 1, 2, …, ni (2.17)
The residual sum of squares for Model 2.17 is SSRes(H02), the residual sum of squares to which you will compare other models, which was given in Equation 2.14. The first hypothesis to be tested is that the distances between the lines evaluated at a given value of X, say X0, are equal,
FIGURE 2.2 Graph of nonparallel lines models: unequal slopes.
where α and β are unspecified. An equivalent hypothesis is that the intercepts are equal and is expressed as

H03: α1 = α2 = … = αt = α vs Ha3: (not H03) (2.20)

The model restricted by H03 is the single line model yij = α + βXij + εij (Model 2.19), with residual sum of squares SSRes(H03), where ˆα and ˆβ are least squares estimators of α and β from Model 2.19. Since Model 2.19 consists of two parameters, SSRes(H03) is based on d.f.SSRes(H03) = N – 2 degrees of freedom.
Applying the model comparison method, the sum of squares due to deviation from H03, given that the slopes are equal (H02 is true), is

SSH03 = SSRes(H03) – SSRes(H02) (2.21)
which is based on d.f.SSRes(H03) – d.f.SSRes(H02) = t – 1 degrees of freedom. The appropriate test statistic is

FH03 = [SSH03/(t – 1)]/[SSRes(H02)/(N – t – 1)] (2.22)
The sampling distribution of FH03 is that of a noncentral F distribution with t – 1 and N – t – 1 degrees of freedom. If H03 is not rejected, conclude that all of the data come from a single simple linear regression model with slope β and intercept α, i.e., there are no treatment differences. If H03 is rejected, then conclude that the distances between one or more pairs of lines are different from zero. Since the lines are parallel, the distance between any two lines can be compared at any chosen value of X. If the distance between any two lines is compared at X = 0, it is a comparison of the difference between two intercepts as αi – αi′. A multiple comparison procedure can be used to compare the distances between pairs of regression lines. An LSD type of multiple comparison procedure can be used for controlling the comparison-wise
error rate (see Chapter 3 of Milliken and Johnson, 1992). To use a multiple comparison procedure, you must compute the estimated standard error of the difference between two α’s by constructing the necessary contrasts between the intercepts.
For example, if you are interested in the linear combination θ = Σi ciαi, the estimate is ˆθ = Σi ci ˆαi, and the estimated standard error of this estimate, Sˆθ, is computed from the estimated variance–covariance matrix of (ˆα1, …, ˆαt) (Equation 2.23). The statistic to test H0: θ = 0 vs Ha: θ ≠ 0 is tˆθ = ˆθ/Sˆθ, which is distributed as a Student t distribution with N – t – 1 degrees of freedom. A (1 – α)100% confidence interval about θ is ˆθ ± tα/2,N–t–1 Sˆθ. If you have four equally spaced quantitative levels of the treatment, the linear contrast of the four levels is θ = 3α1 + 1α2 – 1α3 – 3α4.
When analysis of covariance was first developed, it was mainly used to adjust the mean of y for a selected value of the covariate. The value usually selected was the mean of the covariate from all t treatments. Thus the term adjusted means was defined as the mean of y evaluated at X = x̄, where x̄ is the mean value of all the xij’s. The estimators of the means of the treatments evaluated at X = x̄, called the adjusted means, are

ˆµyi|X=x̄ = ˆαi + ˆβx̄ (2.24)
where ˆαi and ˆβ are least squares estimators of αi and β from Model 2.17. The covariance matrix of the adjusted means can be constructed from the elements of the covariance matrix of ˆα1, …, ˆαt and ˆβ.
The standard errors of the adjusted means are computed as the square roots of the corresponding diagonal elements of that covariance matrix. One hypothesis of interest is that the expected values of the adjusted means are equal. This hypothesis can be expressed as

H04: µy1|X=x̄ = µy2|X=x̄ = … = µyt|X=x̄ vs Ha4: (not H04)

However, since the lines are parallel, the difference between two adjusted means is the difference between intercepts, as µy1|X=x̄ – µy2|X=x̄ = (α1 + βx̄) – (α2 + βx̄) = α1 – α2; thus H04 is equivalent to H03.
Preplanned treatment comparisons and multiple comparison procedures can be carried out to compare the adjusted means by computing the standard error of the difference between pairs of adjusted means. Since the difference of two such adjusted means is ˆαi – ˆαi′, the estimated standard error in Equation 2.23 can be used with any selected multiple comparison procedure. Contrasts among adjusted means, which are also contrasts among the intercepts, measuring linear, quadratic, etc. trends, should be used when appropriate.
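A Python/NumPy sketch of the adjusted means of Equation 2.24 under the parallel lines model (simulated data; all parameter values are hypothetical) illustrates that differences of adjusted means reduce to differences of intercepts:

```python
import numpy as np

rng = np.random.default_rng(11)
t, n = 3, 20
x = [rng.uniform(0, 10, n) for _ in range(t)]
# hypothetical parallel-lines data: common slope 0.7, intercepts 1, 3, 5
y = [1.0 + 2 * i + 0.7 * x[i] + rng.normal(0, 0.5, n) for i in range(t)]
N = t * n

# Design matrix for Model 2.17: t intercept indicators plus a common-x column
D = np.zeros((N, t + 1))
for i in range(t):
    D[i * n:(i + 1) * n, i] = 1.0
    D[i * n:(i + 1) * n, t] = x[i]

theta = np.linalg.lstsq(D, np.concatenate(y), rcond=None)[0]
alpha_hat, beta_hat = theta[:t], theta[t]

xbar = np.concatenate(x).mean()
adj_means = alpha_hat + beta_hat * xbar     # Equation 2.24 adjusted means
print(np.round(adj_means, 2))
```

Because the common slope term ˆβx̄ cancels, any contrast among the adjusted means equals the same contrast among the ˆαi’s, which is the point made above.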
COVARIATE BY TREATMENT INTERACTION
When you reject H02, then conclude that the nonparallel lines Model 2.1 is necessary to adequately describe the data. The graph of such a possible situation was given in Figure 2.2. Since the lines are not parallel, the distance between any two lines depends on which value of the covariate is selected (see Figure 2.3). This is called covariate by treatment interaction. In the nonparallel lines case, a comparison of the intercepts is only a comparison of the regression lines at X = 0. That will generally be a meaningful comparison only when X = 0 is included in or is close to the range of X values in the experiment. The equal intercept hypothesis given the slopes are unequal is expressed as
H05: E(yij|X = x) = α + βix vs Ha5: E(yij|X = x) = αi + βix or, equivalently,

H05: α1 = α2 = … = αt = α given βi ≠ βi′ vs Ha5: (not H05)
The model comparison method or principle of conditional error is used to compute the statistic for testing this hypothesis. Model 2.1 restricted by the conditions of H05 is

yij = α + βixij + εij, i = 1, 2, …, t, j = 1, 2, …, ni (2.25)

and the corresponding residual sum of squares is

SSRes(H05) = Σi Σj (yij – ˆα – ˆβixij)2 (2.26)
which is based on d.f.SSRes(H05) = N – t – 1 degrees of freedom. The values of ˆα and ˆβi in Equation 2.26 are the least squares estimators of the parameters of Model 2.25. Using the principle of conditional error, the sum of squares due to deviations from H05 is computed as

SSH05 = SSRes(H05) – SSRes
which is based on d.f.SSRes(H05) – d.f.SSRes = t – 1 degrees of freedom. The statistic to test H05 is

FH05 = [SSH05/(t – 1)]/ˆσ2 (2.27)
which is distributed as a noncentral F distribution based on t – 1 and N – 2t degrees of freedom. The conclusion you make at X = 0 may be different from that you make at X = x̄ or X = X0, where X0 is some other preselected or fixed value of the covariate. Suppose you want to compare the distances between the regression lines at a selected value of X, say X = X0. The hypothesis to be tested is

H06: α1 + β1X0 = α2 + β2X0 = … = αt + βtX0 vs Ha6: (not H06)
FIGURE 2.3 Graph demonstrating that for the nonparallel lines model, the comparisons between the treatments depend on the value of X.
which is based on d.f.SSRes(H06) = N – t – 1 degrees of freedom. Using the principle of conditional error, the sum of squares due to deviations from H06 is computed as SSH06 = SSRes(H06) – SSRes, which is based on d.f.SSRes(H06) – d.f.SSRes = t – 1 degrees of freedom. The resulting statistic to test H06 is

FH06 = [SSH06/(t – 1)]/ˆσ2 (2.32)
which is distributed as a noncentral F distribution based on t – 1 and N – 2t degrees of freedom. It is important for you to make comparisons among the regression lines at several different values of the covariate. The usual comparison of adjusted means, i.e., at X = x̄, is only one of many comparisons that are probably of interest. Figure 2.3 shows three possible values of X (covariate) at which you might make comparisons. The statistics for testing H06 with X0 = Xt, X0 = x̄, and X0 = X* can be computed by using Model 2.30 and the residual sum of squares in Equation 2.31 for each of the three values of X. If you reject a hypothesis corresponding to a selected value of X0, then a multiple comparison procedure could be used to compare the distances between pairs of regression lines or to make preplanned treatment comparisons among the regression lines at X0.
The difference between two simple linear regression lines at X = X0 is ˆµyi|X=X0 – ˆµyi′|X=X0 = (ˆαi + ˆβiX0) – (ˆαi′ + ˆβi′X0), and the estimated standard error of the difference is

[S2ˆµyi|X=X0 + S2ˆµyi′|X=X0]1/2
where the standard errors Sˆµyi|X=X0 can be obtained from the standard errors of the intercept parameters in Model 2.29 or can be computed from the linear combination of the parameter estimates ˆαi + ˆβiX0. These estimated standard errors are computed with the assumption that the two models have no common parameters, so the covariance between the two adjusted means is zero. Again, for preplanned comparisons, LSD, Bonferroni, or Scheffé types of multiple comparison procedures can be used to help interpret the results of comparing the regression lines at each selected value of the covariate. For most experiments, comparisons should be made for at least three values of X, one in the lower range, one in the middle range, and one in the upper range of the X’s obtained for the experiment. There are no set rules for selecting the values of X at which to compare the models, but, depending on your objectives, the lines could be compared at:
1. The mean of the X values, the mean of the X values minus one standard deviation of the X values, and the mean of the X values plus one standard deviation of the X values.

2. The median of the X values, the 25th percentile of the X values, and the 75th percentile of the X values.

3. The median of the X values, the γ percentile of the X values, and the 1 – γ percentile of the X values for some choice of γ (such as .01, .05, .10, or .20).

You might be interested in determining which treatment mean responds the most
to a change in the value of the covariate. In this case an LSD approach can be used to make size α comparison-wise tests about the linear combinations of the βi’s. Alternatively, you could also use a Fisher’s protected LSD or Bonferroni or Scheffé-type approach to control the experiment-wise error rate. In any case, the standard error of the difference between two slopes is

[S2ˆβi + S2ˆβi′]1/2

where Sˆβi denotes the standard error associated with ˆβi and it is assumed the covariance between the two slope parameters is zero (which is the case here since the two models do not have common parameters). The degrees of freedom for the appropriate percentage point corresponding to the selected test is N – 2t. Preplanned treatment comparisons can be made by comparing the slopes of the various models. Section 2.7 shows how to carry out the computations via the procedures of the SAS® system and JMP®.
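The idea of comparing nonparallel lines at a low, middle, and high value of X can be sketched as follows in Python/NumPy (two hypothetical treatments whose lines cross; the standard errors assume, as in the text, that the two fits share no parameters, so their covariance is zero):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 25
x1, x2 = rng.uniform(0, 10, n), rng.uniform(0, 10, n)
y1 = 1.0 + 1.2 * x1 + rng.normal(0, 0.8, n)   # treatment 1 (hypothetical)
y2 = 5.0 + 0.3 * x2 + rng.normal(0, 0.8, n)   # treatment 2; the lines cross

def fit(x, y):
    """Least squares fit of y = a + b x; return coefficients and their covariance."""
    A = np.column_stack([np.ones(len(x)), x])
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    r = y - A @ coef
    s2 = (r @ r) / (len(x) - 2)
    return coef, s2 * np.linalg.inv(A.T @ A)

c1, V1 = fit(x1, y1)
c2, V2 = fit(x2, y2)

# Compare the lines at the 25th, 50th, and 75th percentiles of the observed X's
allx = np.concatenate([x1, x2])
for x0 in np.percentile(allx, [25, 50, 75]):
    v = np.array([1.0, x0])
    diff = v @ c1 - v @ c2                 # estimated distance between lines at X0
    se = np.sqrt(v @ (V1 + V2) @ v)        # no shared parameters, so cov = 0
    print(round(float(x0), 2), round(float(diff), 2), round(float(diff / se), 2))
```

With crossing lines, the sign and significance of the difference change with X0, which is why the text recommends comparing the models at several values of the covariate.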
2.5 CONFIDENCE BANDS ABOUT THE DIFFERENCE OF TWO TREATMENTS
When the slopes are unequal, it is often useful to determine the region of the covariate where two treatments produce significantly different responses. A confidence band can be constructed about the difference of the models at several values of X in the region of the covariate. Then the region of the covariate where the confidence band does not contain zero is the region where the treatments produce significantly different responses. A Scheffé-type confidence statement should be used to provide experiment-wise error rate protection for each pair of treatments. The estimated difference between the two regression lines for Treatments 1 and 2 at X = X0 is

ˆd(X0) = (ˆα1 + ˆβ1X0) – (ˆα2 + ˆβ2X0)

which has estimated standard error Sˆd(X0), computed as in Section 2.4 under the assumption that the two models have no common parameters. The difference between the two models is

(α1 – α2) + (β1 – β2)X

which is a straight line with two parameters, i.e., the intercept is α1 – α2 and the slope is β1 – β2. Thus to construct a confidence band about the difference of the two models at X = X0 for a range of X values based on Scheffé percentage points, use

ˆd(X0) ± [2Fα,2,N–2t]1/2 Sˆd(X0)

where the number 2 corresponds to the two parameters in the difference of the two models. An example of the construction and use of the confidence band about the difference of two regression lines is presented in Section 3.3.
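A sketch of such a Scheffé-type band in Python/NumPy follows (hypothetical data; the critical value 3.16 is an approximation to F(0.05; 2, 56) and should be looked up exactly in practice):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
x1, x2 = rng.uniform(0, 10, n), rng.uniform(0, 10, n)
y1 = 2.0 + 1.0 * x1 + rng.normal(0, 1.0, n)   # hypothetical treatment 1
y2 = 6.0 + 0.2 * x2 + rng.normal(0, 1.0, n)   # hypothetical treatment 2

def fit(x, y):
    """Least squares fit of y = a + b x; return coefficients and their covariance."""
    A = np.column_stack([np.ones(len(x)), x])
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    r = y - A @ coef
    return coef, (r @ r) / (len(x) - 2) * np.linalg.inv(A.T @ A)

c1, V1 = fit(x1, y1)
c2, V2 = fit(x2, y2)

F_CRIT = 3.16   # approximately F(0.05; 2, 56); use tables/software for real data
band = []
for x0 in np.linspace(0, 10, 21):
    v = np.array([1.0, x0])
    d = v @ (c1 - c2)                        # estimated difference of the two lines
    half = np.sqrt(2 * F_CRIT) * np.sqrt(v @ (V1 + V2) @ v)   # Scheffe half-width
    band.append((x0, d - half, d + half))

# X values where the band excludes zero: significantly different responses there
signif = [x0 for x0, lo, hi in band if lo > 0 or hi < 0]
print(round(signif[0], 1), round(signif[-1], 1))
```

Near the point where the two lines cross, the band contains zero; toward the ends of the X range it excludes zero, identifying the regions of significant treatment differences.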
2.6 SUMMARY OF STRATEGIES
Sections 2.3 and 2.4 describe the strategies for determining the form of the model and the resulting analyses. Those strategies are summarized in Table 2.1, which lists the paths for model determination. The first step is to make sure that a straight line is an adequate model to describe the data for each of the treatments. If the straight line does not fit the data, then the analysis must not continue until an adequate model is obtained. The strategies described are not all encompassing as there are possible exceptions. There are at least two possible exceptions to the strategy. First, it is possible to reject the equal slopes hypothesis and fail to reject the slopes equal to zero hypothesis when there are both positive and negative slopes (see Section 3.4). In this case, use the nonparallel lines model. Second, it is possible to fail to reject the slopes equal to zero hypothesis when in fact the common slope of a parallel lines model is significantly different from zero. Many experiments have very few observations per treatment. In that case, there is not enough information from the individual treatments to say the individual slopes are different from zero, but combining the information into a common slope does detect the linear relationship (see Section 3.6). In this case, use the common slope or parallel lines model. There apparently can be other exceptions that can be constructed, but at this moment they are not evident.
2.7 ANALYSIS OF COVARIANCE COMPUTATIONS
The SAS® system can be used to compute the various estimators and tests of hypotheses discussed in the previous sections. The SAS® system statements required for each part of the analysis are presented in this section. Section 2.7.1 describes the syntax needed for using PROC GLM or PROC MIXED (through Version 8). Section 2.7.2 describes the process of using JMP® Version 4. Detailed examples are discussed in Chapter 3.
All the models will be fit assuming that the data were read in by the followingstatements:
TABLE 2.1
Strategy for Determining the Form of the Analysis of Covariance Model Involving One Covariate, Assuming a Simple Linear Regression Model Will Describe Each Treatment's Data

a. Test the hypothesis that the slopes are zero:
   i. If fail to reject, compare the treatments via analysis of variance.
   ii. If reject, go to (b).
b. Test the hypothesis that the slopes are equal:
   i. If fail to reject, use a parallel lines model and compare the treatments by comparing the intercepts or adjusted means (LSMEANS).
   ii. If reject, go to (c).
c. Use the unequal slopes model and
   i. Compare the slopes of the treatments to see if treatments can be grouped into groups with equal slopes.
   ii. Compare the models at at least three values of the covariate: a low, middle, and high value.
   iii. Construct confidence bands about the difference of selected pairs of models.
PROC GLM; CLASS TRT;
MODEL Y = TRT X*TRT/NOINT SOLUTION;
The term TRT with the no intercept (NOINT) option generates the part of the design matrix corresponding to the intercepts and enables one to obtain the estimators of the intercepts. The term X*TRT generates the part of the design matrix corresponding to the slopes. The SOLUTION option is used so that the estimators and their standard errors are printed (PROC GLM and PROC MIXED do not automatically provide the estimators when there is a CLASS variable unless SOLUTION is specified). The sum of squares corresponding to ERROR is SSRes of Equation 2.4, and the MEAN SQUARE ERROR is ˆσ2, the estimate of the sampling variance. For PROC MIXED, the value of ˆσ2 is obtained from the Residual line of the covariance parameter estimates. The Type III sum of squares corresponding to X*TRT tests H01 of Model 2.6, i.e., β1 = β2 = … = βt = 0, given that the unequal intercepts are in the model. The Type III sum of squares corresponding to TRT tests H05, i.e., α1 = α2 = … = αt = 0, given that the unequal slopes are in the model. The intercepts equal to zero hypothesis is equivalent to testing that all of the treatment regression lines are equal to zero at X = 0. This hypothesis is often not of interest, but a test is available in case you have a situation where a zero intercept hypothesis is interpretable.
Next, to test H02 of Model 2.11, the required SAS® system statements are

PROC GLM; CLASSES TRT;
MODEL Y = TRT X X*TRT/SOLUTION;
The Type III sum of squares corresponding to X*TRT tests H02. The Type III sum of squares corresponding to X tests if the average value of the slopes is zero, and the Type III sum of squares corresponding to TRT tests H05. By including X and/or removing the NOINT option, the model is singular and the provided least squares solution is not directly interpretable. The least squares solution satisfies the set-to-zero restrictions (see Chapter 6 of Milliken and Johnson, 1992). If one uses the model statement Y = TRT X*TRT X, where X*TRT is listed before X, the Type I sum of squares corresponding to X*TRT tests H01 while the Type III sum of squares tests H02. A list of Type I and Type III estimable functions can be obtained and used to verify the hypothesis being tested by each sum of squares.
If one fails to reject H02, the parallel lines or equal slopes model of Equation 2.12 should be fit to the data. The appropriate SAS® system statements are
PROC GLM; CLASS TRT;
MODEL Y = TRT X/SOLUTION;
The Type III sum of squares corresponding to TRT is SSH03 of Equation 2.21, and the resulting F ratio tests that the distances between the lines are zero, given that the parallel lines model is adequate to describe the data.
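The same nested-model logic applies here: SSH03 compares the parallel lines model with a single-line model, so the F ratio below is a hand-computed analogue of the Type III TRT test (hypothetical data, numpy in place of PROC GLM).

```python
import numpy as np

# Hypothetical data: t = 3 parallel lines a fixed distance apart.
trt = np.repeat([0, 1, 2], 5)
x = np.tile(np.arange(1.0, 6.0), 3)
rng = np.random.default_rng(3)
y = np.array([2.0, 4.0, 6.0])[trt] + 1.2 * x + rng.normal(0, 0.3, 15)

def ss_res(X, y):
    """Residual sum of squares from an ordinary least squares fit."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r

T = np.eye(3)[trt]
parallel = np.hstack([T, x[:, None]])       # unequal intercepts, common slope
single = np.column_stack([np.ones(15), x])  # one line for all treatments

ss_par, ss_one = ss_res(parallel, y), ss_res(single, y)
df_num, df_den = 3 - 1, 15 - 4
F = ((ss_one - ss_par) / df_num) / (ss_par / df_den)
print(round(F, 1))
```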
The adjusted treatment means can be obtained by including the statement

LSMEANS TRT/STDERR PDIFF;
after the MODEL statement. The estimated adjusted means µ̂Yi|X=x̄ = α̂i + β̂i x̄ are the estimates of the treatment means at X = x̄ and are called least squares means. These adjusted means are predicted values computed from the respective regression models evaluated at X = x̄. The STDERR option provides the corresponding standard errors of the adjusted means. The PDIFF option provides significance levels for t-tests of H0: µYi|X=x̄ = µYi′|X=x̄ for each pair of adjusted means. (The TDIFF option can be included and it provides the values of the t-statistics from which the PDIFF values are obtained.) A comparison of adjusted means is also a comparison of the αi's for the parallel lines model. The significance probabilities can be used to construct an LSD or a Bonferroni multiple comparison procedure for comparing the distances between pairs of lines.
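A minimal sketch of how least squares means arise from the parallel lines fit, with hypothetical data and numpy in place of the LSMEANS statement:

```python
import numpy as np

# Hypothetical data under the parallel lines model (MODEL Y = TRT X).
trt = np.repeat([0, 1, 2], 5)
x = np.tile(np.arange(1.0, 6.0), 3)
rng = np.random.default_rng(5)
y = np.array([2.0, 4.0, 6.0])[trt] + 1.2 * x + rng.normal(0, 0.3, 15)

T = np.eye(3)[trt]
X = np.hstack([T, x[:, None]])  # one intercept per treatment, common slope
b, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha_hat, beta_hat = b[:3], b[3]

xbar = x.mean()
lsmeans = alpha_hat + beta_hat * xbar        # adjusted means at X = x̄
diffs = lsmeans[:, None] - lsmeans[None, :]  # all pairwise differences
print(lsmeans)
```

Because the slope is common to all treatments, each pairwise difference of adjusted means reduces exactly to α̂i − α̂i′, which is why a comparison of adjusted means is a comparison of the distances between the lines.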
Estimates of the mean of Y given X = A for each treatment, which are also called adjusted means, can be obtained by including the statement
LSMEANS TRT/STDERR PDIFF AT X = A;
(where A is a numerical constant)
after the MODEL statement. This provides the adjusted means µ̂Yi|X=A = α̂i + β̂i A, which are the predicted values from the regression lines at X = A, or the estimates of the treatment means at X = A. The PDIFF option provides significance levels for t-tests of H0: µYi|X=A = µYi′|X=A for each pair of adjusted means.
Any comparisons among parameters can be made by using the ESTIMATE or CONTRAST statement in the GLM procedure. There are two situations where such statements are needed.
First, if the conclusion is that the slopes are not equal, then one can apply a multiple comparison procedure in order to compare some or all pairs of slopes. This is easily done by including an ESTIMATE statement following the MODEL statement for each comparison of interest. For example, if there are three treatments and it is of interest to compare all pairs of slopes, then the following statements would