Experiments
Planning, Analysis, and Optimization
Second Edition
C. F. JEFF WU
School of Industrial and Systems Engineering
Georgia Institute of Technology
Atlanta, Georgia
MICHAEL S. HAMADA
Los Alamos National Laboratory
Los Alamos, New Mexico
WILEY
A JOHN WILEY & SONS, INC., PUBLICATION
Copyright © 2009 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor the author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Wu, Chien-Fu Jeff.
Experiments: planning, analysis, and optimization / C. F. Jeff Wu, Michael S. Hamada. 2nd ed.
To my parents and Jung Hee, Christina, and Alexandra
M. S. H.
To my mother and family
C. F. J. W.
Contents
1 Basic Concepts for Experimental Design and Introductory Regression Analysis
1.1 Introduction and Historical Perspective
1.2 A Systematic Approach to the Planning and Implementation of Experiments
1.3 Fundamental Principles: Replication, Randomization, and Blocking
1.4 Simple Linear Regression
1.5 Testing of Hypothesis and Interval Estimation
1.6 Multiple Linear Regression
1.7 Variable Selection in Regression Analysis
1.8 Analysis of Air Pollution Data
Quantitative Factors and Orthogonal Polynomials
Expected Mean Squares and Sample Size Determination
One-Way Random Effects Model
Residual Analysis: Assessment of Model Assumptions
Practical Summary
Exercises
References
3.1 Paired Comparison Designs
3.2 Randomized Block Designs
3.3 Two-Way Layout: Factors with Fixed Levels
3.3.1 Two Qualitative Factors: A Regression Modeling Approach
*3.4 Two-Way Layout: Factors with Random Levels
3.5 Multi-Way Layouts
3.6 Latin Square Designs: Two Blocking Variables
3.7 Graeco-Latin Square Designs
*3.8 Balanced Incomplete Block Designs
4.1 An Epitaxial Layer Growth Experiment
4.2 Full Factorial Designs at Two Levels: A General Discussion
4.3 Factorial Effects and Plots
4.3.1 Main Effects
4.3.2 Interaction Effects
4.4 Using Regression to Compute Factorial Effects
*4.5 ANOVA Treatment of Factorial Effects
4.6 Fundamental Principles for Factorial Effects: Effect Hierarchy, Effect Sparsity, and Effect Heredity
4.7 Comparisons with the "One-Factor-at-a-Time" Approach
4.8 Normal and Half-Normal Plots for Judging Effect
4.11 Use of Log Sample Variance for Dispersion Analysis
4.12 Analysis of Location and Dispersion: Revisiting the Epitaxial Layer Growth Experiment
*4.13 Test of Variance Homogeneity and Pooled Estimate of Variance
*4.14 Studentized Maximum Modulus Test: Testing Effect Significance for Experiments with Variance Estimates
4.15 Blocking and Optimal Arrangement of $2^k$ Factorial Designs in $2^q$
5 Fractional Factorial Experiments at Two Levels
5.1 A Leaf Spring Experiment
5.2 Fractional Factorial Designs: Effect Aliasing and the Criteria of Resolution and Minimum Aberration
5.3 Analysis of Fractional Factorial Experiments
5.4 Techniques for Resolving the Ambiguities in Aliased Effects
5.4.1 Fold-Over Technique for Follow-Up Experiments
5.4.2 Optimal Design Approach for Follow-Up Experiments
5.5 Selection of $2^{k-p}$ Designs Using Minimum Aberration and Related Criteria
5.6 Blocking in Fractional Factorial Designs
5.7 Practical Summary
Exercises
Appendix 5A: Tables of $2^{k-p}$ Fractional Factorial Designs
Appendix 5B: Tables of $2^{k-p}$ Fractional Factorial Designs in $2^q$ Blocks
References
6 Full Factorial and Fractional Factorial Experiments at Three
6.1 A Seat-Belt Experiment
6.2 Larger-the-Better and Smaller-the-Better Problems
6.3 $3^k$ Full Factorial Designs
6.4 $3^{k-p}$ Fractional Factorial Designs
6.5 Simple Analysis Methods: Plots and Analysis of Variance
6.6 An Alternative Analysis Method
6.7 Analysis Strategies for Multiple Responses I: Out-of-Spec Probabilities
6.8 Blocking in $3^k$ and $3^{k-p}$ Designs
6.9 Practical Summary
Exercises
Appendix 6A: Tables of $3^{k-p}$ Fractional Factorial Designs
Appendix 6B: Tables of $3^{k-p}$ Fractional Factorial Designs in $3^q$ Blocks
References
7 Other Design and Analysis Techniques for Experiments at
7.1 A Router Bit Experiment Based on a Mixed Two-Level and Four-Level Design
7.2 Method of Replacement and Construction of $2^m 4^n$ Designs
7.3 Minimum Aberration $2^m 4^n$ Designs with $n = 1, 2$
7.4 An Analysis Strategy for $2^m 4^n$ Experiments
7.5 Analysis of the Router Bit Experiment
7.6 A Paint Experiment Based on a Mixed Two-Level and Three-Level Design
7.7 Design and Analysis of 36-Run Experiments at Two and Three Levels
7.8 $r^{k-p}$ Fractional Factorial Designs for any Prime Number $r$
7.8.1 25-Run Fractional Factorial Designs at Five Levels
7.8.2 49-Run Fractional Factorial Designs at Seven Levels
7.8.3 General Construction
*7.9 Related Factors: Method of Sliding Levels, Nested Effects Analysis, and Response Surface Modeling
7.9.1 Nested Effects Modeling
7.9.2 Analysis of Light Bulb Experiment
7.9.3 Response Surface Modeling
7.9.4 Symmetric and Asymmetric Relationships Between Related Factors
7.10 Practical Summary
Exercises
Appendix 7A: Tables of $2^m 4^1$ Minimum Aberration Designs
Appendix 7B: Tables of $2^m 4^2$ Minimum Aberration Designs
Appendix 7C: $OA(25, 5^6)$
Appendix 7D: $OA(49, 7^8)$
References
8 Nonregular Designs: Construction and Properties
8.1 Two Experiments: Weld-Repaired Castings and Blood Glucose Testing
8.2 Some Advantages of Nonregular Designs Over the $2^{k-p}$ and $3^{k-p}$ Series of Designs
8.3 A Lemma on Orthogonal Arrays
8.4 Plackett-Burman Designs and Hall's Designs
8.5 A Collection of Useful Mixed-Level Orthogonal Arrays
*8.6 Construction of Mixed-Level Orthogonal Arrays Based on
Appendix 8D: Some Useful Difference Matrices
Appendix 8E: Some Useful Orthogonal Main-Effect Plans
References
9.1 Partial Aliasing of Effects and the Alias Matrix
9.2 Traditional Analysis Strategy: Screening Design and Main Effect Analysis
9.3 Simplification of Complex Aliasing via Effect Sparsity
9.4 An Analysis Strategy for Designs with Complex Aliasing
9.4.1 Some Limitations
*9.5 A Bayesian Variable Selection Strategy for Designs with Complex Aliasing
9.5.1 Bayesian Model Priors
9.5.2 Gibbs Sampling
9.5.3 Choice of Prior Tuning Constants
9.5.4 Blood Glucose Experiment Revisited
10 Response Surface Methodology
10.1 A Ranitidine Separation Experiment
10.2 Sequential Nature of Response Surface Methodology
10.3 From First-Order Experiments to Second-Order Experiments: Steepest Ascent Search and Rectangular Grid Search
10.3.1 Curvature Check
10.3.2 Steepest Ascent Search
10.3.3 Rectangular Grid Search
10.4 Analysis of Second-Order Response Surfaces
10.4.1 Ridge Systems
10.5 Analysis of the Ranitidine Experiment
10.6 Analysis Strategies for Multiple Responses II: Contour Plots and the Use of Desirability Functions
10.7 Central Composite Designs
10.8 Box-Behnken Designs and Uniform Shell Designs
10.9 Practical Summary
Exercises
Appendix 10A: Table of Central Composite Designs
Appendix 10B: Table of Box-Behnken Designs
Appendix 10C: Table of Uniform Shell Designs
References
11 Introduction to Robust Parameter Design
11.1 A Robust Parameter Design Perspective of the Layer Growth and Leaf Spring Experiments
11.1.1 Layer Growth Experiment Revisited
11.1.2 Leaf Spring Experiment Revisited
11.2 Strategies for Reducing Variation
11.3 Noise (Hard-to-Control) Factors
11.4 Variation Reduction Through Robust Parameter Design
11.5 Experimentation and Modeling Strategies I:
*11.8.1 Compound Noise Factor
11.9 Signal-to-Noise Ratio and Its Limitations for Parameter Design Optimization
11.9.1 SN Ratio Analysis of Layer Growth Experiment
12.1 An Injection Molding Experiment
12.2 Signal-Response Systems and Their Classification
12.2.1 Calibration of Measurement Systems
12.3 Performance Measures for Parameter Design Optimization
12.4 Modeling and Analysis Strategies
12.5 Analysis of the Injection Molding Experiment
13.1 Experiments with Failure Time Data
13.1.1 Light Experiment
13.1.2 Thermostat Experiment
13.1.3 Drill Bit Experiment
13.2 Regression Model for Failure Time Data
13.3 A Likelihood Approach for Handling Failure Time Data with Censoring
13.3.1 Estimability Problem with MLEs
13.4 Design-Dependent Model Selection Strategies
13.5 A Bayesian Approach to Estimation and Model Selection for Failure Time Data
13.6 Analysis of Reliability Experiments with Failure Time Data
13.6.1 Analysis of Light Experiment
13.6.2 Analysis of Thermostat Experiment
13.6.3 Analysis of Drill Bit Experiment
13.7 Other Types of Reliability Data
13.8 Practical Summary
Exercises
References
14 Analysis of Experiments with Nonnormal Data
14.1 A Wave Soldering Experiment with Count Data
14.2 Generalized Linear Models
14.2.1 The Distribution of the Response
14.2.2 The Form of the Systematic Effects
14.2.3 GLM versus Transforming the Response
14.3 Likelihood-Based Analysis of Generalized Linear Models
14.4 Likelihood-Based Analysis of the Wave Soldering Experiment
14.5 Bayesian Analysis of Generalized Linear Models
14.6 Bayesian Analysis of the Wave Soldering Experiment
14.7 Other Uses and Extensions of Generalized Linear Models and Regression Models for Nonnormal Data
*14.8 Modeling and Analysis for Ordinal Data
14.8.1 The Gibbs Sampler for Ordinal Data
*14.9 Analysis of Foam Molding Experiment
14.10 Scoring: A Simple Method for Analyzing Ordinal Data
14.11 Practical Summary
Exercises
References
Appendix A Upper Tail Probabilities of the Standard Normal Distribution, $\int_z^{\infty} \frac{1}{\sqrt{2\pi}} e^{-u^2/2}\,du$
Appendix B Upper Percentiles of the $t$ Distribution
Appendix C Upper Percentiles of the $\chi^2$ Distribution
Appendix D Upper Percentiles of the $F$ Distribution
Appendix E Upper Percentiles of the Studentized Range
Appendix F Upper Percentiles of the Studentized Maximum
Appendix G Coefficients of Orthogonal Contrast Vectors
Preface to the Second Edition
Nearly a decade has passed since the publication of the first edition. Many instructors have used the first edition to teach master's and Ph.D. students. Based on their feedback and our own teaching experience, it became clear that we needed to revise the book to make it more accessible to a larger audience, including upper-level undergraduates. To this end, we have expanded and reorganized the early chapters in the second edition. For example, our book now provides a self-contained presentation of regression analysis (Sections 1.4-1.8), which prepares those students who have not previously taken a regression course. We have found that such a foundation is needed because most of the data analyses in this book are based on regression or regression-like models. Consequently, this additional material will make it easier to adopt the book for courses that do not have a regression analysis prerequisite.
In the early chapters, we have provided more explanation and details to clarify the calculations required in the various data analyses considered. The ideas, derivations, and data analysis illustrations are presented at a more leisurely pace than in the first edition. For example, first edition Chapter 1 has been expanded into second edition Chapters 1 and 2. Consequently, second edition Chapters 3-14 correspond to first edition Chapters 2-13. We have also reorganized second edition Chapters 3-5 in a more logical order to teach with. For example, analysis methods for location effects that focus on the mean response are presented first and separated from analysis methods for dispersion effects that consider the response variance. This allows instructors to skip the material on dispersion analysis if it suits the needs of their classes. In this edition, we have also removed material that did not fit in the chapters and corrected errors in a few calculations and plots.
To aid the reader, we have marked more difficult sections and exercises by a "*"; they can be skipped unless the reader is particularly interested in the topic. Note that the starred sections and exercises are more difficult than those in the same chapter and are not necessarily more difficult than those in other chapters.
The second edition presents a number of new topics, which include:
• expected mean squares and sample size determination (Section 2.4),
• one-way ANOVA with random effects (Section 2.5),
• two-way ANOVA with random effects (Section 3.4) with application to measurement system assessment,
• split-plot designs (Section 3.9),
• ANOVA treatment of factorial effects (Section 4.5) to bridge the main analysis method of Chapters 1-3 with the factorial effect based analysis method in Chapter 4,
• a response surface modeling method for related factors (Section 7.9.3), which allows expanded prediction capability for two related factors that are both quantitative,
• more details on the Method II frequentist analysis method for analyzing experiments with complex aliasing (Section 9.4),
• more discussion of the use of compound noise factors in robust parameter design (Section 11.8.1),
• more discussion and illustration of the Bayesian approach to analyzing GLMs and other models for nonnormal data (Sections 14.5-14.6).
In addition, ANOVA derivations are given in Section 3.8 on balanced incomplete block designs, and more details are given on optimal designs in Section 5.4.2. In this edition, we have also rewritten extensively, updated the references throughout the book, and have sparingly pointed the reader to some recent and important papers in the literature on various topics in the later chapters. All data sets, sample lecture notes, and a sample syllabus can be accessed on the book's FTP site:
ftp://ftp.wiley.com/public/sci_tech_med/experiments-planning/
Solutions to selected exercises are available to instructors from the authors.
The preparation of this edition has benefited from the comments and assistance of many colleagues and former students, including Nagesh Adiga, Derek Bingham, Ying Hung, V. Roshan Joseph, Lulu Kang, Rahul Mukerjee, Peter Z. Qian, Matthias Tan, Huizhi Xie, Kenny Qian Ye, and Yu Zhu. Tirthankar Dasgupta played a major role in the preparation and writing of new sections in the early chapters; Xinwei Deng provided meticulous support throughout the preparation of the manuscript. We are grateful to all of them. Without their support and interest, this revision could not have been completed.
Atlanta, Georgia
Los Alamos, New Mexico
June 2009
C. F. JEFF WU
MICHAEL S. HAMADA
Preface to the First Edition
(Note that the chapter numbering used below refers to first edition chapters.)
Statistical experimental design and analysis is an indispensable tool for experimenters and one of the core topics in a statistics curriculum. Because of its importance in the development of modern statistics, many textbooks and several classics have been written on the subject, including the influential 1978 book Statistics for Experimenters by Box, Hunter, and Hunter. There have been many new methodological developments since 1978 and thus are not covered in standard texts. The writing of this book was motivated in part by the desire to make these modern ideas and methods accessible to a larger readership in a reader friendly fashion.
Among the new methodologies, robust parameter design stands out as an innovative statistical/engineering approach to off-line quality and productivity improvement. It attempts to improve a process or product by making it less sensitive to noise variation through statistically designed experiments. Another important development in theoretical experimental design is the widespread use of the minimum aberration criterion for optimal assignment of factors to columns of a design table. This criterion is more powerful than the maximum resolution criterion for choosing fractional factorial designs. The third development is the increasing use of designs with complex aliasing in conducting economical experiments. It turns out that many of these designs can be used for the estimation of interactions, which is contrary to the prevailing practice that they be used for estimating the main effects only. The fourth development is the widespread use of Generalized Linear Models (GLMs) and Bayesian methods for analyzing nonnormal data. Many experimental responses are nonnormally distributed, such as binomial and Poisson counts as well as ordinal frequencies, or have lifetime distributions and are observed with censoring that arises in reliability and survival studies. With the advent of modern computing, these tools have been incorporated in texts on medical statistics and social statistics. They should also be made available to experimenters in science and engineering. There are also other experimental methodologies that originated more than 20 years ago but have received scant attention in standard application-oriented texts. These include mixed two- and four-level designs, the method of collapsing for generating orthogonal main-effect plans, Plackett-Burman designs, and mixed-level orthogonal arrays. The main goal of writing this book is to fill in these gaps and present a new and integrated system of experimental design and analysis, which may help in defining a new fashion of teaching and for conducting research on this subject.
The intended readership of this book includes general practitioners as well as specialists. As a textbook, it covers standard material like analysis of variance (ANOVA), two- and three-level factorial and fractional factorial designs, and response surface methodologies. For reading most of the book, the only prerequisite is an undergraduate level course on statistical methods and a basic knowledge of regression analysis. Because of the multitude of topics covered in the book, it can be used for a variety of courses. The material contained here has been taught at the Department of Statistics and the Department of Industrial and Operations Engineering at the University of Michigan to undergraduate seniors, master's, and doctoral students. To help instructors choose which material to use from the book, a separate "Suggestions of Topics for Instructors" follows this preface.
Some highlights and new material in the book are outlined as follows. Chapters 1 and 2 contain standard material on analysis of variance, one-way and multi-way layout, randomized block designs, Latin squares, balanced incomplete block designs, and analysis of covariance. Chapter 3 addresses two-level factorial designs and provides new material in Sections 3.13-3.17 on the use of formal tests of effect significance in addition to the informal tests based on normal and half-normal plots. Chapter 4, on two-level fractional factorial designs, uses the minimum aberration criterion for selecting optimal fractions and emphasizes the use of follow-up experiments to resolve the ambiguities in aliased effects. In Chapter 5, which deals with three-level designs, the linear-quadratic system and the variable selection strategy for handling and analyzing interaction effects are new. A new strategy for handling multiple responses is also presented. Most of the material in Chapter 6 on mixed two- and four-level designs and the method of sliding levels is new. Chapter 7, on nonregular designs, is the only theoretical chapter in the book. It emphasizes statistical properties and applications of the designs rather than their construction and mathematical structure. For practitioners, only the collections of tables in its appendices and some discussions in the sections on their statistical properties may be of interest. Chapter 7 paves the way for the new material in Chapter 8. Both frequentist and Bayesian analysis strategies are presented. The latter employs Gibbs sampling for efficient model search. Supersaturated designs are also briefly discussed. Chapter 9 contains a standard treatment of response surface methodologies. Chapters 10 and 11 present robust parameter design. The former deals with problems with a simple response while the latter deals with those with a signal-response relationship. The three important aspects of parameter design are considered: choice of performance measures, planning techniques, and modeling and analysis strategies. Chapter 12 is concerned with experiments for reliability improvement. Both failure time data and degradation data are considered. Chapter 13 is concerned with experiments with nonnormal responses. Several approaches to analysis are considered, including generalized linear models and Bayesian methods.
The book has some interesting features not commonly found in experimental design texts. Each of Chapters 3 to 13 starts with one or more case studies, which include the goal of the investigation, the data, the experimental plan, and the factors and their levels. It is then followed by sections devoted to the description of experimental plans (i.e., experimental designs). Required theory or methodology for the experimental designs are developed in these sections. They are followed by sections on modeling and analysis strategies. The chapter then returns to the original data, analyzes it using the strategies just outlined, and discusses the implications of the analysis results to the original case studies. The book contains more than 80 experiments, mostly based on actual case studies; of these, 30 sets are analyzed in the text and more than 50 are given in the exercises. Each chapter ends with a practical summary which provides an easy guide to the methods covered in that chapter and is particularly useful for readers who want to find a specific tool but do not have the patience to go through the whole chapter.
The book takes a novel approach to design tables. Many tables are new and based on recent research in experimental design theory and algorithms. For regular designs, only the design generators are given. Full designs can be easily generated by the readers from these generators. The collections of clear effects are given in these tables, however, because it would require some effort, especially for the less mathematically oriented readers, to derive them. The complete layouts of the orthogonal arrays are given in Chapter 8 for the convenience of the readers. With our emphasis on methodologies and applications, mathematical derivations are given sparingly. Unless the derivation itself is crucial to the understanding of the methodology, we omit it and refer to the original source.
The majority of the writing of this book was done at the University of Michigan. Most of the authors' research that is cited in the book was done at the University of Michigan with support from the National Science Foundation (1994-1999) and at the University of Waterloo (1988-1995) with support from the Natural Sciences and Engineering Research Council of Canada and the GM/NSERC Chair in Quality and Productivity. We have benefited from the comments and assistance of many colleagues and former students, including Julie Berube, Derek Bingham, Ching-Shui Cheng, Hugh Chipman, David Fenscik, Xiaoli Hou, Longcheen Huwang, Bill Meeker, Randy Sitter, Huaiqing Wu, Hongquan Xu, Qian Ye, Runchu Zhang, and Yu Zhu. Shao-Wei Cheng played a pivotal supporting role as the book was completed; Jock MacKay read the first draft of the entire book and made numerous penetrating and critical comments; Jung-Chao Wang provided invaluable assistance in the preparation of tables for Chapter 8. We are grateful to all of them. Without their efforts and interest, this book could not have been completed.
Ann Arbor, Michigan
Los Alamos, New Mexico
C. F. JEFF WU
MICHAEL S. HAMADA
Suggestions of Topics for Instructors
One term for senior and master's students in Statistics, Engineering, Physical, Life and Social Sciences (with no background in regression analysis):
Chapters 1, 2, 3 (some of 3.4, 3.8, 3.9, 3.11 can be skipped), 4 (except 4.5, 4.13, 4.14), 5; optional material from Chapters 11 (11.1-11.5, part of 11.6-11.9), 6 (6.1-6.6), 8 (8.1-8.5), 9 (9.1-9.4), 10 (10.1-10.3, 10.5, 10.7). For students with a background in regression analysis, Sections 1.4-1.8 can be skipped or briefly reviewed.
One term for a joint master's and Ph.D. course in Statistics/Biostatistics:
Chapters 1 (1.1-1.3), 2 (except 2.4), 3 (3.4 and 3.9 may be skipped), 4 (except 4.13-4.14), 5, 6 (6.7-6.8 may be skipped), 10 (10.4 and 10.8 may be skipped), 11 (11.1-11.6, part of 11.7-11.9); optional material from Chapters 7 (7.1-7.5, 7.9), 8 (8.1-8.5), 9 (except 9.5). Coverage of Chapters 1 to 3 can be accelerated for those with a background in ANOVA.
Two-term sequence for master's and Ph.D. students in Statistics/Biostatistics:
First term: Chapters 1 to 3 (can be accelerated if ANOVA is a prerequisite), Chapters 4 (4.13-4.14 may be skipped), 5, 6, 7.
Second term: Chapters 8 (the more theoretical material may be skipped), 9 (9.5 may be skipped), 10 (10.8 may be skipped), 11, 12 (12.6 may be skipped), 13 (13.5 may be skipped), 14 (14.5-14.6, 14.8-14.9 may be skipped).
One-term advanced topics course for Ph.D. students with background in introductory graduate experimental design course:
Selected topics from Chapters 5 to 14 depending on the interest and background of the students.
One-term course on theoretical experimental design for Ph.D. students in Statistics and Mathematics:
Sections 1.3, 3.6-3.9, 4.2-4.3, 4.6-4.7, 4.15, 5.2-5.6, 6.3-6.4, 6.8, 7.2-7.3, 7.7-7.8, Chapter 8, 9.6, 10.4, 10.7-10.8, 11.6-11.8, 12.6.
List of Experiments and Data Sets
Brain and Body Weight Data
Long Jump Data
Ericksen Data
Gasoline Consumption Data
Reflectance Data, Pulp Experiment
Strength Data, Composite Experiment
Adapted Muzzle Velocity Data
Summary Data, Airsprayer Experiment
Packing Machine Data
Blood Pressure Data
Residual Chlorine Readings, Sewage Experiment
Strength Data, Girder Experiment
Torque Data, Bolt Experiment
Sensory Data, Measurement System Assessment Study
Weight Loss Data, Wear Experiment
Wear Data, Tire Experiment
Water Resistance Data, Wood Experiment
Data, Starch Experiment
Design Matrix and Response Data, Drill Experiment
Strength Data, Original Girder Experiment
Strength Data, Revised Composite Experiment
Yield Data, Tomato Experiment
Worsted Yarn Data
Data, Resistor Experiment
Data, Blood Pressure Experiment
Throughput Data
Muzzle Velocity Data
Corrosion Resistances of Steel Bars, Steel Experiment
Data, Thickness Gauge Study
Task Efficiency Experiment
Design Matrix and Roughness Data, Drive Shaft Experiment
Metal Alloy Crack Experiment
Design Matrix and Free Height Data, Leaf Spring
Design Matrix and Response Data, Ultrasonic Bonding Experiment
Thickness Data, Paint Experiment
Design Matrix and Covariates, Light Bulb Experiment
Appearance Data, Light Bulb Experiment
Design Matrix and Response Data, Reel Motor
A 10-Factor 12-Run Experiment with Six Added Runs
Design Matrix and Response Data, Plackett-Burman Design Example Experiment
Supersaturated Design Matrix and Adhesion Data, Epoxy Experiment
Original Epoxy Experiment Based on 28-Run Plackett-Burman Design
Design Matrix and Lifetime Data, Heat Exchanger Experiment
Design Matrix, Window Forming Experiment
Pre-Etch Line-Width Data, Window Forming Experiment
Post-Etch Line-Width Data, Window Forming Experiment
Design Matrix and Strength Data, Ceramics Experiment
Design Matrix and Response Data, Wood Pulp Experiment
CHAPTER 10
Design Matrix and Response Data, Final Second-Order Ranitidine Experiment
Runs Along the First Steepest Ascent Direction
Central Composite Design
Design Matrix and Response Data, Amphetamine Experiment
Design Matrix and Response Data, Whiteware Experiment
Design Matrix and Response Data, Drill Experiment
Design Matrix and Response Data, Ammonia Experiment
Design Matrix and Response Data, TAB Laser Experiment
Design Matrix and Response Data, Cement Experiment
Design Matrix, Bulking Process Experiment
Cross Array and Thickness Data, Layer Growth
Control Array, Injection Molding Experiment
Response Data, Injection Molding Experiment
Design Matrix and Weight Data, Coating Experiment
Control Array, Drive Shaft Experiment
Response Data, Drive Shaft Experiment
Control Array, Surface Machining Experiment
Single Array for Signal and Noise Factors, Surface Machining Experiment
Response Data, Surface Machining Experiment
Table 12.20 Control Array and Fitted Parameters, Engine Idling
Failure Time Data, Thermostat Experiment
Cross Array and Failure Time Data (with Censoring Time of 3000), Drill Bit Experiment
Design Matrix and Failure Time Data, Ball Bearing
Table 14.14 Poppy Counts, Weed Infestation Experiment
Table 14.15 Larvae Counts, Larvae Control Experiment
Table 14.16 Unsuccessful Germination Counts (Out of 50 Seeds), Wheat Experiment
Table 14.17 Window Size Data, Window Forming Experiment
Table 14.18 Design and Max Peel Strength Data, Sealing Process Experiment
CHAPTER 1
Basic Concepts for Experimental
Design and Introductory Regression Analysis
Some basic concepts and principles in experimental design are introduced in this chapter, including the fundamental principles of replication, randomization, and blocking. A brief and self-contained introduction to regression analysis is also included. Commonly used techniques like simple and multiple linear regression, least squares estimation, and variable selection are covered.
1.1 INTRODUCTION AND HISTORICAL PERSPECTIVE
Experimentation is one of the most common activities that people engage in. It covers a wide range of applications from household activities like food preparation to technological innovation in material science, semiconductors, robotics, life science, and so on. It allows an investigator to find out what happens to the output or response when the settings of the input variables in a system are purposely changed. Statistical or often simple graphical analysis can then be used to study the relationship between the input and output values. A better understanding of how the input variables affect the performance of a system can thereby be achieved. This gain in knowledge provides a basis for selecting optimum input settings. Experimental design is a body of knowledge and techniques that enables an investigator to conduct better experiments, analyze data efficiently, and make the connections between the conclusions from the analysis and the original objectives of the investigation.
Experimentation is used to understand and/or improve a system. A system can be a product or process. A product can be one developed in engineering, biology, or the physical sciences. A process can be a manufacturing process, a
process that describes a physical phenomenon, or a nonphysical process such as those found in service or administration. Although most examples in the book are from engineering or the physical and biological sciences, the methods can also be applied to other disciplines, such as business, medicine, and psychology. For example, in studying the efficiency and cost of a payroll operation, the entire payroll operation can be viewed as a process with key input variables such as the number of supervisors, the number of clerks, method of bank deposit, level of automation, administrative structure, and so on. A computer simulation model can then be used to study the effects of changing these input variables on cost and efficiency.
Modern experimental design dates back to the pioneering work of the great statistician R. A. Fisher in the 1930s at the Rothamsted Agricultural Experimental Station in the United Kingdom. Fisher's work and the notable contributions by F. Yates and D. J. Finney were motivated by problems in agriculture and biology. Because of the nature of agricultural experiments, they tend to be large in scale, take a long time to complete, and must cope with variations in the field. Such considerations led to the development of blocking, randomization, replication, orthogonality, and the use of analysis of variance and fractional factorial designs. The theory of combinatorial designs, to which R. C. Bose has made fundamental contributions, was also stimulated by problems in block designs and fractional factorial designs. The work in this era also found applications in social science research and in the textile and woolen industries.
The next era of rapid development came soon after World War II. In attempting to apply previous techniques to solve problems in the chemical industries, G. E. P. Box and co-workers at Imperial Chemical Industries discovered that new techniques and concepts had to be developed to cope with the unique features of process industries. The new techniques focused on process modeling and optimization rather than on treatment comparisons, which was the primary objective in agricultural experiments. The experiments in process industries tend to take less time but put a premium on run size economy because of the cost of experimentation. These time and cost factors naturally favor sequential experimentation. The same considerations led to the development of new techniques for experimental planning, notably central composite designs and optimal designs. The analysis for these designs relies more heavily on regression modeling and graphical analysis. Process optimization based on the fitted model is also emphasized. Because the choice of design is often linked to a particular model (e.g., a second-order central composite design for a second-order regression model) and the experimental region may be irregularly shaped, a flexible strategy for finding designs to suit a particular model and/or experimental region is called for. With the availability of fast computational algorithms, optimal designs (pioneered by J. Kiefer) have become an important part of this strategy.
The relatively recent emphasis on variation reduction has provided a new source of inspiration and techniques in experimental design. In manufacturing, the ability to make many parts with few defects is a competitive advantage. Therefore variation reduction in the quality characteristics of these parts has become a
major focus of quality and productivity improvement. G. Taguchi advocated the use of robust parameter design to improve a system (i.e., a product or process) by making it less sensitive to variation, which is hard to control during normal operating or use conditions of the product or process. The input variables of a system can be divided into two broad types: control factors, whose values remain fixed once they are chosen, and noise factors, which are hard to control during normal conditions. By exploiting the interactions between the control and noise factors, one can achieve robustness by choosing control factor settings that make the system less sensitive to noise variation. This is the motivation behind the new paradigm in experimental design, namely, modeling and reduction of variation. Traditionally, when the mean and variance are both considered, variance is used to assess the variability of the sample mean as with the t test or of the treatment comparisons as with the analysis of variance. The focus on variation and the division of factors into two types led to the development of new concepts and techniques in the planning and analysis of robust parameter design experiments. The original problem formulation and some basic concepts were developed by G. Taguchi. Other basic concepts and many sound statistical techniques have been developed by statisticians since the mid-1980s.
Given this historical background, we now classify experimental problems into five broad categories according to their objectives.
1. Treatment Comparisons. The main purpose is to compare several treatments and select the best ones. For example, in the comparison of six barley varieties, are they different in terms of yield and resistance to drought? If they are indeed different, how are they different and which are the best? Examples of treatments include varieties (rice, barley, corn, etc.) in agricultural trials, sitting positions in ergonomic studies, instructional methods, machine types, suppliers, and so on.
2. Variable Screening. If there is a large number of variables in a system but only a relatively small number of them is important, a screening experiment can be conducted to identify the important variables. Such an experiment tends to be economical in that it has few degrees of freedom left for estimating error variance and higher-order terms like quadratic effects or interactions. Once the important variables are identified, a follow-up experiment can be conducted to study their effects more thoroughly. This latter phase of the study falls into the category discussed next.
3. Response Surface Exploration. Once a smaller number of variables is identified as important, their effects on the response need to be explored. The relationship between the response and these variables is sometimes referred to as a response surface. Usually the experiment is based on a design that allows the linear and quadratic effects of the variables and some of the interactions between the variables to be estimated. This experiment tends to be larger (relative to the number of variables under study) than the screening experiment. Both parametric and semiparametric models may be considered. The latter is more computer-intensive but also more flexible in model fitting.
4. System Optimization. In many investigations, interest lies in the optimization of the system. For example, the throughput of an assembly plant or the yield of a chemical process is to be maximized; the amount of scrap or number of reworked pieces in a stamping operation is to be minimized; or the time required to process a travel claim reimbursement is to be reduced. If a response surface has been identified, it can be used for optimization. For the purpose of finding an optimum, it is, however, not necessary to map out the whole surface as in a response surface exploration. An intelligent sequential strategy can quickly move the experiment to a region containing the optimum settings of the variables. Only within this region is a thorough exploration of the response surface warranted.
5. System Robustness. Besides optimizing the response, it is important in quality improvement to make the system robust against noise (i.e., hard-to-control) variation. This is often achieved by choosing control factor settings at which the system is less sensitive to noise variation. Even though the noise variation is hard to control in normal conditions, it needs to be systematically varied during experimentation. The response in the statistical analysis is often the variance (or its transformation) among the noise replicates for a given control factor setting.
1.2 A SYSTEMATIC APPROACH TO THE PLANNING AND IMPLEMENTATION OF EXPERIMENTS
In this section, we provide some guidelines on the planning and implementation of experiments. The following seven-step procedure summarizes the important steps that the experimenter must address.
1. State Objective. The objective of the experiment needs to be clearly stated. All stakeholders should provide input. For example, for a manufactured product, the stakeholders may include design engineers who design the product, process engineers who design the manufacturing process, line engineers who run the manufacturing process, suppliers, lineworkers, customers, marketers, and managers.
2. Choose Response. The response is the experimental outcome or observation. There may be multiple responses in an experiment. Several issues arise in choosing a response. Responses may be discrete or continuous. Discrete responses can be counts or categories, for example, binary (good, bad) or ordinal (easy, normal, hard). Continuous responses are generally preferable. For example, a continuous force measurement for opening a door is better than an ordinal (easy, normal, hard to open) judgment; the recording of a continuous characteristic is preferred to the recording of the percent that the characteristic is within its specifications. Trade-offs may need to be made. For example, an ordinal measurement of force to open a door may be preferable to delaying the experiment until a device to take continuous measurements can be developed. Most importantly, there should be a good measurement system for measuring the response. In fact, an experiment called a gauge repeatability and reproducibility (R&R) study can be performed to assess a continuous measurement system (AIAG, 1990). When there is a single measuring device, the variation due to the measurement system can be divided into two types: variation between the operators and variation within the operators. Ideally, there should be no between-operator variation and small within-operator variation. The gauge R&R study provides estimates for these two components of measurement system variation. Finally, the response should be chosen to increase understanding of mechanisms and physical laws involved in the problem. For example, in a process that is producing underweight soap bars, soap bar weight is the obvious choice for the response in an experiment to improve the underweight problem. A closer examination of the process reveals two subprocesses that have a direct bearing on soap bar weight: the mixing process that affects the soap bar density and the forming process that impacts the dimensions of the soap bars. In order to better understand the mechanism that causes the underweight problem, soap bar density and soap bar dimensions are chosen as the responses. Even though soap bar weight is not used as a response, it can be easily determined from its density and dimensions. Therefore, no information is lost in studying the density and dimensions. Such a study may reveal new information about the mixing and forming subprocesses, which can in turn lead to a better understanding of the underweight problem. Further discussions on and other examples of the choice of responses can be found in Phadke (1989) and Leon, Shoemaker, and Tsui (1993).
The chosen responses can be classified according to the stated objective. Three broad categories will be considered in this book: nominal-the-best, larger-the-better, and smaller-the-better. The first one will be addressed in Section 4.10, and the last two will be discussed in Section 6.2.
3. Choose Factors and Levels. A factor is a variable that is studied in the experiment. In order to study the effect of a factor on the response, two or more values of the factor are used. These values are referred to as levels or settings. A treatment is a combination of factor levels. When there is a single factor, its levels are the treatments. For the success of the experiment, it is crucial that potentially important factors be identified at the planning stage. There are two graphical methods for identifying potential factors. First, a flow chart of the process or system is helpful to see where the factors arise in a multistage process. In Figure 1.1, a rough sketch of a paper pulp manufacturing process is given which involves raw materials from suppliers, a chemical process to make a slurry which is passed through a mechanical process to produce the pulp. Involving all the stakeholders is invaluable in capturing an accurate description of the process or system. Second, a cause-and-effect diagram can be used to list and organize the potential factors that may impact the response. In Figure 1.2, a cause-and-effect diagram is given which lists the factors thought to affect the product quality of an injection molding process. Traditionally, the factors are organized under the headings: Man, Machine, Measurement, Material, Method, and Environment (Mother Nature for those who like M's). Because of their appearance, cause-and-effect diagrams are also called fishbone diagrams. Different characteristics of the factors need to be recognized because they can affect the choice of the experimental design. For example, a factor such as furnace temperature is hard to change. That
Figure 1.1 Flow chart, pulp manufacturing process (recoverable labels: Slurry Concentration; Mechanical Phase; Refiner Plate Gap)
Figure 1.2 Cause-and-effect diagram, injection molding experiment (recoverable labels: MACHINE, injection pressure)
force. Other factors that may be hard or impossible to control are referred to as noise factors. Examples of noise factors include environmental and customer use conditions. (An in-depth discussion of noise factors will be given in Section 11.3.)
Factors may be quantitative or qualitative. Quantitative factors like temperature, time, and pressure take values over a continuous range. Qualitative factors take on a discrete number of values. Examples of qualitative factors include operation mode, supplier, position, line, and so on. Of the two types of factors, there is more freedom in choosing the levels of quantitative factors. For example, if temperature (in degrees Celsius) is in the range 100-200°C, one could choose 130°C and 160°C for two levels or 125°C, 150°C, and 175°C for three levels. If only a linear effect is expected, two levels should suffice. If curvature is expected, then three or more levels are required. In general, the levels of quantitative factors must be chosen far enough apart so that an effect can be detected but not so far apart that different physical mechanisms are involved (which would make it difficult to do statistical modeling and prediction). There is less flexibility in choosing the levels of qualitative factors. Suppose there are three testing methods under comparison. All three must be included as three levels of the factor "testing method," unless the investigator is willing to postpone the study of one method so that only two methods are compared in a two-level experiment. When there is flexibility in choosing the number of levels, the choice may depend on the availability of experimental plans for the given combination of factor levels. In choosing factors and levels, cost and practical constraints must be considered. If two levels of the factor "material" represent expensive and cheap materials, a negligible effect of material on the response will be welcomed because the cost can be drastically reduced by replacing the expensive material by the cheap alternative. Factor levels must be chosen to meet practical constraints. If a factor level combination (e.g., high temperature and long time in an oven) can potentially lead to disastrous results (e.g., burned or overbaked), it should be avoided and a different plan should be chosen.
4. Choose Experimental Plan. Use the fundamental principles discussed in Section 1.3 as well as other principles presented throughout the book. The choice of the experimental plan is crucial. A poor design may capture little information which no analysis can rescue. On the other hand, if the experiment is well planned, the results may be obvious so that no sophisticated analysis is needed.
5. Perform the Experiment. The use of a planning matrix is recommended. This matrix describes the experimental plan in terms of the actual values or settings of the factors. For example, it lists the actual levels such as 50 or 70 psi if the factor is pressure. To avoid confusion and eliminate potential problems of running the wrong combination of factor levels in a multifactor experiment, each of the treatments, such as temperature at 30°C and pressure at 70 psi, should be put on a separate piece of paper and given to the personnel performing the experiment. It is also worthwhile to perform a trial run to see if there will be difficulties in running the experiment, namely, if there are problems with setting the factors and measuring the responses. Any deviations from the planned experiment need to be recorded. For example, for hard-to-set factors, the actual values should be recorded.
6. Analyze the Data. An analysis appropriate for the design used to collect the data needs to be carried out. This includes model fitting and assessment of the model assumptions through an analysis of residuals. Many analysis methods will be presented throughout the book.
7. Draw Conclusions and Make Recommendations. Based on the data analysis, conclusions are presented which include the important factors and a model for the response in terms of the important factors. Recommended settings or levels for the important factors may also be given. The conclusions should refer back to the stated objectives of the experiment. A confirmation experiment is worthwhile, for example, to confirm the recommended settings. Recommendations for further experimentation in a follow-up experiment may also be given. For example, a follow-up experiment is needed if two models explain the experimental data equally well and one must be chosen for optimization.
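The planning matrix recommended in step 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `planning_matrix` is hypothetical, the plan is taken to be a full factorial (the text does not prescribe one), and the second temperature level of 50°C is invented for the example; only 30°C and the pressure levels 50 and 70 psi appear in the text. The run order is randomized, in line with the principles of Section 1.3.

```python
import itertools
import random


def planning_matrix(factors, seed=None):
    """Return every combination of actual factor levels, in randomized run order.

    `factors` maps a factor name to its list of actual levels.
    A full factorial plan is assumed here purely for illustration.
    """
    names = list(factors)
    runs = [
        dict(zip(names, levels))
        for levels in itertools.product(*(factors[name] for name in names))
    ]
    # Randomize the order in which the treatments are run.
    random.Random(seed).shuffle(runs)
    return runs


# Hypothetical levels: 50 °C is an assumed second temperature level.
factors = {"temperature (°C)": [30, 50], "pressure (psi)": [50, 70]}
plan = planning_matrix(factors, seed=1)

# Each printed line could go on its own sheet for the personnel running the experiment.
for run_number, run in enumerate(plan, start=1):
    print(run_number, run)
```

Listing the actual settings, rather than coded levels, is what makes the matrix usable on the shop floor; any deviation observed during a run can then be recorded next to the corresponding row.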
For further discussion on the planning of experiments, see Coleman and Montgomery (1993), Knowlton and Keppinger (1993), and Barton (1997).
1.3 FUNDAMENTAL PRINCIPLES: REPLICATION, RANDOMIZATION, AND BLOCKING
There are three fundamental principles that need to be considered in the design of an experiment: replication, randomization, and blocking. Other principles will be introduced later in the book as they arise.
An experimental unit is a generic term that refers to a basic unit such as material, animal, person, machine, or time period, to which a treatment is applied.
By replication, we mean that each treatment is applied to experimental units that are representative of the population of units to which the conclusions of the experiment will apply. It enables the estimation of the magnitude of experimental error (i.e., the error variance) against which the differences among treatments are judged. Increasing the number of replications, or replicates, decreases the variance of the treatment effect estimates and provides more power for detecting differences in treatments. A distinction needs to be made between replicates and repetitions. For example, three readings from the same experimental unit are repetitions, while the readings from three separate experimental units are replicates. The error variance from the former is less than that from the latter because repeated readings only measure the variation due to errors in reading while the latter also measures the unit-to-unit variation. Underestimation of the true error variance can result in the false declaration of an effect as significant.
The second principle is that of randomization. It should be applied to the allocation of units to treatments, the order in which the treatments are applied in performing the experiment, and the order in which the responses are measured. It provides protection against variables that are unknown to the experimenter but may impact the response. It reduces the unwanted influence of subjective judgment in treatment allocation. Moreover, randomization ensures validity of the estimate of experimental error and provides a basis for inference in analyzing the experiments. For an in-depth discussion on randomization, see Hinkelmann and Kempthorne (1994).
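The distinction between replicates and repetitions drawn above can be illustrated with a small simulation. The sketch below is not from the book: the function name `simulate` and the standard deviations (2.0 for unit-to-unit variation, 0.5 for reading error) are assumptions chosen only to make the effect visible.

```python
import random
import statistics


def simulate(n=2000, unit_sd=2.0, reading_sd=0.5, seed=1):
    """Return (variance of repetitions, variance of replicates).

    unit_sd and reading_sd are assumed values for the unit-to-unit
    and reading-error standard deviations.
    """
    rng = random.Random(seed)
    # Repetitions: n readings taken on one and the same experimental unit,
    # so only the reading error varies from one value to the next.
    unit_value = rng.gauss(0.0, unit_sd)
    repetitions = [unit_value + rng.gauss(0.0, reading_sd) for _ in range(n)]
    # Replicates: one reading on each of n separate experimental units,
    # so both unit-to-unit variation and reading error contribute.
    replicates = [
        rng.gauss(0.0, unit_sd) + rng.gauss(0.0, reading_sd) for _ in range(n)
    ]
    return statistics.variance(repetitions), statistics.variance(replicates)


var_repetitions, var_replicates = simulate()
print("repetitions:", var_repetitions)
print("replicates: ", var_replicates)
```

Under these assumptions the variance computed from repetitions reflects only the reading error, so using it as the error variance would understate the true experimental error.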
A prominent example of randomization is its use in clinical trials. If a physician were free to assign a treatment or control (or a new treatment versus an old treatment) to his/her patients, there might be a tendency to assign the treatment to those patients who are sicker and would not benefit from receiving a control. This would bias the outcome of the trial as it would create an unbalance between the control and treatment groups. A potentially effective treatment like a new drug may not even show up as promising if it is assigned to a larger proportion of "sick" patients. A random assignment of treatment/control to patients would prevent this from happening. Particularly commonplace is the use of the double-blind trial, in which neither the patient nor the doctor or investigator has access to the information about the actual treatment assignment. More on clinical trials can be found in Rosenberger and Lachin (2002).
A group of homogeneous units is referred to as a block. Examples of blocks include days, weeks, morning vs. afternoon, batches, lots, sets of twins, and pairs of kidneys. For blocking to be effective, the units should be arranged so that the within-block variation is much smaller than the between-block variation. By comparing the treatments within the same block, the block effects are eliminated in the comparison of the treatment effects, thereby making the experiment more efficient. For example, there may be a known day effect on the response so that
if all the treatments can be applied within the same day, the day-to-day variation
in randomized block designs, to be discussed in Section 3.2.
These three principles are generally applicable to physical experiments but not to computer experiments because the same input in a computer experiment gives rise to the same output. Computer experiments (see Santner et al., 2003) are not considered in the book, however.
A simple example will be used to explain these principles. Suppose two keyboards denoted by A and B are being compared in terms of typing efficiency. Six different manuscripts denoted by 1-6 are given to the same typist. First the test is arranged in the following sequence:
Because the manuscripts can vary in length and difficulty, each manuscript is treated as a "block" with the two keyboards as two treatments. Therefore, the experiment is replicated six times (with six manuscripts) and blocking is used to compare the two keyboards with the same manuscript. The design has a serious flaw, however. After typing the manuscript on keyboard A, the typist will be familiar with the content of the manuscript when he or she is typing the same manuscript on keyboard B. This "learning effect" will unfairly help the performance of keyboard B. The observed difference between A and B is the combination of the treatment effects (which measures the intrinsic difference between A and B) and the learning effect. For the given test sequence, it is impossible to disentangle the learning effect from the treatment effect. Randomization would help reduce the unwanted influence of the learning effect, which might not have been known to the investigator who planned the study. By randomizing the typing order for each manuscript, the test sequence may appear as follows:
With four AB's and two BA's in the sequence, it is a better design than the first one. A further improvement can be made. The design is not balanced because B benefits from the learning effect in four trials while A only benefits from two trials. There is still a residual learning effect not completely eliminated by the second design. The learning effect can be completely eliminated by requiring that half of the trials have the order AB and the other half the order BA. The actual assignment of AB and BA to the six manuscripts should be done by randomization. This method is referred to as balanced randomization. Balance is a desirable design property, which will be discussed later.
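Balanced randomization as just described can be sketched as follows. The function name `balanced_randomization` and the seed are illustrative assumptions, not part of the book; the logic simply fixes half of the typing orders at AB and half at BA, then assigns them to the manuscripts at random.

```python
import random


def balanced_randomization(n_manuscripts=6, seed=None):
    """Assign typing orders so that exactly half are AB and half are BA.

    Returns a dict mapping manuscript number (1-based) to "AB" or "BA".
    """
    if n_manuscripts % 2 != 0:
        raise ValueError("an even number of manuscripts is needed for balance")
    rng = random.Random(seed)
    half = n_manuscripts // 2
    orders = ["AB"] * half + ["BA"] * half
    # Randomize which manuscripts receive which typing order.
    rng.shuffle(orders)
    return {manuscript: order for manuscript, order in enumerate(orders, start=1)}


plan = balanced_randomization(6, seed=0)
for manuscript, order in plan.items():
    print(manuscript, order)
```

Shuffling a fixed multiset of orders, rather than flipping a coin for each manuscript, is what guarantees the three-and-three split; independent coin flips could again produce an unbalanced sequence such as four AB's and two BA's.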
For simplicity of discussion, we have assumed that only one typist was involved in the experiment. In a practical situation, such an experiment should involve several typists that are representative of the population of typists so that the conclusions made from the study would apply more generally. This and other aspects of the typing experiment will be addressed in the exercises.
With these principles in mind, a useful addition to the cause-and-effect diagram is to indicate how the proposed experimental design addresses each listed factor. The following designations are suggested: E for an experimental factor, B for a factor handled by blocking, O for a factor held constant at one value, and R for a factor handled by randomization. This designation clearly indicates how the proposed design deals with each of the potentially important factors. The designation O, for "one value," serves to remind the experimenter that the factor is held constant during the current experiment but may be varied in a future experiment. An illustration is given in Figure 1.3 from the injection molding experiment discussed in Section 1.2.
Other designations of factors can be considered. For example, experimental factors can be further divided into two types (control factors and noise factors), as in the discussion on the choice of factors in Section 1.2. For the implementation of experiments, we may also designate an experimental factor as "hard-to-change" or "easy-to-change." These designations will be considered later as they arise.
Figure 1.3 Revised cause-and-effect diagram, injection molding experiment (recoverable labels: MACHINE: injection pressure (E), injection speed (E), nozzle temperature (O); MATERIAL: pre-blend pigmentation (B))
1.4 SIMPLE LINEAR REGRESSION
Throughout the book, we will often model experimental data by the general linear model (also called the multiple regression model). Before considering the general linear model in Section 1.6, we present here the simplest case known as the simple linear regression model, which consists of a single covariate. We use the following data to illustrate the analysis technique known as simple linear regression.
Lea (1965) discussed the relationship between mean annual temperature and a mortality index for a type of breast cancer in women. The data (shown in Table 1.1), taken from certain regions of Great Britain, Norway, and Sweden, consist of the mean annual temperature (in degrees Fahrenheit) and a mortality index for neoplasms of the female breast.
Table 1.1 Breast Cancer Mortality Data
Mortality Index (M):    102.5  104.5  100.4   95.9   87.0   95.0   88.6   89.2
Temperature (T, °F):     51.3   49.9   50.0   49.2   48.5   47.8   47.3   45.1
Mortality Index (M):     78.9   84.6   81.7   72.2   65.1   68.1   67.3   52.5
Temperature (T, °F):     46.3   42.1   44.2   43.5   42.3   40.2   31.8   34.0
Figure 1.4 Scatter plot of temperature versus mortality index, breast cancer example
The first step in any regression analysis is to make a scatter plot. A scatter plot of mortality index against temperature (Figure 1.4) reveals an increasing linear relationship between the two variables. Such a linear relationship between a response $y$ and a covariate $x$ can be expressed in terms of the following model:
$$y = \beta_0 + \beta_1 x + \epsilon,$$
where $\epsilon$ is the random part of the model, which is assumed to be normally distributed with mean 0 and variance $\sigma^2$, that is, $\epsilon \sim N(0, \sigma^2)$; because $\epsilon$ is normally distributed, so is $y$, with mean $E(y) = \beta_0 + \beta_1 x$ and $\mathrm{Var}(y) = \sigma^2$.
If $N$ observations are collected in an experiment, the model for them takes the form
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, \ldots, N, \tag{1.1}$$
where $y_i$ is the $i$th value of the response and $x_i$ is the corresponding value of the covariate.
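As a concrete illustration of model (1.1), the sketch below fits a straight line to the Table 1.1 data, with temperature as the covariate and mortality index as the response. It uses the standard closed-form least squares solution for a single covariate, which is an assumption here since the formulas have not been given in the text up to this point; the function name `fit_line` is likewise hypothetical.

```python
# Table 1.1 data: mean annual temperature (°F) and breast cancer mortality index.
T = [51.3, 49.9, 50.0, 49.2, 48.5, 47.8, 47.3, 45.1,
     46.3, 42.1, 44.2, 43.5, 42.3, 40.2, 31.8, 34.0]
M = [102.5, 104.5, 100.4, 95.9, 87.0, 95.0, 88.6, 89.2,
     78.9, 84.6, 81.7, 72.2, 65.1, 68.1, 67.3, 52.5]


def fit_line(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Corrected sums of squares and cross products.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = fit_line(T, M)
# The fitted slope is roughly 2.36 and the intercept roughly -21.8.
print(f"fitted line: M = {intercept:.2f} + {slope:.2f} T")
```

The positive slope agrees with the increasing trend seen in the scatter plot of Figure 1.4.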
The unknown parameters in the model are the regression coefficients $\beta_0$ and $\beta_1$ and the error variance $\sigma^2$. Thus, the purpose for collecting the data is to estimate and make inferences about these parameters. For estimating $\beta_0$ and $\beta_1$, the least squares criterion is used; that is, the least squares estimators (LSEs), denoted by $\hat{\beta}_0$ and $\hat{\beta}_1$, respectively, minimize the following quantity:
N
;=1