Introduction to Statistics and Data Analysis
California Polytechnic State University, San Luis Obispo
Australia • Brazil • Canada • Mexico • Singapore • Spain • United Kingdom • United States
Introduction to Statistics and Data Analysis,
Third Edition
Roxy Peck, Chris Olsen, Jay Devore
Acquisitions Editor: Carolyn Crockett
Development Editor: Danielle Derbenti
Assistant Editor: Beth Gershman
Editorial Assistant: Ashley Summers
Technology Project Manager: Colin Blake
Marketing Manager: Joe Rogove
Marketing Assistant: Jennifer Liang
Marketing Communications Manager: Jessica Perry
Project Manager, Editorial Production: Jennifer Risden
Creative Director: Rob Hugel
Art Director: Vernon Boes
Print Buyer: Karen Hunt
Permissions Editor: Isabel Alves
Production Service: Newgen–Austin
Text Designer: Stuart Paterson
Photo Researcher: Gretchen Miller
Copy Editor: Nancy Dickson
Illustrator: Jade Myers; Newgen–India
Cover Designer: Stuart Paterson
Cover Image: Paul Chesley/Getty Images
Cover Printer: Courier Corporation/Kendallville
Compositor: Newgen–India
Printer: Courier Corporation/Kendallville
© 2008, 2005 Duxbury, an imprint of Thomson Brooks/Cole, a part of The Thomson Corporation. Thomson, the Star logo, and Brooks/Cole are trademarks used herein under license.
ALL RIGHTS RESERVED. No part of this work covered by the copyright hereon may be reproduced or used in any form or by any means—graphic, electronic, or mechanical, including photocopying, recording, taping, Web distribution, information storage and retrieval systems, or in any other manner—without the written permission of the publisher.
Printed in the United States of America
1 2 3 4 5 6 7 11 10 09 08 07
ExamView® and ExamView Pro® are registered trademarks of FSCreations, Inc. Windows is a registered trademark of the Microsoft Corporation used herein under license. Macintosh and Power Macintosh are registered trademarks of Apple Computer, Inc. Used herein under license.
Library of Congress Control Number: 2006933904
For more information about our products, contact us at:
Thomson Learning Academic Resource Center 1-800-423-0563
For permission to use material from this text or product, submit a
request online at http://www.thomsonrights.com.
Any additional questions about permissions can be submitted by
e-mail to thomsonrights@thomson.com.
bet I wouldn’t put their names in this book.
ROXY PECK is Associate Dean of the College of Science and Mathematics and Professor of Statistics at California Polytechnic State University, San Luis Obispo. Roxy has been on the faculty at Cal Poly since 1979, serving for six years as Chair of the Statistics Department before becoming Associate Dean. She received an M.S. in Mathematics and a Ph.D. in Applied Statistics from the University of California, Riverside. Roxy is nationally known in the area of statistics education, and in 2003 she received the American Statistical Association’s Founder’s Award, recognizing her contributions to K–12 and undergraduate statistics education. She is a Fellow of the American Statistical Association and an elected member of the International Statistical Institute. Roxy has recently completed five years as the Chief Reader for the Advanced Placement Statistics Exam and currently chairs the American Statistical Association’s Joint Committee with the National Council of Teachers of Mathematics on Curriculum in Statistics and Probability for Grades K–12. In addition to her texts in introductory statistics, Roxy is also co-editor of Statistical Case Studies: A Collaboration Between Academe and Industry and a member of the editorial board for Statistics: A Guide to the Unknown, 4th edition. Outside the classroom and the office, Roxy likes to travel and spends her spare time reading mystery novels. She also collects Navajo rugs and heads to New Mexico whenever she can find the time.
CHRIS OLSEN has taught statistics at George Washington High School in Cedar Rapids, Iowa, for over 25 years. Chris is a past member of the Advanced Placement Statistics Test Development Committee and the author of the Teacher’s Guide for Advanced Placement Statistics. He has been a table leader at the AP Statistics reading for 6 years and since the summer of 1996 has been a consultant to the College Board. Chris leads workshops and institutes for AP Statistics teachers in the United States and internationally. Chris was the Iowa recipient of the Presidential Award for Excellence in Science and Mathematics Teaching in 1986. He was a regional winner of the IBM Computer Teacher of the Year award in 1988 and received the Siemens Award for Advanced Placement in mathematics in 1999. Chris is a frequent contributor to the AP Statistics Electronic Discussion Group and has reviewed materials for The Mathematics Teacher, the AP Central web site, The American Statistician, and the Journal of the American Statistical Association. He currently writes a column for Stats magazine. Chris graduated from Iowa State University with a major in mathematics and, while acquiring graduate degrees at the University of Iowa, concentrated on statistics, computer programming, psychometrics, and test development. Currently, he divides his duties between teaching and evaluation; in addition to teaching, he is the assessment facilitator for the Cedar Rapids, Iowa, Community Schools. In his spare time he enjoys reading and hiking. He and his wife have a daughter, Anna, who is a graduate student in Civil Engineering at Cal Tech.
JAY DEVORE earned his undergraduate degree in Engineering Science from the University of California at Berkeley, spent a year at the University of Sheffield in England, and finished his Ph.D. in statistics at Stanford University. He previously taught at the University of Florida and at Oberlin College and has had visiting appointments at Stanford, Harvard, the University of Washington, and New York University. From 1998 to 2006, Jay served as Chair of the Statistics Department at California Polytechnic State University, San Luis Obispo. The Statistics Department at Cal Poly has an international reputation for activities in statistics education. In addition to this book, Jay has written several widely used engineering statistics texts and is currently working on a book in applied mathematical statistics. He is the recipient of a distinguished teaching award from Cal Poly and is a Fellow of the American Statistical Association. In his spare time, he enjoys reading, cooking and eating good food, tennis, and travel to faraway places. He is especially proud of his wife, Carol, a retired elementary school teacher, his daughter Allison, who works for the Center for Women and Excellence in Boston, and his daughter Teri, who is finishing a graduate program in education at NYU.
Contents
1 The Role of Statistics and the Data Analysis Process 1
1.1 Three Reasons to Study Statistics 1
1.2 The Nature and Role of Variability 4
1.3 Statistics and the Data Analysis Process 7
1.4 Types of Data and Some Simple Graphical Displays 12
2 Collecting Data Sensibly 27
2.1 Statistical Studies: Observation and Experimentation 27
2.2 Sampling 32
2.3 Simple Comparative Experiments 42
2.4 More on Experimental Design 51
2.5 More on Observational Studies: Designing Surveys (Optional) 56
2.6 Interpreting and Communicating the Results of Statistical Analyses 61
Graphing Calculator Explorations 69
3 Graphical Methods for Describing Data 75
3.1 Displaying Categorical Data: Comparative Bar Charts and Pie Charts 76
3.2 Displaying Numerical Data: Stem-and-Leaf Displays 87
3.3 Displaying Numerical Data: Frequency Distributions and Histograms 97
3.4 Displaying Bivariate Numerical Data 117
3.5 Interpreting and Communicating the Results of Statistical Analyses 127
Graphing Calculator Explorations 141
4 Numerical Methods for Describing Data 147
4.1 Describing the Center of a Data Set 148
4.2 Describing Variability in a Data Set 159
4.3 Summarizing a Data Set: Boxplots 169
4.4 Interpreting Center and Variability: Chebyshev’s Rule, the Empirical Rule, and z Scores 176
4.5 Interpreting and Communicating the Results of Statistical Analyses 186
Graphing Calculator Explorations 195
5 Summarizing Bivariate Data 199
5.1 Correlation 200
5.2 Linear Regression: Fitting a Line to Bivariate Data 210
5.3 Assessing the Fit of a Line 221
5.4 Nonlinear Relationships and Transformations 238
5.5 Logistic Regression (Optional) 255
5.6 Interpreting and Communicating the Results of Statistical Analyses 264
Graphing Calculator Explorations 272
6 Probability 279
6.1 Chance Experiments and Events 279
6.2 Definition of Probability 288
6.3 Basic Properties of Probability 295
6.4 Conditional Probability 302
6.6 Some General Probability Rules 323
6.7 Estimating Probabilities Empirically Using Simulation 335
Graphing Calculator Explorations 351
7 Random Variables and Probability Distributions 357
7.2 Probability Distributions for Discrete Random Variables 361
7.3 Probability Distributions for Continuous Random Variables 367
7.4 Mean and Standard Deviation of a Random Variable 372
7.5 Binomial and Geometric Distributions 386
7.6 Normal Distributions 397
7.7 Checking for Normality and Normalizing Transformations 414
7.8 Using the Normal Distribution to Approximate a Discrete Distribution 425
Graphing Calculator Explorations 434
8 Sampling Variability and Sampling Distributions 445
8.1 Statistics and Sampling Variability 446
8.2 The Sampling Distribution of a Sample Mean 450
8.3 The Sampling Distribution of a Sample Proportion 461
an Advantage in College Admissions? 468
Graphing Calculator Explorations 471
9 Estimation Using a Single Sample 475
9.1 Point Estimation 476
9.2 Large-Sample Confidence Interval for a Population Proportion 482
9.3 Confidence Interval for a Population Mean 495
9.4 Interpreting and Communicating the Results of Statistical Analyses 508
Population Proportion 515
Graphing Calculator Explorations 521
10 Hypothesis Testing Using a Single Sample 525
10.1 Hypotheses and Test Procedures 526
10.2 Errors in Hypotheses Testing 531
10.3 Large-Sample Hypothesis Tests for a Population Proportion 537
10.4 Hypotheses Tests for a Population Mean 550
10.5 Power and Probability of Type II Error 562
10.6 Interpreting and Communicating the Results of Statistical Analyses 571
Graphing Calculator Explorations 580
11 Comparing Two Populations or Treatments 583
11.1 Inferences Concerning the Difference Between Two Population or Treatment Means Using Independent Samples 583
11.2 Inferences Concerning the Difference Between Two Population or Treatment Means Using Paired Samples 606
11.3 Large-Sample Inferences Concerning a Difference Between Two Population or Treatment Proportions 619
11.4 Interpreting and Communicating the Results of Statistical Analyses 629
Graphing Calculator Explorations 641
12 The Analysis of Categorical Data and Goodness-of-Fit Tests 647
12.1 Chi-Square Tests for Univariate Data 647
12.2 Tests for Homogeneity and Independence in a Two-way Table 660
12.3 Interpreting and Communicating the Results of Statistical Analyses 677
Graphing Calculator Explorations 685
13 Simple Linear Regression and Correlation: Inferential Methods 689
13.1 Simple Linear Regression Model 690
13.2 Inferences About the Slope of the Population Regression Line 702
13.4 Inferences Based on the Estimated Regression Line
Graphing Calculator Exploration 746
14 Multiple Regression Analysis 749
14.1 Multiple Regression Models 750
14.2 Fitting a Model and Assessing Its Utility 763
14.3 Inferences Based on an Estimated Model 14-1
14.4 Other Issues in Multiple Regression 14-13
14.5 Interpreting and Communicating the Results of Statistical Analyses 14-26
Predictors and Sample Size 780
Sections and/or chapter numbers in color can be found at www.thomsonedu.com/statistics/peck
Graphing Calculator Exploration 811
16 Nonparametric (Distribution-Free) Statistical Methods 16-1
16.1 Distribution-Free Procedures for Inferences About a Difference Between Two Population or Treatment Means Using Independent Samples (Optional) 16-1
16.2 Distribution-Free Procedures for Inferences About a Difference Between Two Population or Treatment Means Using Paired Samples 16-10
Preface
In a nutshell, statistics is about understanding the role that variability plays in drawing conclusions based on data. Introduction to Statistics and Data Analysis, Third Edition develops this crucial understanding of variability through its focus on the data analysis process.
An Organization That Reflects the Data Analysis Process
Students are introduced early to the idea that data analysis is a process that begins with careful planning, followed by data collection, data description using graphical and numerical summaries, data analysis, and finally interpretation of results. This process is described in detail in Chapter 1, and the ordering of topics in the first ten chapters of the book mirrors this process: data collection, then data description, then statistical inference.
The logical order in the data analysis process can be pictured as shown in the following figure.
Step 1: Acknowledging Variability—Collecting Data Sensibly
Step 2: Describing Variability in the Data—Descriptive Statistics
Step 3: Drawing Conclusions in a Way That Recognizes Variability in the Data
Unlike many introductory texts, Introduction to Statistics and Data Analysis, Third Edition is organized in a manner consistent with the natural order of the data analysis process:
Step 1: Acknowledging Variability—Collecting Data Sensibly (Chapters 1–2)
Step 2: Describing Variability in the Data—Descriptive Statistics (Chapters 3–5)
Probability Supports the Connection (Chapters 6–7)
Step 3: Drawing Conclusions in a Way That Recognizes Variability in the Data (Chapters 8–15)
The Importance of Context and Real Data
Statistics is not about numbers; it is about data—numbers in context. It is the context that makes a problem meaningful and something worth considering. For example, exercises that ask students to compute the mean of 10 numbers or to construct a dotplot or boxplot of 20 numbers without context are arithmetic and graphing exercises. They become statistics problems only when a context gives them meaning and allows for interpretation. While this makes for a text that may appear “wordy” when compared to traditional mathematics texts, it is a critical and necessary component of a modern statistics text.
Examples and exercises with overly simple settings do not allow students to practice interpreting results in authentic situations or give students the experience necessary to be able to use statistical methods in real settings. We believe that the exercises and examples are a particular strength of this text, and we invite you to compare the examples and exercises with those in other introductory statistics texts.
Many students are skeptical of the relevance and importance of statistics. Contrived problem situations and artificial data often reinforce this skepticism. A strategy that we have employed successfully to motivate students is to present examples and exercises that involve data extracted from journal articles, newspapers, and other published sources. Most examples and exercises in the book are of this nature; they cover a very wide range of disciplines and subject areas. These include, but are not limited to, health and fitness, consumer research, psychology and aging, environmental research, law and criminal justice, and entertainment.
A Focus on Interpretation and Communication
Most chapters include a section titled “Interpreting and Communicating the Results of Statistical Analyses.” These sections include advice on how to best communicate the results of a statistical analysis and also consider how to interpret statistical summaries found in journals and other published sources. A subsection titled “A Word to the Wise” reminds readers of things that must be considered in order to ensure that statistical methods are employed in reasonable and appropriate ways.
Consistent with Recommendations for the Introductory Statistics Course Endorsed by the American Statistical Association
In 2005, the American Statistical Association endorsed the report “College Guidelines in Assessment and Instruction for Statistics Education (GAISE Guidelines),” which included the following six recommendations for the introductory statistics course:
1 Emphasize statistical literacy and develop statistical thinking
2 Use real data
3 Stress conceptual understanding rather than mere knowledge of procedures
4 Foster active learning in the classroom
5 Use technology for developing conceptual understanding and analyzing data
6 Use assessments to improve and evaluate student learning
Introduction to Statistics and Data Analysis, Third Edition is consistent with these recommendations and supports the GAISE guidelines in the following ways:
1 Emphasize statistical literacy and develop statistical thinking.
Statistical literacy is promoted throughout the text in the many examples and exercises that are drawn from the popular press. In addition, a focus on the role of variability, consistent use of context, and an emphasis on interpreting and communicating results in context work together to help students develop skills in statistical thinking.
2 Use real data.
The examples and exercises from Introduction to Statistics and Data Analysis, Third Edition are context driven and reference sources that include the popular press as well as journal articles.
3 Stress conceptual understanding rather than mere knowledge of procedures.
Nearly all exercises in Introduction to Statistics and Data Analysis, Third Edition are multipart and ask students to go beyond just computation. They focus on interpretation and communication, not just in the chapter sections specifically devoted to this topic, but throughout the text. The examples and explanations are designed to promote conceptual understanding. Hands-on activities in each chapter are also constructed to strengthen conceptual understanding. Which brings us to
4 Foster active learning in the classroom.
While this recommendation speaks more to pedagogy and classroom practice, Introduction to Statistics and Data Analysis, Third Edition provides 33 hands-on activities in the text and additional activities in the accompanying instructor resources that can be used in class or assigned to be completed outside of class. In addition, accompanying online materials allow students to assess their understanding and develop a personalized learning plan based on this assessment for each chapter.
5 Use technology for developing conceptual understanding and analyzing data.
The computer has brought incredible statistical power to the desktop of every investigator. The wide availability of statistical computer packages such as MINITAB, S-Plus, JMP, and SPSS, and the graphical capabilities of the modern microcomputer have transformed both the teaching and learning of statistics. To highlight the role of the computer in contemporary statistics, we have included sample output throughout the book. In addition, numerous exercises contain data that can easily be analyzed by computer, though our exposition firmly avoids a presupposition that students have access to a particular statistical package. Technology manuals for specific packages, such as MINITAB and SPSS, are available in the online materials that accompany this text.
The appearance of hand-held calculators with significant statistical and graphing capability has also changed statistics instruction in classrooms where access to computers is still limited. The computer revolution of a previous generation is now being writ small—or, possibly we should say, smaller—for the youngest generation of investigators. There is not, as we write, anything approaching universal or even wide agreement about the proper role for the graphing calculator in college statistics classes, where access to a computer is more common. At the same time, for tens of thousands of students in Advanced Placement Statistics in our high schools, the graphing calculator is the only dependable access to statistical technology.
This text allows the instructor to balance the use of computers and calculators in a manner consistent with his or her philosophy and presents the power of the calculator in a series of Graphing Calculator Explorations. These are placed at the end of each chapter, unobtrusive to those instructors whose technology preference is the computer while still accessible to those instructors and students comfortable with graphing calculator technology. As with computer packages, our exposition avoids assuming the use of a particular calculator and presents the calculator capabilities in a generic format; specifically, we do not teach particular keystroke sequences, believing that the best source for such specific information is the calculator manual. For those using a TI graphing calculator, there is a technology manual available in the online materials that accompany this text. As much as possible, the calculator explorations are independent of each other, allowing instructors to pick and choose calculator topics that are more relevant to their particular courses.
6 Use assessments to improve and evaluate student learning.
Assessment materials in the form of a test bank, quizzes, and chapter exams are available in the instructor resources that accompany this text. The items in the test bank reflect the data-in-context philosophy of the text’s exercises and examples.
Advanced Placement Statistics
We have designed this book with a particular eye toward the syllabus of the Advanced Placement Statistics course and the needs of high school teachers and students. Concerns expressed and questions asked in teacher workshops and on the AP Statistics Electronic Discussion Group have strongly influenced our exposition of certain topics, especially in the area of experimental design and probability. We have taken great care to provide precise definitions and clear examples of concepts that Advanced Placement Statistics instructors have acknowledged as difficult for their students. We have also expanded the variety of examples and exercises, recognizing the diverse potential futures envisioned by very capable students who have not yet focused on a college major.
Topic Coverage
Our book can be used in courses as short as one quarter or as long as one year in duration. Particularly in shorter courses, an instructor will need to be selective in deciding which topics to include and which to set aside. The book divides naturally into four major sections: collecting data and descriptive methods (Chapters 1–5), probability material (Chapters 6–8), the basic one- and two-sample inferential techniques (Chapters 9–12), and more advanced inferential methodology (Chapters 13–16). We include an early chapter (Chapter 5) on descriptive methods for bivariate numerical data. This early exposure raises questions and issues that should stimulate student interest in the subject; it is also advantageous for those teaching courses in which time constraints preclude covering advanced inferential material. However, this chapter can easily be postponed until the basics of inference have been covered, and then combined with Chapter 13 for a unified treatment of regression and correlation.
With the possible exception of Chapter 5, Chapters 1–10 should be covered in order. We anticipate that most instructors will then continue with two-sample inference (Chapter 11) and methods for categorical data analysis (Chapter 12), although regression could be covered before either of these topics. Optional portions of Chapter 14 (multiple regression) and Chapter 15 (analysis of variance), as well as Chapter 16 (nonparametric methods), are included in the online materials that accompany this text.
A Note on Probability
The content of the probability chapters is consistent with the Advanced Placement Statistics course description. It includes both a traditional treatment of probability and probability distributions at an introductory level, as well as a section on the use of simulation as a tool for estimating probabilities. For those who prefer a briefer and more informal treatment of probability, the book Statistics: The Exploration and Analysis of Data, by Roxy Peck and Jay Devore, may be a more appropriate choice. Except for the treatment of probability and the omission of the Graphing Calculator Explorations, it parallels the material in this text. Please contact your sales rep for more information about this alternative and other customized options available to you.
New to This Edition
There are a number of changes in the Third Edition, including the following:
from current journals and newspapers are included. In addition, more of the exercises specifically ask students to write (for example, by requiring students to explain their reasoning, interpret results, and comment on important features of an analysis).
online from the text website are designated by an icon in the text, as are examples that are further illustrated in the technology manuals (MINITAB, SPSS, etc.) that are available in the online materials that accompany this text.
Montgomery College, which can be viewed online or downloaded for viewing later. These exercises are designated by an icon in the text.
activities These activities can be used as a chapter capstone or can be integrated
at appropriate places as the chapter material is covered in class
in each chapter and develop a personalized learning plan to assist them in addressing any areas of weakness.
Although the order of topics in the text generally mirrors the data collection process with methods of data collection covered first, two graphical displays (dotplots and bar charts) are covered in Chapter 1 so that these simple graphical analysis tools can be used in the conceptual development of experimental design and so that students have some tools for summarizing the data they collect through sampling and experimentation in the exercises, examples, and activities of Chapter 2.
those who would like more complete coverage of data analysis techniques for categorical data.
such as inference and variable selection methods in multiple regression (Sections 14.3 and 14.4) and analysis of variance for randomized block and two-factor designs (Sections 15.3 and 15.4), have been moved to the online materials that accompany this text.
between two population or treatment means using independent samples (formerly Section 11.4) has been moved to Chapter 16. This chapter, titled “Nonparametric (Distribution-Free) Statistical Methods,” also includes new material on inferences about the difference between two population or treatment means using paired samples and distribution-free analysis of variance, and is available in the online materials that accompany this text.
supplements such as a complete solutions manual and a test bank, the following are also available to instructors:
can be incorporated into classroom presentations and cross-references to resources such as Fathom, Workshop Statistics, and Against All Odds. Of particular interest to those teaching Advanced Placement Statistics, the binder also includes additional data analysis questions of the type encountered on the free response portion of the Advanced Placement exam, as well as a collection of model responses.
■ For those who use student response systems in class, a set of “clicker”
assessing student understanding is available.
Student Resources
If your text includes a printed access card, you will have instant access to the following resources referenced throughout your text:
■ ThomsonNOW™ (see below for a full description of this powerful study tool)
■ Complete step-by-step instructions for MINITAB, Excel, TI-83 Graphing Calculator, JMP, and SPSS indicated by the icon throughout the text
■ Data sets formatted for MINITAB, Excel, SPSS, SAS, JMP, TI-83, Fathom, and ASCII indicated by the ● icon throughout the text
■ Applets used in the Activities found in the text
Student Solutions Manual (ISBN 0-495-11876-1) by Mary Mortlock of California
Polytechnic State University, San Luis Obispo
Check your work—and your understanding—with this manual, which provides worked-out solutions to the odd-numbered problems in the text.
Activities Workbook (0-495-11883-4) by Roxy Peck.
Use this convenient workbook to take notes, record data, and cement your learning by completing textbook and bonus activities for each chapter.
ThomsonNOW™ Homework (0-495-39230-8)
Save time, learn more, and succeed in the course with this online suite of resources (including an integrated eBook and Personalized Study plans) that give you the choices and tools you need to study smarter and get the grade. Note: If your text did not include a printed access card for ThomsonNOW, it is available for purchase online at http://www.thomsonedu.com.
Instructor Resources
Annotated Instructor’s Edition (0-495-11888-5)
The Annotated Instructor’s Edition contains answers for all exercises, as well as an annotated table of contents with comments written by Roxy Peck.
Instructor’s Solutions Manual (0-495-11879-6) by Mary Mortlock of California
Polytechnic State University, San Luis Obispo
This manual contains worked-out solutions to all of the problems in the text
Instructor’s Resource Binder (0-495-11892-3) prepared by Chris Olsen.
Includes transparencies and Microsoft® PowerPoint® slides to make lecture and class preparation quick and easy. New to this edition, we have added some Activities Worksheets authored by Carol Marchetti of Rochester Institute of Technology.
Test Bank (0-495-11880-X) by Josh Tabor of Wilson High School, Peter
Flannagan-Hyde of Phoenix Country Day School, and Chris Olsen
Includes test questions for each section of the book
Activities Workbook (0-495-11883-4) by Roxy Peck.
Students can take notes, record data, and complete activities in this ready-to-use book, which includes activities from the textbook plus additional bonus activities for each chapter.
Enhanced WebAssign (ISBN 0-495-10963-0)
Enhanced WebAssign is the most widely used homework system in higher education. Available for this title, Enhanced WebAssign allows you to assign, collect, grade, and record homework assignments via the web. This proven homework system has been enhanced to include links to the textbook sections, video examples, and problem-specific tutorials. Enhanced WebAssign is more than a homework system—it is a complete learning system for students.
ThomsonNOW™ Homework (0-495-39230-8)
ThomsonNOW’s Personalized Study plans allow students to study smarter by diagnosing their weak areas, and helping them focus on what they need to learn. Based on responses to chapter-specific pre-tests, the plans suggest a course of study for students, including many multimedia and interactive exercises to help students better learn the material. After completing the study plan, they can take a post-test to measure their progress and understanding.
Create, deliver, and customize tests and study guides (both print and online) in minutes with this easy-to-use assessment and tutorial system, which contains all questions from the Test Bank in electronic format.
Finger Lakes Community College
Holly Ashton, Pikes Peak Community College
Barb Barnet, University of Wisconsin at Platteville
Eddie Bevilacqua, State University of New York College of Environmental Science & Forestry
Piotr Bialas, Borough of Manhattan Community College
Kelly Black, Union College
Gabriel Chandler, Connecticut College
Andy Chang, Youngstown State University
Jerry Chen, Suffolk Community College
Richard Chilcoat, Wartburg College
Marvin Creech, Chapman University
Ron Degges, North Dakota State University
Hemangini Deshmukh, Mercyhurst College
Ann Evans, University of Massachusetts at Boston; Central Carolina Community College
Guangxiong Fang, Daniel Webster College
Sharon B. Finger, Nicholls State University
Steven Garren, James Madison University
Tyler Haynes, Saginaw Valley State University
Sonja Hensler, St. Petersburg College
Trish Hutchinson, Angelo State University
Bessie Kirkwood, Sweet Briar College
Jeff Kollath, Oregon State University
Christopher Lacke, Rowan University
Michael Leitner, Louisiana State University
Zia Mahmood, College of DuPage
Art Mark, Georgia Military College
David Mathiason, Rochester Institute of Technology
Bob Mattson, Eureka College
C. Mark Miller, York College
Megan Mocko, University of Florida
Kane Nashimoto, James Madison University
Helen Noble, San Diego State University
Broderick Oluyede, Georgia Southern University
Elaine Paris, Mercy College
Shelly Ray Parsons, Aims Community College
Judy Pennington-Price, Midway College; Hazard Community College; Jackson County High School
Michael I. Ratliff, Northern Arizona University
David R. Rauth, Duquesne University
Kevin J. Reeves, East Texas Baptist University
Robb Sinn, North Georgia College & State University
Greg Sliwa, Broome Community College
Angela Stabley, Portland Community College
Jeffery D. Sykes, Ouachita Baptist University
Yolande Tra, Rochester Institute of Technology
Nathan Wetzel, University of Wisconsin Stevens Point
Dr. Mark Wilson, West Virginia University Institute of Technology
Yong Yu, Ohio State University
Toshiyuki Yuasa, University of Houston

Jim Bohan, Manheim Township High School
Pat Buchanan, Pennsylvania State University
Mary Christman, American University; Iowa State University
Mark Glickman, Boston University
John Imbrie, University of Virginia
Pam Martin, Northeast Louisiana University
Paul Myers, Woodward Academy
Deanna Payton, Oklahoma State University
Michael Phelan, Chapman University
Alan Polansky, Northern Illinois University
Lawrence D. Ries, University of Missouri Columbia
Joe Ward, Health Careers High School

Additionally, we would like to express our thanks and gratitude to all who helped to make this book possible:
■ Carolyn Crockett, our editor and friend, for her unflagging support and thoughtful advice for more than a decade.
■ Danielle Derbenti, Beth Gershman, and Colin Blake at Thomson Brooks/Cole, for the development of all of the ancillary materials and for keeping us on track.
■ Jennifer Risden, our project manager at Thomson Brooks/Cole, and Anne Seitz at Hearthside Publishing Services, for artfully managing the myriad of details associated with the production process.
■ Nancy Dickson for her careful copyediting.
■ Brian Kotz for all his hard work producing the video solutions.
■ Mary Mortlock for her diligence and care in producing the student and instructor solutions manuals for this book.
■ Josh Tabor and Peter Flannagan-Hyde for their contributions to the test bank that accompanies the book.
■ Beth Chance and Francisco Garcia for producing the applet used in the confidence interval activities.
■ Gary McClelland for producing the applets from Seeing Statistics used in the regression activities.
■ Bittner Development Group for checking the accuracy of the manuscript.
■ Rachel Dagdagan, a student at Cal Poly, for her help in the preparation of the manuscript.
And, as always, we thank our families, friends, and colleagues for their continued support.
Roxy Peck Chris Olsen Jay Devore
Context-Driven Applications
Real data examples and exercises
throughout the text are drawn from the
popular press, as well as journal articles.
Focus on Interpreting
and Communicating
Chapter sections on interpreting and
communicating results are designed to
emphasize the importance of being
able to interpret statistical output and
communicate its meaning to
non-statisticians. A subsection entitled “A
Word to the Wise” reminds students of
things that must be considered in order
to ensure that statistical methods are
used in reasonable and appropriate ways.
of interest (in this case, students at the university). Numerical measures of center and spread and boxplots help to enlighten us, and they also allow us to communicate to others what we have learned from the data.
■ A Word to the Wise: Cautions and Limitations
When computing or interpreting numerical descriptive measures, you need to keep in mind the following:
1. Measures of center don’t tell all. Although measures of center, such as the mean and the median, do give us a sense of what might be considered a typical value for a variable, this is only one characteristic of a data set. Without additional information about variability and distribution shape, we don’t really know much about the behavior of the variable.
2. Data distributions with different shapes can have the same mean and standard deviation. For example, consider the following two histograms:
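The point can also be checked numerically. The minimal Python sketch below uses two small made-up data sets, not the histograms from the text. The second data set is the mirror image of the first about their common mean, so the two share a mean and a standard deviation even though one is skewed right and the other skewed left. Their medians expose the difference in shape.

```python
import statistics

# Made-up data for illustration only (not from the text).
right_skewed = [1, 1, 1, 2, 5]            # long tail toward large values
center = statistics.mean(right_skewed)     # 2
# Reflect each value about the mean: same mean and spread, opposite skew.
left_skewed = [2 * center - x for x in right_skewed]   # [3, 3, 3, 2, -1]

for name, data in (("right-skewed", right_skewed), ("left-skewed", left_skewed)):
    print(f"{name:>12}: mean = {statistics.mean(data)}, "
          f"sd = {statistics.stdev(data):.3f}, "
          f"median = {statistics.median(data)}")
```

Reflecting about the mean is one simple way to get equal means and standard deviations with different shapes. Many other pairs of data sets behave the same way.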
2.1 ▼ The article “Television’s Value to Kids: It’s All in How They Use It” (Seattle Times, July 6, 2005) described a study in which researchers analyzed standardized test results and television viewing habits of 1700 children. They found that children who averaged more than two hours of television viewing per day when they were younger than 3 tended to score lower on measures of reading ability and short-term memory.
a. Is the study described an observational study or an experiment?
b. Is it reasonable to conclude that watching two or more hours of television is the cause of lower reading scores? Explain.
Example 3.22 Education Level and Income: Stay in School!
The time-series plot shown in Figure 3.34 appears on the U.S. Census Bureau web site. It shows the average earnings of workers by educational level as a proportion of the average earnings of a high school graduate over time. For example, we can see from this plot that in 1993 the average earnings for people with bachelor’s degrees was about 1.5 times the average for high school graduates. In that same year, the average earnings for those who were not high school graduates was only about 75%
Peck, Olsen, Devore’s
Introduction to Statistics and Data Analysis, Third Edition
Hands-on Activities in
Every Chapter
Thirty-three hands-on activities
in the text, and additional activities
in the accompanying instructor
resources, can be used to
encourage active learning inside or
outside the classroom.
Graphing Calculator Explorations
Found at the end of most chapters, these explorations allow students to actively experience technology and promote statistical thinking.
Exploration 3.3 Scaling the Histogram
When we constructed a histogram in the previous Exploration, there were some numbers in the view screen that we temporarily ignored. We would like to return to those numbers now because they can seriously affect the look of a histogram. When we left the histogram, the numbers in our view window were set as shown in Figure 3.43. These settings place the view window over the calculator’s Cartesian system for effective viewing of the histogram from the data of Example 3.15.
We would now like to experiment a bit with the “Xscale.” In all statistical graphs produced by the calculator, the Xscale and Yscale choices control the placement of the little “tick” marks on the x- and y-axes. In Exploration 3.2, the Xscale and Yscale were set at 5 and 1, respectively, so the tick marks on the x-axis were at multiples of 5. (Because of the data, the y-axis didn’t appear.) Change the Xscale value to 2 and redraw the histogram. You should see a graph similar to Figure 3.44. The x-axis tick marks now appear at multiples of 2.
Note that changing the Xscale has altered not only the tick marks but also the class intervals for the histogram. The choice of class intervals can significantly change the look and feel of the histogram, so the choice of Xscale can affect judgments about the shape of the histogram. Because of this possibility, it is wise to look at a histogram with varying choices of the Xscale value. If the shape appears very similar for different choices of Xscale, you can interpret and describe the shape with more confidence. However, if different Xscale choices alter the look of the histogram, you should probably be more tentative.
Figure 3.43
Figure 3.44
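You can imitate the effect of the Xscale setting away from the calculator. The Python sketch below treats the Xscale value as the class width. It assumes that the first class starts at a chosen left edge, playing the role of the window’s Xmin. The data are made up, because the data of Example 3.15 are not reproduced here. The helper names `bin_counts` and `text_histogram` are hypothetical. Printing the same data with widths 5 and 2 shows how the apparent shape can change with the class width.

```python
import math
from collections import Counter

def bin_counts(data, start, width):
    """Return (lower boundary, count) for classes [start + k*width, start + (k+1)*width)."""
    counts = Counter(math.floor((x - start) / width) for x in data)
    # Include empty classes between the first and last occupied ones.
    return [(start + k * width, counts[k]) for k in range(min(counts), max(counts) + 1)]

def text_histogram(data, start, width):
    """Print a sideways histogram: one row of asterisks per class."""
    for lower, count in bin_counts(data, start, width):
        print(f"{lower:>4g} to <{lower + width:g} | {'*' * count}")

# Made-up observations, for illustration only.
sample = [3, 7, 8, 12, 13, 13, 14, 18, 19, 21, 22, 24, 27]
for xscale in (5, 2):
    print(f"Class width (Xscale) = {xscale}")
    text_histogram(sample, start=0, width=xscale)
```

Each class here includes its lower boundary and excludes its upper one. That convention is an assumption of this sketch, not a statement about how the calculator handles boundary values.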
Activity 2.4 Video Games and Pain Management
Background: Video games have been used for pain management by doctors and therapists who believe that the attention required to play a video game can distract the player and thereby decrease the sensation of pain. The paper “Video Games and Health” (British Medical Journal [2005]: 122–123) states:
“However, there has been no long term follow-up and no robust randomized controlled trials of such interventions. Whether patients eventually tire of such games is also unclear. Furthermore, it is not known whether any distracting effect depends simply on concentrating on an interactive task or whether the content of games is also an important factor as there have been no controlled trials comparing video games with other distracters. Further research should examine factors within games such as novelty, users’ preferences, and relative levels of challenge and should compare video games with other potentially distracting activities.”
1. Working with a partner, select one of the areas of potential research suggested in the passage from the paper and formulate a specific question that could be addressed by performing an experiment.
2. Propose an experiment that would provide data to address the question from Step 1. Be specific about how subjects might be selected, what the experimental conditions (treatments) would be, and what response would be measured.
3. At the end of Section 2.3 there are 10 questions that can be used to evaluate an experimental design. Answer these 10 questions for the design proposed in Step 2.
4. After evaluating your proposed design, are there any changes you would like to make to your design? Explain.
Develop Conceptual
Understanding
Applets Allow Students to
See the Concepts
Within the Activities, applets are
used to illustrate and promote a
deeper understanding of the key
statistical concepts.
And Analyze Data
Real Data Sets
Real data sets promote statistical analysis, as well as technology use They are formatted for MINITAB, Excel, SPSS, SAS, JMP, TI-83, and ASCII and are indicated by the icon throughout the text.
Continue generating intervals until you have seen
at least 1000 intervals, and then answer the following question:
a. How does the proportion of intervals constructed that contain μ = 100 compare to the stated confidence level of
3.22 ● Medicare’s new medical plans offer a wide range of variations and choices for seniors when picking a drug plan (San Luis Obispo Tribune, November 25, 2005). The monthly cost for a stand-alone drug plan varies from plan to plan and from state to state. The accompanying table gives the premium for the plan with the lowest cost for
Oklahoma 10.07
Pennsylvania 10.14
Rhode Island 7.32
South Carolina 16.57
South Dakota 1.87
Tennessee 14.08
■ Exercises 3.22–3.34
27.0 would then fall in the class 27.0 to 27.5.
Example 3.14 Enrollments at Public Universities
● States differ widely in the percentage of college students who are enrolled in public institutions. The National Center for Education Statistics provided the accompanying data on this percentage for the 50 U.S. states for fall 2002.
Percentage of College Students Enrolled in Public Institutions
is reasonable to start the first class interval at 40 and let each interval have a width
Example 1.9 Revisiting Motorcycle Helmets
Example 1.8 used data on helmet use from a sample of 1700 motorcyclists to construct a frequency distribution (Table 1.1). Figure 1.5 shows the bar chart corresponding to this frequency distribution.
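A frequency distribution for a categorical variable, like the one behind Figure 1.5, is a tally of how often each category occurs. The minimal Python sketch below assumes the responses are stored as a list of category labels. The labels are placeholders, not the helmet-use categories or counts of Table 1.1, and `frequency_distribution` is a hypothetical helper name.

```python
from collections import Counter

def frequency_distribution(responses):
    """Return (category, frequency, relative frequency) rows, most frequent first."""
    counts = Counter(responses)
    total = sum(counts.values())
    return [(category, freq, freq / total) for category, freq in counts.most_common()]

# Placeholder categories for illustration only.
responses = ["A", "B", "A", "C", "A", "B"]
for category, freq, rel in frequency_distribution(responses):
    print(f"{category}: frequency {freq}, relative frequency {rel:.3f}")
```

The frequencies give the bar heights in a bar chart such as Figure 1.5. Plotting the relative frequencies instead would put the bars on a proportion scale.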
Complete online step-by-step instructions for MINITAB, Excel, TI-83 Graphing Calculator, JMP, and SPSS are indicated
by the icon throughout the text.
Evaluate as You Teach Using Clickers
Using clicker content authored by Roxy Peck, evaluate your students’ understanding immediately — in class — after teaching a concept Whether it’s a quick quiz, a poll to
be used for in-class data, or just checking in
to see if it is time to move on, our quality, tested content creates truly interactive classrooms with students’ responses shaping the lecture as you teach.
Video Solutions Motivate Student Understanding
More than 90 exercises will have video solutions, presented by Brian Kotz of Montgomery College, which can be viewed online or downloaded for later viewing. These exercises will be designated by the icon in the text.
Get Feedback from Roxy
Peck on What You Need
to Learn
ThomsonNOW allows students to
assess their understanding and develop
a personalized learning plan based on
this assessment for each chapter.
Pre- and post-tests include feedback
authored by Roxy Peck.
3.25 ● USA Today (July 2, 2001) gave the following information regarding cell phone use for men and women:
a. Construct a relative frequency histogram for average number of minutes used per month for men. How would you describe the shape of this histogram?
b. Construct a relative frequency histogram for average number of minutes used per month for women. Is the distribution for average number of minutes used per month similar for men and women? Explain.
c. What proportion of men average less than 400 minutes per month?
d. Estimate the proportion of men that average less than 500 minutes per month.
PRINT
Annotated Instructor’s Edition
■ Step-by-step instructions for MINITAB, Excel, TI-83
Graphing Calculator, JMP, and SPSS.
■ Data sets formatted for MINITAB, Excel, SPSS,
SAS, JMP, TI-83, and ASCII.
The Role of Statistics and the Data Analysis Process
Improve your understanding and save time! Visit http://www.thomsonedu.com/login where you will find:
■ Step-by-step instructions for MINITAB, Excel, TI-83, SPSS,
and JMP
■ Video solutions to selected exercises
■ Data sets available for selected examples and exercises
■ Exam-prep pre-tests that build a Personalized Learning Plan based on your results so that you know exactly what to study
■ Help from a live statistics tutor 24 hours a day
We encounter data and conclusions based on data every day. Statistics is the scientific discipline that provides methods to help us make sense of data. Some people are suspicious of conclusions based on statistical analyses. Extreme skeptics, usually speaking out of ignorance, characterize the discipline as a subcategory of lying: something used for deception rather than for positive ends. However, we believe that statistical methods, used intelligently, offer a set of powerful tools for gaining insight into the world around us. Statistical methods are used in business, medicine, agriculture, social sciences, natural sciences, and applied sciences, such as engineering. The widespread use of statistical analyses in diverse fields has led to increased recognition that statistical literacy (a familiarity with the goals and methods of statistics) should be a basic component of a well-rounded educational program.
The field of statistics teaches us how to make intelligent judgments and informed decisions in the presence of uncertainty and variation. In this chapter, we consider the nature and role of variability in statistical settings, introduce some basic terminology, and look at some simple graphical displays for summarizing data.
1.1 Three Reasons to Study Statistics
Because statistical methods are used to organize, summarize, and draw conclusions from data, a familiarity with statistical techniques and statistical literacy is vital in today’s society. Everyone needs to have a basic understanding of statistics, and many college majors require at least one course in statistics. There are three important reasons why statistical literacy is important: (1) to be informed, (2) to understand issues and be able to make sound decisions based on data, and (3) to be able to evaluate decisions that affect your life. Let’s explore each reason in detail.
■ The First Reason: Being Informed
How do we decide whether claims based on numerical information are reasonable?
We are bombarded daily with numerical information in news, in advertisements, and even in conversation. For example, here are a few of the items employing statistical methods that were part of just two weeks’ news.
■ The increasing popularity of online shopping has many consumers using Internet access at work to browse and shop online. In fact, the Monday after Thanksgiving has been nicknamed “Cyber Monday” because of the large increase in online purchases that occurs on that day. Data from a large-scale survey conducted in early November 2005 by a market research firm were used to compute estimates of the percentage of men and women who shop online while at work. The resulting estimates probably won’t make most employers happy: 42% of the men and 32% of the women in the sample were shopping online at work! (Detroit Free Press and San Luis Obispo Tribune, November 26, 2005)
■ A story in the New York Times titled “Students Ace State Tests, but Earn D’s From U.S.” investigated discrepancies between state and federal standardized test results. When researchers compared state test results to the most recent results on the National Assessment of Educational Progress (NAEP), they found that large differences were common. For example, one state reported 89% of fourth graders were proficient in reading based on the state test, while only 18% of fourth graders in that state were considered proficient in reading on the federal test! An explanation of these large discrepancies and potential consequences was discussed (New York Times, November 26, 2005).
■ Can dogs help patients with heart failure by reducing stress and anxiety? One of the first scientific studies of the effect of therapeutic dogs found that a measure of anxiety decreased by 24% for heart patients visited by a volunteer and dog, but only by 10% for patients visited by just the volunteer. Decreases were also noted in measures of stress and heart and lung pressure, leading researchers to conclude that the use of therapeutic dogs is beneficial in the treatment of heart patients (San Luis Obispo Tribune, November 16, 2005).
■ Late in 2005, those eligible for Medicare had to decide which, if any, of the many complex new prescription medication plans was right for them. To assist with this decision, a program called PlanFinder that compares available options was made available online. But are seniors online? Based on a survey conducted by the Los Angeles Times, it was estimated that the percentage of senior citizens who go online is only between 23% and 30%, causing concern over whether providing only an online comparison is an effective way to assist seniors with this important decision (Los Angeles Times, November 27, 2005).
■ Are kids ruder today than in the past? An article titled “Kids Gone Wild” summarized data from a survey conducted by the Associated Press. Nearly 70% of those who participated in the survey said that people were ruder now than 20 years ago, with kids being the biggest offenders. As evidence that this is a serious problem, the author of the article also referenced a 2004 study conducted by Public Agenda, a public opinion research group. That study indicated that more than one third of teachers had either seriously considered leaving teaching or knew a colleague who left because of intolerable student behavior (New York Times, November 27, 2005).
■ When people take a vacation, do they really leave work behind? Data from a poll conducted by Travelocity led to the following estimates: approximately 40% of travelers check work email while on vacation, about 33% take cell phones on vacation in order to stay connected with work, and about 25% bring a laptop computer on vacation. The travel industry is paying attention; hotels, resorts, and even cruise ships are now making it easier for “vacationers” to stay connected to work (San Luis Obispo Tribune, December 1, 2005).
■ How common is domestic violence? Based on interviews with 24,000 women in 10 different countries, a study conducted by the World Health Organization found that the percentage of women who have been abused by a partner varied widely, from 15% of women in Japan to 71% of women in Ethiopia. Even though the domestic violence rate differed dramatically from country to country, in all of the countries studied women who were victims of domestic violence were about twice as likely as other women to be in poor health, even long after the violence had stopped (San Francisco Chronicle, November 25, 2005).
■ Does it matter how long children are bottle-fed? Based on a study of 2121 children between the ages of 1 and 4, researchers at the Medical College of Wisconsin concluded that there was an association between iron deficiency and the length of time that a child is bottle-fed. They found that children who were bottle-fed between the ages of 2 and 4 were three times more likely to be iron deficient than those who stopped by the time they were 1 year old (Milwaukee Journal Sentinel and San Luis Obispo Tribune, November 26, 2005).
■ Parental involvement in schools is often regarded as an important factor in student achievement. However, data from a study of low-income public schools in California led researchers to conclude that other factors, such as prioritizing student achievement, encouraging teacher collaboration and professional development, and using assessment data to improve instruction, had a much greater impact on the schools’ Academic Performance Index (Washington Post and San Francisco Chronicle, November 26, 2005).
To be an informed consumer of reports such as those described above, you must be able to do the following:
1. Extract information from tables, charts, and graphs.
2. Follow numerical arguments.
3. Understand the basics of how data should be gathered, summarized, and analyzed to draw statistical conclusions.
Your statistics course will help prepare you to perform these tasks.
Throughout your personal and professional life, you will need to understand statistical information and make informed decisions using this information. To make these decisions, you must be able to do the following:
1. Decide whether existing information is adequate or whether additional information is required.
2. If necessary, collect more information in a reasonable and thoughtful way.
3. Summarize the available data in a useful and informative manner.
4. Analyze the available data.
5. Draw conclusions, make decisions, and assess the risk of an incorrect decision.
People informally use these steps to make everyday decisions. Should you go out for a sport that involves the risk of injury? Will your college club do better by trying to raise funds with a benefit concert or with a direct appeal for donations? If you choose a particular major, what are your chances of finding a job when you graduate? How should you select a graduate program based on guidebook ratings that include information on percentage of applicants accepted, time to obtain a degree, and so on? The study of statistics formalizes the process of making decisions based on data and provides the tools for accomplishing the steps listed.
While you will need to make informed decisions based on data, it is also the case that other people will use statistical methods to make decisions that affect you as an individual. An understanding of statistical techniques will allow you to question and evaluate decisions that affect your well-being. Some examples are:
■ Many companies now require drug screening as a condition of employment. With these screening tests there is a risk of a false-positive reading (incorrectly indicating drug use) or a false-negative reading (failure to detect drug use). What are the consequences of a false result? Given the consequences, is the risk of a false result acceptable?
■ Medical researchers use statistical methods to make recommendations regarding the choice between surgical and nonsurgical treatment of such diseases as coronary heart disease and cancer. How do they weigh the risks and benefits to reach such a recommendation?
■ University financial aid offices survey students on the cost of going to school and collect data on family income, savings, and expenses. The resulting data are used to set criteria for deciding who receives financial aid. Are the estimates they use accurate?
■ Insurance companies use statistical techniques to set auto insurance rates, although some states restrict the use of these techniques. Data suggest that young drivers have more accidents than older ones. Should laws or regulations limit how much more young drivers pay for insurance? What about the common practice of charging higher rates for people who live in urban areas?
An understanding of elementary statistical methods can help you to evaluate whether important decisions such as the ones just mentioned are being made in a reasonable way.
We hope that this textbook will help you to understand the logic behind statistical reasoning, prepare you to apply statistical methods appropriately, and enable you to recognize when statistical arguments are faulty.
1.2 The Nature and Role of Variability
Statistics is a science whose focus is on collecting, analyzing, and drawing conclusions from data. If we lived in a world where all measurements were identical for every individual, all three of these tasks would be simple. Imagine a population consisting of all students at a particular university. Suppose that every student took the same number of units, spent exactly the same amount of money on textbooks this semester, and favored increasing student fees to support expanding library services. For this population, there is no variability in the number of units, amount spent on books, or student opinion on the fee increase. A researcher studying a sample from this population to draw conclusions about these three variables would have a particularly easy task. It would not matter how many students the researcher included in the sample or how the sampled students were selected. In fact, the researcher could collect information on number of units, amount spent on books, and opinion on the fee increase by just stopping the next student who happened to walk by the library. Because there is no variability in the population, this one individual would provide complete and accurate information about the population, and the researcher could draw conclusions based on the sample with no risk of error.
The situation just described is obviously unrealistic. Populations with no variability are exceedingly rare, and they are of little statistical interest because they present no challenge! In fact, variability is almost universal. It is variability that makes life (and the life of a statistician, in particular) interesting. We need to understand variability to be able to collect, analyze, and draw conclusions from data in a sensible way. One of the primary uses of descriptive statistical methods is to increase our understanding of the nature of variability in a population.
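The contrast between a population with no variability and a realistic one can be simulated. This Python sketch uses hypothetical populations of 1,000 students’ unit loads, made up for illustration. Each estimate of the population mean comes from a single randomly chosen student. When every value is the same, that estimate is always exact. Once the values vary, it scatters from trial to trial. The function name `single_student_estimates` is a hypothetical helper.

```python
import random
import statistics

def single_student_estimates(population, trials, seed=0):
    """Estimate the population mean several times, each time from ONE randomly chosen student."""
    rng = random.Random(seed)
    return [rng.choice(population) for _ in range(trials)]

# Hypothetical populations of 1,000 students' unit loads.
no_variability = [15] * 1000                                   # every student takes 15 units
gen = random.Random(42)
with_variability = [gen.randint(6, 20) for _ in range(1000)]   # loads vary from 6 to 20

for label, pop in (("no variability", no_variability), ("with variability", with_variability)):
    print(f"{label}: population mean = {statistics.fmean(pop):.2f}, "
          f"one-student estimates = {single_student_estimates(pop, trials=5)}")
```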
Examples 1.1 and 1.2 illustrate how an understanding of variability is necessary to draw conclusions based on data.
Example 1.1 If the Shoe Fits
The graphs in Figure 1.1 are examples of a type of graph called a histogram. (The construction and interpretation of such graphs is discussed in Chapter 3.) Figure 1.1(a) shows the distribution of the heights of female basketball players who played at a particular university between 1990 and 1998. The height of each bar in the graph indicates how many players’ heights were in the corresponding interval. For example, 40 basketball players had heights between 72 in. and 74 in., whereas only 2 players had heights between 66 in. and 68 in. Figure 1.1(b) shows the distribution of heights for members of the women’s gymnastics team over the same period. Both histograms are based on the heights of 100 women.
Figure 1.1 Histograms of heights (in inches) of female athletes: (a) basketball players; (b) gymnasts. Each panel plots Frequency against Height.
The first histogram shows that the heights of female basketball players varied, with most heights falling between 68 in. and 76 in. In the second histogram we see that the heights of female gymnasts also varied, with most heights in the range of 60 in. to 72 in. It is also clear that there is more variation in the heights of the gymnasts than in the heights of the basketball players, because the gymnast histogram spreads out more about its center than does the basketball histogram.
Now suppose that a tall woman (5 ft 11 in.) tells you she is looking for her sister who is practicing with her team at the gym. Would you direct her to where the basketball team is practicing or to where the gymnastics team is practicing? What reasoning would you use to decide? If you found a pair of size 6 shoes left in the locker room, would you first try to return them by checking with members of the basketball team or the gymnastics team?
You probably answered that you would send the woman looking for her sister to the basketball practice and that you would try to return the shoes to a gymnastics team member. To reach these conclusions, you informally used statistical reasoning that combined your own knowledge of the relationship between heights of siblings and between shoe size and height with the information about the distributions of heights presented in Figure 1.1. You might have reasoned that heights of siblings tend to be similar and that a height as great as 5 ft 11 in., although not impossible, would be unusual for a gymnast. On the other hand, a height as tall as 5 ft 11 in. would be a common occurrence for a basketball player. Similarly, you might have reasoned that tall people tend to have bigger feet and that short people tend to have smaller feet. The shoes found were a small size, so it is more likely that they belong to a gymnast than to a basketball player, because small heights and small feet are usual for gymnasts and unusual for basketball players.
■
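The informal reasoning in Example 1.1 compares where one particular height falls within each team’s distribution. The small Python sketch below makes that comparison numerically. It assumes 2-inch classes of the form lo ≤ h < hi, like the intervals read off Figure 1.1. The height lists are made up, because the individual heights behind the figure are not given. The helper names are hypothetical.

```python
def to_inches(feet, inches):
    """Convert a height given in feet and inches (e.g., 5 ft 11 in.) to inches."""
    return 12 * feet + inches

def class_relative_frequency(heights, lo, hi):
    """Proportion of heights h with lo <= h < hi, i.e., the share of one histogram class."""
    return sum(lo <= h < hi for h in heights) / len(heights)

# Made-up heights (inches) for illustration; the 100 actual heights per team are not given.
basketball = [68, 70, 71, 72, 72, 73, 73, 74, 75, 76]
gymnasts = [60, 61, 62, 62, 63, 64, 64, 65, 66, 68]

sister_height = to_inches(5, 11)   # 71 inches, which falls in the 70-72 in. class
for team, heights in (("basketball", basketball), ("gymnastics", gymnasts)):
    share = class_relative_frequency(heights, 70, 72)
    print(f"{team}: proportion of heights in the 70-72 in. class = {share:.2f}")
```

With the real data, the same function applied to the 72 to 74 in. basketball class would return the 40 out of 100 stated in the example, a relative frequency of 0.40.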
Example 1.2 Monitoring Water Quality
As part of its regular water quality monitoring efforts, an environmental control board selects five water specimens from a particular well each day. The concentration of contaminants in parts per million (ppm) is measured for each of the five specimens, and then the average of the five measurements is calculated. The histogram in Figure 1.2 summarizes the average contamination values for 200 days.
Now suppose that a chemical spill has occurred at a manufacturing plant 1 mile from the well. It is not known whether a spill of this nature would contaminate groundwater in the area of the spill and, if so, whether a spill this distance from the well would affect the quality of well water.
One month after the spill, five water specimens are collected from the well, and the average contamination is 15.5 ppm. Considering the variation before the spill, would you take this as convincing evidence that the well water was affected by the spill? What if the calculated average was 17.4 ppm? 22.0 ppm? How is your reasoning related to the graph in Figure 1.2?
Before the spill, the average contaminant concentration varied from day to day. An average of 15.5 ppm would not have been an unusual value, so seeing an average of 15.5 ppm after the spill isn’t necessarily an indication that contamination has increased. On the other hand, an average as large as 17.4 ppm is less common, and an average as large as 22.0 ppm is not at all typical of the prespill values. In this case, we would probably conclude that the well contamination level has increased.
■
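The judgment in Example 1.2 amounts to asking how often the prespill daily averages were at least as large as the new value. The minimal Python sketch below assumes the 200 prespill averages summarized in Figure 1.2 are available as a list. They are not reproduced in the text, so the sketch takes them as an input. The name `proportion_at_least` is a hypothetical helper.

```python
def proportion_at_least(prespill_averages, new_average):
    """Fraction of prespill daily averages (ppm) that are at least as large as new_average."""
    if not prespill_averages:
        raise ValueError("need at least one prespill average")
    return sum(a >= new_average for a in prespill_averages) / len(prespill_averages)
```

Suppose the 200 daily averages were stored in a list. Calling `proportion_at_least` with 15.5, 17.4, and 22.0 would then turn “not unusual,” “less common,” and “not at all typical” into proportions. A value reached on a very small share of prespill days is the kind of result that suggests a change.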
In these two examples, reaching a conclusion required an understanding of variability. Understanding variability allows us to distinguish between usual and unusual values. The ability to recognize unusual values in the presence of variability is the key to most statistical procedures and is also what enables us to quantify the chance of being incorrect when a conclusion is based on sample data. These concepts will be developed further in subsequent chapters.
Data and conclusions based on data appear regularly in a variety of settings: newspapers, television and radio advertisements, magazines, and professional publications. In business, industry, and government, informed decisions are often data driven. Statistical methods, used appropriately, allow us to draw reliable conclusions based on data.
Once data have been collected or once an appropriate data source has been identified, the next step in the data analysis process usually involves organizing and summarizing the information. Tables, graphs, and numerical summaries allow increased understanding and provide an effective way to present data. Methods for organizing and summarizing data make up the branch of statistics called descriptive statistics.
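As a small illustration of numerical summaries in descriptive statistics, the following Python sketch applies the standard-library `statistics` module to a hypothetical data set (the numbers of textbooks purchased by 12 students are invented for illustration).

```python
import statistics

# Hypothetical data: number of textbooks purchased this semester by 12 students.
textbooks = [3, 5, 4, 2, 6, 4, 3, 5, 4, 7, 1, 4]

summary = {
    "n": len(textbooks),
    "mean": statistics.mean(textbooks),
    "median": statistics.median(textbooks),
    "stdev": statistics.stdev(textbooks),  # sample standard deviation (divides by n - 1)
    "min": min(textbooks),
    "max": max(textbooks),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:.4g}")
```

These few numbers describe the center and spread of the data; they say nothing yet about any larger group of students, which is the job of inferential statistics.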
After the data have been summarized, we often wish to draw conclusions or make decisions based on the data. This usually involves generalizing from a small group of individuals or objects that we have studied to a much larger group.
For example, the admissions director at a large university might be interested in learning why some applicants who were accepted for the fall 2006 term failed to enroll
1.3 ■ Statistics and the Data Analysis Process 7
at the university. The population of interest to the director consists of all accepted applicants who did not enroll in the fall 2006 term. Because this population is large and it may be difficult to contact all the individuals, the director might decide to collect data from only 300 selected students. These 300 students constitute a sample.
The second major branch of statistics, inferential statistics, involves generalizing from a sample to the population from which it was selected. When we generalize in this way, we run the risk of an incorrect conclusion, because a conclusion about the population is based on incomplete information. An important aspect in the development of inferential techniques involves quantifying the chance of an incorrect conclusion.
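The risk described above can be seen in a small simulation. This is a minimal sketch under invented assumptions: a hypothetical population of 4,000 accepted applicants who did not enroll, each marked True or False for a made-up attribute, with roughly 35% True. A simple random sample of 300 (as in the admissions example) gives a sample proportion that is close to, but generally not equal to, the population proportion, and repeated samples disagree with one another.

```python
import random

random.seed(2006)  # fixed seed so the sketch is reproducible

# Hypothetical population: 4,000 accepted applicants who did not enroll.
# True marks an invented attribute (e.g., "chose another school for cost reasons").
population = [random.random() < 0.35 for _ in range(4000)]

# Simple random sample of 300 individuals, drawn without replacement.
sample = random.sample(population, 300)

pop_prop = sum(population) / len(population)  # usually unknown in practice
samp_prop = sum(sample) / len(sample)         # what the director actually observes
print(f"population proportion: {pop_prop:.3f}")
print(f"sample proportion:     {samp_prop:.3f}")

# Different random samples give different answers; inferential methods must
# account for this sample-to-sample variability.
props = [sum(random.sample(population, 300)) / 300 for _ in range(5)]
print([round(p, 3) for p in props])
```

In a real study only the sample is observed, so the gap between `samp_prop` and `pop_prop` is exactly what inferential techniques try to quantify.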
■ The Data Analysis Process
Statistics involves the collection and analysis of data. Both tasks are critical. Raw data without analysis are of little value, and even a sophisticated analysis cannot extract meaningful information from data that were not collected in a sensible way.
■ Planning and Conducting a Statistical Study. Scientific studies are undertaken to answer questions about our world. Is a new flu vaccine effective in preventing illness? Is the use of bicycle helmets on the rise? Are injuries that result from bicycle accidents less severe for riders who wear helmets than for those who do not? How many credit cards do college students have? Do engineering students pay more for textbooks than do psychology students? Data collection and analysis allow researchers to answer such questions.
The data analysis process can be viewed as a sequence of steps that lead from planning to data collection to informed conclusions based on the resulting data. The process can be organized into the following six steps:
1. Understanding the nature of the problem. Effective data analysis requires an understanding of the research problem. We must know the goal of the research and what questions we hope to answer. It is important to have a clear direction before gathering data to lessen the chance of being unable to answer the questions of interest using the data collected.
2. Deciding what to measure and how to measure it. The next step in the process is deciding what information is needed to answer the questions of interest. In some cases, the choice is obvious (e.g., in a study of the relationship between the weight of a Division I football player and position played, you would need to collect data on player weight and position), but in other cases the choice of information is not as straightforward (e.g., in a study of the relationship between preferred learning style and intelligence, how would you define learning style and measure it, and what measure of intelligence would you use?). It is important to carefully define the variables to be studied and to develop appropriate methods for determining their values.
DEFINITION
Descriptive statistics is the branch of statistics that includes methods for organizing and summarizing data. Inferential statistics is the branch of statistics that involves generalizing from a sample to the population from which it was selected and assessing the reliability of such generalizations.
DEFINITION
The entire collection of individuals or objects about which information is desired is called the population of interest. A sample is a subset of the population, selected for study in some prescribed manner.
3. Data collection. The data collection step is crucial. The researcher must first decide whether an existing data source is adequate or whether new data must be collected. Even if a decision is made to use existing data, it is important to understand how the data were collected and for what purpose, so that any resulting limitations are also fully understood and judged to be acceptable. If new data are to be collected, a careful plan must be developed, because the type of analysis that is appropriate and the subsequent conclusions that can be drawn depend on how the data are collected.
4. Data summarization and preliminary analysis. After the data are collected, the next step usually involves a preliminary analysis that includes summarizing the data graphically and numerically. This initial analysis provides insight into important characteristics of the data and can provide guidance in selecting appropriate methods for further analysis.
5. Formal data analysis. The data analysis step requires the researcher to select and apply the appropriate inferential statistical methods. Much of this textbook is devoted to methods that can be used to carry out this step.
6. Interpretation of results. Several questions should be addressed in this final step, for example: What conclusions can be drawn from the analysis? How do the results of the analysis inform us about the stated research problem or question? How can our results guide future research? The interpretation step often leads to the formulation of new research questions, which, in turn, leads back to the first step. In this way, good data analysis is often an iterative process.
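Step 4 often begins with a frequency distribution, the tabular counterpart of a histogram such as Figure 1.2. The function `frequency_table` below is a hypothetical helper, not something from the text: it groups values into equal-width classes and prints a crude text histogram. The 20 measurements are invented for illustration.

```python
from collections import Counter


def frequency_table(values, width, start):
    """Count values in each class [start + k*width, start + (k+1)*width).

    Returns a list of (lower, upper, count) rows covering every class from the
    lowest to the highest occupied one, including empty classes in between.
    Assumes `values` is non-empty.
    """
    counts = Counter(int((v - start) // width) for v in values)
    rows = []
    for k in range(min(counts), max(counts) + 1):
        lo = start + k * width
        rows.append((lo, lo + width, counts.get(k, 0)))
    return rows


# Hypothetical measurements collected in step 3.
data = [12.1, 12.8, 13.3, 13.5, 13.9, 14.1, 14.2, 14.4, 14.6, 14.9,
        15.0, 15.3, 15.5, 15.7, 15.9, 16.0, 16.2, 16.8, 17.1, 17.9]

for lo, hi, count in frequency_table(data, width=1.0, start=12.0):
    print(f"{lo:4.1f} to <{hi:4.1f} | {'*' * count}")
```

For these invented values the class counts are 2, 3, 5, 5, 3, 2, a roughly symmetric shape that would guide the choice of methods in step 5.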
Example 1.3 illustrates the steps in the data analysis process.
Example 1.3 A Proposed New Treatment for Alzheimer’s Disease
The article “Brain Shunt Tested to Treat Alzheimer’s” (San Francisco Chronicle, October 23, 2002) summarizes the findings of a study that appeared in the journal Neurology. Doctors at Stanford Medical Center were interested in determining whether a new surgical approach to treating Alzheimer’s disease results in improved memory functioning. The surgical procedure involves implanting a thin tube, called a shunt, which is designed to drain toxins from the fluid-filled space that cushions the brain. Eleven patients had shunts implanted and were followed for a year, receiving quarterly tests of memory function. Another sample of Alzheimer’s patients was used as a comparison group. Those in the comparison group received the standard care for Alzheimer’s disease. After analyzing the data from this study, the investigators concluded that the “results suggested the treated patients essentially held their own in the cognitive tests while the patients in the control group steadily declined. However, the study was too small to produce conclusive statistical evidence.” Based on these results, a much larger 18-month study was planned. That study was to include 256 patients at 25 medical centers around the country.
This study illustrates the nature of the data analysis process. A clearly defined research question and an appropriate choice of how to measure the variable of interest (the cognitive tests used to measure memory function) preceded the data collection. Assuming that a reasonable method was used to collect the data (we will see how this can be evaluated in Chapter 2) and that appropriate methods of analysis were employed, the investigators reached the conclusion that the surgical procedure showed promise. However, they recognized the limitations of the study, especially those resulting from the small number of patients in the group that received surgical treatment, which in turn led to the design of a larger, more sophisticated study. As is often the case, the data analysis cycle led to further research, and the process began anew.
■
■ Evaluating a Research Study. The six data analysis steps can also be used as a guide for evaluating published research studies. The following questions should be addressed as part of a study evaluation:
■ What were the researchers trying to learn? What questions motivated their research?
■ Was relevant information collected? Were the right things measured?
■ Were the data collected in a sensible way?
■ Were the data summarized in an appropriate way?
■ Was an appropriate method of analysis used, given the type of data and how the data were collected?
■ Are the conclusions drawn by the researchers supported by the data analysis?
Example 1.4 illustrates how these questions can guide an evaluation of a research study.
Example 1.4 Spray Away the Flu
The newspaper article “Spray Away Flu” (Omaha World-Herald, June 8, 1998) reported on a study of the effectiveness of a new flu vaccine that is administered by nasal spray rather than by injection. The article states that the “researchers gave the spray to 1070 healthy children, 15 months to 6 years old, before the flu season two winters ago. One percent developed confirmed influenza, compared with 18 percent of the 532 children who received a placebo. And only one vaccinated child developed an ear infection after coming down with influenza. Typically 30 percent to 40 percent of children with influenza later develop an ear infection.” The researchers concluded that the nasal flu vaccine was effective in reducing the incidence of flu and also in reducing the number of children with flu who subsequently develop ear infections.
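The percentages quoted in the article can be turned back into approximate counts. A minimal sketch follows; because the article reports rounded percentages, the counts are approximations, and this is descriptive arithmetic only (the text defers formal analysis of this study). The variable names are illustrative.

```python
# Figures reported in the newspaper article (percentages are rounded there).
vaccinated_n, vaccinated_flu_rate = 1070, 0.01
placebo_n, placebo_flu_rate = 532, 0.18

# Approximate counts implied by the rounded percentages.
vaccinated_flu = round(vaccinated_n * vaccinated_flu_rate)  # about 11 children
placebo_flu = round(placebo_n * placebo_flu_rate)           # about 96 children

# Ratio of flu rates, vaccine group relative to placebo group.
risk_ratio = vaccinated_flu_rate / placebo_flu_rate
print(f"approx. flu cases: vaccinated {vaccinated_flu}, placebo {placebo_flu}")
print(f"risk ratio (vaccine vs placebo): {risk_ratio:.3f}")

# If 30% to 40% of flu cases typically lead to an ear infection, how many
# would we expect among the roughly 11 vaccinated children who got the flu?
low, high = 0.30 * vaccinated_flu, 0.40 * vaccinated_flu
print(f"expected ear infections: {low:.1f} to {high:.1f}; observed: 1")
```

The output suggests about 11 flu cases among vaccinated children versus about 96 among placebo children, and roughly 3 to 4 ear infections would have been expected among the vaccinated flu cases at the typical rate, compared with the one observed.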
The researchers here were trying to find out whether the nasal flu vaccine was effective in reducing the number of flu cases and in reducing the number of ear infections in children who did get the flu. They recorded whether a child received the nasal vaccine or a placebo. (A placebo is a treatment that is identical in appearance to the treatment of interest but contains no active ingredients.) Whether or not the child developed the flu and a subsequent ear infection was also noted. These are appropriate determinations to make in order to answer the research question of interest.
We typically cannot tell much about the data collection process from a newspaper article. As we will see in Section 2.3, to fully evaluate this study, we would also want to know how the participating children were selected, how it was determined that a particular child received the vaccine or the placebo, and how the subsequent diagnoses of flu and ear infection were made.
We will also have to delay discussion of the data analysis and the appropriateness of the conclusions because we do not yet have the necessary tools to evaluate these aspects of the study.
■
Many other interesting examples of statistical studies can be found in Statistics: A Guide to the Unknown and in Forty Studies That Changed Psychology: Explorations into the History of Psychological Research (the complete references for these two books can be found in the back of the book).
■ Exercises 1.1–1.9
1.1 Give a brief definition of the terms descriptive statistics and inferential statistics.
1.2 Give a brief definition of the terms population and sample.
1.3 Data from a poll conducted by Travelocity led to the following estimates: Approximately 40% of travelers check work email while on vacation, about 33% take cell phones on vacation in order to stay connected with work, and about 25% bring a laptop computer on vacation (San Luis Obispo Tribune, December 1, 2005). Are the given percentages population values or were they computed from a sample?
1.4 Based on a study of 2121 children between the ages of one and four, researchers at the Medical College of Wisconsin concluded that there was an association between iron deficiency and the length of time that a child is bottle-fed (Milwaukee Journal Sentinel, November 26, 2005). Describe the sample and the population of interest for this study.
1.5 The student senate at a university with 15,000 students is interested in the proportion of students who favor a change in the grading system to allow for plus and minus grades (e.g., B+, B, B− rather than just B). Two hundred students are interviewed to determine their attitude toward this proposed change. What is the population of interest? What group of students constitutes the sample in this problem?
1.6 The supervisors of a rural county are interested in the proportion of property owners who support the construction of a sewer system. Because it is too costly to contact all 7000 property owners, a survey of 500 owners (selected at random) is undertaken. Describe the population and sample for this problem.
1.7 ▼Representatives of the insurance industry wished to investigate the monetary loss resulting from earthquake damage to single-family dwellings in Northridge, California, in January 1994. From the set of all single-family homes in Northridge, 100 homes were selected for inspection. Describe the population and sample for this problem.
1.8 A consumer group conducts crash tests of new model cars. To determine the severity of damage to 2006 Mazda6s resulting from a 10-mph crash into a concrete wall, the research group tests six cars of this type and assesses the amount of damage. Describe the population and sample for this problem.
1.9 A building contractor has a chance to buy an odd lot of 5000 used bricks at an auction. She is interested in determining the proportion of bricks in the lot that are cracked and therefore unusable for her current project, but she does not have enough time to inspect all 5000 bricks. Instead, she checks 100 bricks to determine whether each is cracked. Describe the population and sample for this problem.
Bold exercises answered in back ● Data set available online but not required ▼ Video solution available
Every discipline has its own particular way of using common words, and statistics is no exception. You will recognize some of the terminology from previous math and science courses, but much of the language of statistics will be new to you.
■ Describing Data
The individuals or objects in any particular population typically possess many characteristics that might be studied. Consider a group of students currently enrolled in a statistics course. One characteristic of the students in the population is the brand of calculator owned (Casio, Hewlett-Packard, Sharp, Texas Instruments, and so on). Another characteristic is the number of textbooks purchased that semester, and yet another is the distance from the university to each student’s permanent residence. A variable is any characteristic whose value may change from one individual or object to another. For example, calculator brand is a variable, and so are number of textbooks purchased and distance to the university. Data result from making observations either on a single variable or simultaneously on two or more variables.
A univariate data set consists of observations on a single variable made on individuals in a sample or population. There are two types of univariate data sets: categorical and numerical. In the previous example, calculator brand is a categorical variable, because each student’s response to the query, “What brand of calculator do you own?” is a category. The collection of responses from all these students forms a categorical data set. The other two attributes, number of textbooks purchased and distance to the university, are both numerical in nature. Determining the value of such a numerical variable (by counting or measuring) for each student results in a numerical data set.
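The distinction between categorical and numerical univariate data sets can be sketched in Python. The helper `data_type` is hypothetical and applies the definition literally by checking whether every observation is a number; in practice the classification depends on what the values mean (numeric codes such as ZIP codes are categorical), so treat it as a rough heuristic. All responses below are invented, and the distances are assumed to be in miles.

```python
from collections import Counter
from numbers import Number


def is_number(x):
    """True for int/float values; bool is excluded even though it subclasses int."""
    return isinstance(x, Number) and not isinstance(x, bool)


def data_type(observations):
    """Hypothetical helper: 'numerical' if every observation is a number, else 'categorical'."""
    return "numerical" if all(is_number(x) for x in observations) else "categorical"


# Hypothetical responses from five students in a statistics course.
calculator_brand = ["Texas Instruments", "Casio", "Texas Instruments", "Sharp", "Hewlett-Packard"]
textbooks_bought = [4, 3, 5, 2, 4]
distance_miles = [12.5, 230.0, 48.2, 5.0, 1020.7]

for name, data in [("calculator brand", calculator_brand),
                   ("textbooks purchased", textbooks_bought),
                   ("distance to university", distance_miles)]:
    kind = data_type(data)
    if kind == "categorical":
        detail = Counter(data).most_common()  # frequency of each category
    else:
        detail = f"mean = {sum(data) / len(data):.1f}"
    print(f"{name}: {kind}; {detail}")
```

Categorical data are summarized by counting how often each category occurs, while numerical data support arithmetic summaries such as a mean.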
Example 1.5 Airline Safety Violations
The Federal Aviation Administration (FAA) monitors airlines and can take administrative actions for safety violations. Information about the fines assessed by the FAA appeared in the article “Just How Safe Is That Jet?” (USA Today, March 13, 2000). Violations that could lead to a fine were categorized as Security (S), Maintenance (M), Flight Operations (F), Hazardous Materials (H), or Other (O). Data for the variable type of violation for 20 administrative actions are given in the following list (these data are a subset of the data described in the article, but they are consistent with summary values given in the paper; for a description of the full data set, see Exercise 1.24):
DEFINITION
A data set consisting of observations on a single attribute is a univariate data set. A univariate data set is categorical (or qualitative) if the individual observations are categorical responses. A univariate data set is numerical (or quantitative) if each observation is a number.