
Introduction to statistics and data analysis 3e by peck devore



Introduction to Statistics and Data Analysis


California Polytechnic State University, San Luis Obispo

Australia • Brazil • Canada • Mexico • Singapore • Spain • United Kingdom • United States


Introduction to Statistics and Data Analysis,

Third Edition

Roxy Peck, Chris Olsen, Jay Devore

Acquisitions Editor: Carolyn Crockett

Development Editor: Danielle Derbenti

Assistant Editor: Beth Gershman

Editorial Assistant: Ashley Summers

Technology Project Manager: Colin Blake

Marketing Manager: Joe Rogove

Marketing Assistant: Jennifer Liang

Marketing Communications Manager: Jessica Perry

Project Manager, Editorial Production: Jennifer Risden

Creative Director: Rob Hugel

Art Director: Vernon Boes

Print Buyer: Karen Hunt

Permissions Editor: Isabel Alves

Production Service: Newgen–Austin

Text Designer: Stuart Paterson

Photo Researcher: Gretchen Miller

Copy Editor: Nancy Dickson

Illustrator: Jade Myers; Newgen–India

Cover Designer: Stuart Paterson

Cover Image: Paul Chesley/Getty Images

Cover Printer: Courier Corporation/Kendallville

Compositor: Newgen–India

Printer: Courier Corporation/Kendallville

© 2008, 2005 Duxbury, an imprint of Thomson Brooks/Cole, a part of The Thomson Corporation. Thomson, the Star logo, and Brooks/Cole are trademarks used herein under license.

ALL RIGHTS RESERVED. No part of this work covered by the copyright hereon may be reproduced or used in any form or by any means (graphic, electronic, or mechanical, including photocopying, recording, taping, Web distribution, information storage and retrieval systems, or in any other manner) without the written permission of the publisher.

Printed in the United States of America

1 2 3 4 5 6 7 11 10 09 08 07

ExamView® and ExamView Pro® are registered trademarks of FSCreations, Inc. Windows is a registered trademark of the Microsoft Corporation used herein under license. Macintosh and Power Macintosh are registered trademarks of Apple Computer, Inc. Used herein under license.

Library of Congress Control Number: 2006933904

For more information about our products, contact us at:

Thomson Learning Academic Resource Center 1-800-423-0563

For permission to use material from this text or product, submit a

request online at http://www.thomsonrights.com.

Any additional questions about permissions can be submitted by

e-mail to thomsonrights@thomson.com.


bet I wouldn’t put their names in this book.


ROXY PECK is Associate Dean of the College of Science and Mathematics and Professor of Statistics at California Polytechnic State University, San Luis Obispo. Roxy has been on the faculty at Cal Poly since 1979, serving for six years as Chair of the Statistics Department before becoming Associate Dean. She received an M.S. in Mathematics and a Ph.D. in Applied Statistics from the University of California, Riverside. Roxy is nationally known in the area of statistics education, and in 2003 she received the American Statistical Association’s Founder’s Award, recognizing her contributions to K–12 and undergraduate statistics education. She is a Fellow of the American Statistical Association and an elected member of the International Statistics Institute. Roxy has recently completed five years as the Chief Reader for the Advanced Placement Statistics Exam and currently chairs the American Statistical Association’s Joint Committee with the National Council of Teachers of Mathematics on Curriculum in Statistics and Probability for Grades K–12. In addition to her texts in introductory statistics, Roxy is also co-editor of Statistical Case Studies: A Collaboration Between Academe and Industry and a member of the editorial board for Statistics: A Guide to the Unknown, 4th edition. Outside the classroom and the office, Roxy likes to travel and spends her spare time reading mystery novels. She also collects Navajo rugs and heads to New Mexico whenever she can find the time.

CHRIS OLSEN has taught statistics at George Washington High School in Cedar Rapids, Iowa, for over 25 years. Chris is a past member of the Advanced Placement Statistics Test Development Committee and the author of the Teacher’s Guide for Advanced Placement Statistics. He has been a table leader at the AP Statistics reading for 6 years and since the summer of 1996 has been a consultant to the College Board. Chris leads workshops and institutes for AP Statistics teachers in the United States and internationally. Chris was the Iowa recipient of the Presidential Award for Excellence in Science and Mathematics Teaching in 1986. He was a regional winner of the IBM Computer Teacher of the Year award in 1988 and received the Siemens Award for Advanced Placement in mathematics in 1999. Chris is a frequent contributor to the AP Statistics Electronic Discussion Group and has reviewed materials for The Mathematics Teacher, the AP Central web site, The American Statistician, and the Journal of the American Statistical Association. He currently writes a column for Stats magazine. Chris graduated from Iowa State University with a major in mathematics and, while acquiring graduate degrees at the University of Iowa, concentrated on statistics, computer programming, psychometrics, and test development. Currently, he divides his duties between teaching and evaluation; in addition to teaching, he is the assessment facilitator for the Cedar Rapids, Iowa, Community Schools. In his spare time he enjoys reading and hiking. He and his wife have a daughter, Anna, who is a graduate student in Civil Engineering at Cal Tech.

JAY DEVORE earned his undergraduate degree in Engineering Science from the University of California at Berkeley, spent a year at the University of Sheffield in England, and finished his Ph.D. in statistics at Stanford University. He previously taught at the University of Florida and at Oberlin College and has had visiting appointments at Stanford, Harvard, the University of Washington, and New York University. From 1998 to 2006, Jay served as Chair of the Statistics Department at California Polytechnic State University, San Luis Obispo. The Statistics Department at Cal Poly has an international reputation for activities in statistics education. In addition to this book, Jay has written several widely used engineering statistics texts and is currently working on a book in applied mathematical statistics. He is the recipient of a distinguished teaching award from Cal Poly and is a Fellow of the American Statistical Association. In his spare time, he enjoys reading, cooking and eating good food, tennis, and travel to faraway places. He is especially proud of his wife, Carol, a retired elementary school teacher, his daughter Allison, who works for the Center for Women and Excellence in Boston, and his daughter Teri, who is finishing a graduate program in education at NYU.


1 The Role of Statistics and the Data Analysis Process 1

1.1 Three Reasons to Study Statistics 1

1.2 The Nature and Role of Variability 4

1.3 Statistics and the Data Analysis Process 7

1.4 Types of Data and Some Simple Graphical Displays 12

2 Collecting Data Sensibly 27

2.1 Statistical Studies: Observation and Experimentation 27

2.2 Sampling 32

2.3 Simple Comparative Experiments 42

2.4 More on Experimental Design 51

2.5 More on Observational Studies: Designing Surveys (Optional) 56

2.6 Interpreting and Communicating the Results of Statistical Analyses 61

Graphing Calculator Explorations 69

3 Graphical Methods for Describing Data 75

3.1 Displaying Categorical Data: Comparative Bar Charts and Pie Charts 76

3.2 Displaying Numerical Data: Stem-and-Leaf Displays 87


viii ■ Contents

3.3 Displaying Numerical Data: Frequency Distributions and Histograms 97

3.4 Displaying Bivariate Numerical Data 117

3.5 Interpreting and Communicating the Results of Statistical Analyses 127

Graphing Calculator Explorations 141

4 Numerical Methods for Describing Data 147

4.1 Describing the Center of a Data Set 148

4.2 Describing Variability in a Data Set 159

4.3 Summarizing a Data Set: Boxplots 169

4.4 Interpreting Center and Variability: Chebyshev’s Rule, the Empirical Rule, and z Scores 176

4.5 Interpreting and Communicating the Results of Statistical Analyses 186

Graphing Calculator Explorations 195

5 Summarizing Bivariate Data 199

5.1 Correlation 200

5.2 Linear Regression: Fitting a Line to Bivariate Data 210

5.3 Assessing the Fit of a Line 221

5.4 Nonlinear Relationships and Transformations 238

5.5 Logistic Regression (Optional) 255

5.6 Interpreting and Communicating the Results of Statistical Analyses 264

Graphing Calculator Explorations 272

6 Probability 279

6.1 Chance Experiments and Events 279

6.2 Definition of Probability 288


6.3 Basic Properties of Probability 295

6.4 Conditional Probability 302

6.6 Some General Probability Rules 323

6.7 Estimating Probabilities Empirically Using Simulation 335

Graphing Calculator Explorations 351

7 Random Variables and Probability Distributions 357

7.2 Probability Distributions for Discrete Random Variables 361

7.3 Probability Distributions for Continuous Random Variables 367

7.4 Mean and Standard Deviation of a Random Variable 372

7.5 Binomial and Geometric Distributions 386

7.6 Normal Distributions 397

7.7 Checking for Normality and Normalizing Transformations 414

7.8 Using the Normal Distribution to Approximate a Discrete Distribution 425

Graphing Calculator Explorations 434

8 Sampling Variability and Sampling Distributions 445

8.1 Statistics and Sampling Variability 446

8.2 The Sampling Distribution of a Sample Mean 450

8.3 The Sampling Distribution of a Sample Proportion 461

an Advantage in College Admissions? 468

Graphing Calculator Explorations 471

9 Estimation Using a Single Sample 475

9.1 Point Estimation 476

9.2 Large-Sample Confidence Interval for a Population Proportion 482

9.3 Confidence Interval for a Population Mean 495

9.4 Interpreting and Communicating the Results of Statistical Analyses 508

Population Proportion 515

Graphing Calculator Explorations 521

10 Hypothesis Testing Using a Single Sample 525

10.1 Hypotheses and Test Procedures 526

10.2 Errors in Hypotheses Testing 531

10.3 Large-Sample Hypothesis Tests for a Population Proportion 537

10.4 Hypotheses Tests for a Population Mean 550

10.5 Power and Probability of Type II Error 562

10.6 Interpreting and Communicating the Results of Statistical Analyses 571

Graphing Calculator Explorations 580

11 Comparing Two Populations or Treatments 583

11.1 Inferences Concerning the Difference Between Two Population or Treatment Means Using Independent Samples 583

11.2 Inferences Concerning the Difference Between Two Population or Treatment Means Using Paired Samples 606

11.3 Large Sample Inferences Concerning a Difference Between Two Population or Treatment Proportions 619

11.4 Interpreting and Communicating the Results of Statistical Analyses 629

Graphing Calculator Explorations 641

12 The Analysis of Categorical Data and Goodness-of-Fit Tests 647

12.1 Chi-Square Tests for Univariate Data 647

12.2 Tests for Homogeneity and Independence in a Two-way Table 660

12.3 Interpreting and Communicating the Results of Statistical Analyses 677

Graphing Calculator Explorations 685

13 Simple Linear Regression and Correlation: Inferential Methods 689

13.1 Simple Linear Regression Model 690

13.2 Inferences About the Slope of the Population Regression Line 702

13.4 Inferences Based on the Estimated Regression Line

Graphing Calculator Exploration 746

14 Multiple Regression Analysis 749

14.1 Multiple Regression Models 750

14.2 Fitting a Model and Assessing Its Utility 763

14.3 Inferences Based on an Estimated Model 14-1

14.4 Other Issues in Multiple Regression 14-13

14.5 Interpreting and Communicating the Results of Statistical Analyses 14-26

Predictors and Sample Size 780

Sections and/or chapter numbers in color can be found at www.thomsonedu.com/statistics/peck


Graphing Calculator Exploration 811

16 Nonparametric (Distribution-Free) Statistical Methods 16-1

16.1 Distribution-Free Procedures for Inferences About a Difference Between Two Population or Treatment Means Using Independent Samples (Optional) 16-1

16.2 Distribution-Free Procedures for Inferences About a Difference Between Two Population or Treatment Means Using Paired Samples 16-10


In a nutshell, statistics is about understanding the role that variability plays in drawing conclusions based on data. Introduction to Statistics and Data Analysis, Third Edition develops this crucial understanding of variability through its focus on the data analysis process.

An Organization That Reflects the Data Analysis Process

Students are introduced early to the idea that data analysis is a process that begins with careful planning, followed by data collection, data description using graphical and numerical summaries, data analysis, and finally interpretation of results. This process is described in detail in Chapter 1, and the ordering of topics in the first ten chapters of the book mirrors this process: data collection, then data description, then statistical inference.

The logical order in the data analysis process can be pictured as shown in the following figure.

Unlike many introductory texts, Introduction to Statistics and Data Analysis, Third Edition is organized in a manner consistent with the natural order of the data analysis process:

Step 1: Acknowledging Variability (Collecting Data Sensibly)

Step 2: Describing Variability in the Data (Descriptive Statistics)

Step 3: Drawing Conclusions in a Way That Recognizes Variability in the Data


xiv ■ Preface

The Importance of Context and Real Data

Statistics is not about numbers; it is about data: numbers in context. It is the context that makes a problem meaningful and something worth considering. For example, exercises that ask students to compute the mean of 10 numbers or to construct a dotplot or boxplot of 20 numbers without context are arithmetic and graphing exercises. They become statistics problems only when a context gives them meaning and allows for interpretation. While this makes for a text that may appear “wordy” when compared to traditional mathematics texts, it is a critical and necessary component of a modern statistics text.

Examples and exercises with overly simple settings do not allow students to practice interpreting results in authentic situations or give students the experience necessary to be able to use statistical methods in real settings. We believe that the exercises and examples are a particular strength of this text, and we invite you to compare the examples and exercises with those in other introductory statistics texts.

Many students are skeptical of the relevance and importance of statistics. Contrived problem situations and artificial data often reinforce this skepticism. A strategy that we have employed successfully to motivate students is to present examples and exercises that involve data extracted from journal articles, newspapers, and other published sources. Most examples and exercises in the book are of this nature; they cover a very wide range of disciplines and subject areas. These include, but are not limited to, health and fitness, consumer research, psychology and aging, environmental research, law and criminal justice, and entertainment.

A Focus on Interpretation and Communication

Most chapters include a section titled “Interpreting and Communicating the Results of Statistical Analyses.” These sections include advice on how to best communicate the results of a statistical analysis and also consider how to interpret statistical summaries found in journals and other published sources. A subsection titled “A Word to the Wise” reminds readers of things that must be considered in order to ensure that statistical methods are employed in reasonable and appropriate ways.

[Figure: organization of the text by step of the data analysis process]

Step 1: Acknowledging Variability (Collecting Data Sensibly), Chapters 1–2

Step 2: Describing Variability in the Data (Descriptive Statistics), Chapters 3–5

Probability Supports the Connection, Chapters 6–7

Step 3: Drawing Conclusions in a Way That Recognizes Variability in the Data, Chapters 8–15

Consistent with Recommendations for the Introductory Statistics Course Endorsed by the American Statistical Association

In 2005, the American Statistical Association endorsed the report “College Guidelines in Assessment and Instruction for Statistics Education (GAISE Guidelines),” which included the following six recommendations for the introductory statistics course:

1. Emphasize statistical literacy and develop statistical thinking.

2. Use real data.

3. Stress conceptual understanding rather than mere knowledge of procedures.

4. Foster active learning in the classroom.

5. Use technology for developing conceptual understanding and analyzing data.

6. Use assessments to improve and evaluate student learning.

Introduction to Statistics and Data Analysis, Third Edition is consistent with these recommendations and supports the GAISE guidelines in the following ways:

1. Emphasize statistical literacy and develop statistical thinking.

Statistical literacy is promoted throughout the text in the many examples and exercises that are drawn from the popular press. In addition, a focus on the role of variability, consistent use of context, and an emphasis on interpreting and communicating results in context work together to help students develop skills in statistical thinking.

2. Use real data.

The examples and exercises from Introduction to Statistics and Data Analysis, Third Edition are context driven and reference sources that include the popular press as well as journal articles.

3. Stress conceptual understanding rather than mere knowledge of procedures.

Nearly all exercises in Introduction to Statistics and Data Analysis, Third Edition are multipart and ask students to go beyond just computation. They focus on interpretation and communication, not just in the chapter sections specifically devoted to this topic, but throughout the text. The examples and explanations are designed to promote conceptual understanding. Hands-on activities in each chapter are also constructed to strengthen conceptual understanding. Which brings us to

4. Foster active learning in the classroom.

While this recommendation speaks more to pedagogy and classroom practice, Introduction to Statistics and Data Analysis, Third Edition provides 33 hands-on activities in the text and additional activities in the accompanying instructor resources that can be used in class or assigned to be completed outside of class. In addition, accompanying online materials allow students to assess their understanding and develop a personalized learning plan based on this assessment for each chapter.

5. Use technology for developing conceptual understanding and analyzing data.

The computer has brought incredible statistical power to the desktop of every investigator. The wide availability of statistical computer packages such as MINITAB, S-Plus, JMP, and SPSS, and the graphical capabilities of the modern microcomputer have transformed both the teaching and learning of statistics. To highlight the role of the computer in contemporary statistics, we have included sample output throughout the book. In addition, numerous exercises contain data that can easily be analyzed by computer, though our exposition firmly avoids a presupposition that students have access to a particular statistical package. Technology manuals for specific packages, such as MINITAB and SPSS, are available in the online materials that accompany this text.

The appearance of hand-held calculators with significant statistical and graphing capability has also changed statistics instruction in classrooms where access to computers is still limited. The computer revolution of a previous generation is now being writ small (or, possibly we should say, smaller) for the youngest generation of investigators. There is not, as we write, anything approaching universal or even wide agreement about the proper role for the graphing calculator in college statistics classes, where access to a computer is more common. At the same time, for tens of thousands of students in Advanced Placement Statistics in our high schools, the graphing calculator is the only dependable access to statistical technology.

This text allows the instructor to balance the use of computers and calculators in a manner consistent with his or her philosophy and presents the power of the calculator in a series of Graphing Calculator Explorations. These are placed at the end of each chapter, unobtrusive to those instructors whose technology preference is the computer while still accessible to those instructors and students comfortable with graphing calculator technology. As with computer packages, our exposition avoids assuming the use of a particular calculator and presents the calculator capabilities in a generic format; specifically, we do not teach particular keystroke sequences, believing that the best source for such specific information is the calculator manual. For those using a TI graphing calculator, there is a technology manual available in the online materials that accompany this text. As much as possible, the calculator explorations are independent of each other, allowing instructors to pick and choose calculator topics that are more relevant to their particular courses.

6. Use assessments to improve and evaluate student learning.

Assessment materials in the form of a test bank, quizzes, and chapter exams are available in the instructor resources that accompany this text. The items in the test bank reflect the data-in-context philosophy of the text’s exercises and examples.

Advanced Placement Statistics

We have designed this book with a particular eye toward the syllabus of the Advanced Placement Statistics course and the needs of high school teachers and students. Concerns expressed and questions asked in teacher workshops and on the AP Statistics Electronic Discussion Group have strongly influenced our exposition of certain topics, especially in the area of experimental design and probability. We have taken great care to provide precise definitions and clear examples of concepts that Advanced Placement Statistics instructors have acknowledged as difficult for their students. We have also expanded the variety of examples and exercises, recognizing the diverse potential futures envisioned by very capable students who have not yet focused on a college major.

Topic Coverage

Our book can be used in courses as short as one quarter or as long as one year in duration. Particularly in shorter courses, an instructor will need to be selective in deciding which topics to include and which to set aside. The book divides naturally into four major sections: collecting data and descriptive methods (Chapters 1–5), probability material (Chapters 6–8), the basic one- and two-sample inferential techniques (Chapters 9–12), and more advanced inferential methodology (Chapters 13–16). We include an early chapter (Chapter 5) on descriptive methods for bivariate numerical data. This early exposure raises questions and issues that should stimulate student interest in the subject; it is also advantageous for those teaching courses in which time constraints preclude covering advanced inferential material. However, this chapter can easily be postponed until the basics of inference have been covered, and then combined with Chapter 13 for a unified treatment of regression and correlation.

With the possible exception of Chapter 5, Chapters 1–10 should be covered in order. We anticipate that most instructors will then continue with two-sample inference (Chapter 11) and methods for categorical data analysis (Chapter 12), although regression could be covered before either of these topics. Optional portions of Chapter 14 (multiple regression) and Chapter 15 (analysis of variance) and Chapter 16 (nonparametric methods) are included in the online materials that accompany this text.

A Note on Probability

The content of the probability chapters is consistent with the Advanced Placement Statistics course description. It includes both a traditional treatment of probability and probability distributions at an introductory level, as well as a section on the use of simulation as a tool for estimating probabilities. For those who prefer a briefer and more informal treatment of probability, the book Statistics: The Exploration and Analysis of Data, by Roxy Peck and Jay Devore, may be a more appropriate choice. Except for the treatment of probability and the omission of the Graphing Calculator Explorations, it parallels the material in this text. Please contact your sales rep for more information about this alternative and other alternative customized options available to you.

New to This Edition

There are a number of changes in the Third Edition, including the following:

from current journals and newspapers are included. In addition, more of the exercises specifically ask students to write (for example, by requiring students to explain their reasoning, interpret results, and comment on important features of an analysis).

online from the text website are designated by an icon in the text, as are examples that are further illustrated in the technology manuals (MINITAB, SPSS, etc.) that are available in the online materials that accompany this text.

Montgomery College, which can be viewed online or downloaded for viewing later. These exercises are designated by an icon in the text.

activities. These activities can be used as a chapter capstone or can be integrated at appropriate places as the chapter material is covered in class.

in each chapter and develop a personalized learning plan to assist them in addressing any areas of weakness.

Although the order of topics in the text generally mirrors the data collection process with methods of data collection covered first, two graphical displays (dotplots and bar charts) are covered in Chapter 1 so that these simple graphical analysis tools can be used in the conceptual development of experimental design and so that students have some tools for summarizing the data they collect through sampling and experimentation in the exercises, examples, and activities of Chapter 2.

those who would like more complete coverage of data analysis techniques for categorical data.

such as inference and variable selection methods in multiple regression (Sections 14.3 and 14.4) and analysis of variance for randomized block and two-factor designs (Sections 15.3 and 15.4), have been moved to the online materials that accompany this text.

between two population or treatment means using independent samples (formerly Section 11.4) has been moved to Chapter 16. This chapter, titled “Nonparametric (Distribution-Free) Statistical Methods,” also includes new material on inferences about the difference between two population or treatment means using paired samples and distribution-free analysis of variance, and is available in the online materials that accompany this text.

supplements such as a complete solutions manual and a test bank, the following are also available to instructors:

can be incorporated into classroom presentations and cross-references to resources such as Fathom, Workshop Statistics, and Against All Odds. Of particular interest to those teaching Advanced Placement Statistics, the binder also includes additional data analysis questions of the type encountered on the free response portion of the Advanced Placement exam, as well as a collection of model responses.

■ For those who use student response systems in class, a set of “clicker”

assessing student understanding is available

Student Resources

If your text includes a printed access card, you will have instant access to the following resources referenced throughout your text:

■ ThomsonNOW™ (see below for a full description of this powerful study tool)

■ Complete step-by-step instructions for MINITAB, Excel, TI-83 Graphing Calculator, JMP, and SPSS indicated by the icon throughout the text

■ Data sets formatted for MINITAB, Excel, SPSS, SAS, JMP, TI-83, Fathom, and ASCII indicated by the ● icon throughout the text

■ Applets used in the Activities found in the text

Student Solutions Manual (ISBN 0-495-11876-1) by Mary Mortlock of California Polytechnic State University, San Luis Obispo

Check your work, and your understanding, with this manual, which provides worked-out solutions to the odd-numbered problems in the text.


Activities Workbook (0-495-11883-4) by Roxy Peck.

Use this convenient workbook to take notes, record data, and cement your learning by completing textbook and bonus activities for each chapter.

ThomsonNOW™ Homework (0-495-39230-8)

Save time, learn more, and succeed in the course with this online suite of resources (including an integrated eBook and Personalized Study plans) that give you the choices and tools you need to study smarter and get the grade. Note: If your text did not include a printed access card for ThomsonNOW, it is available for purchase online at http://www.thomsonedu.com.

Instructor Resources

Annotated Instructor’s Edition (0-495-11888-5)

The Annotated Instructor’s Edition contains answers for all exercises, as well as an annotated table of contents with comments written by Roxy Peck.

Instructor’s Solutions Manual (0-495-11879-6) by Mary Mortlock of California Polytechnic State University, San Luis Obispo

This manual contains worked-out solutions to all of the problems in the text.

Instructor’s Resource Binder (0-495-11892-3) prepared by Chris Olsen.

Includes transparencies and Microsoft® PowerPoint® slides to make lecture and class preparation quick and easy. New to this edition, we have added some Activities Worksheets authored by Carol Marchetti of Rochester Institute of Technology.

Test Bank (0-495-11880-X) by Josh Tabor of Wilson High School, Peter Flannagan-Hyde of Phoenix Country Day School, and Chris Olsen

Includes test questions for each section of the book.

Activities Workbook (0-495-11883-4) by Roxy Peck.

Students can take notes, record data, and complete activities in this ready-to-use workbook, which includes activities from the textbook plus additional bonus activities for each chapter.

Enhanced WebAssign (ISBN 0-495-10963-0)

Enhanced WebAssign is the most widely used homework system in higher education. Available for this title, Enhanced WebAssign allows you to assign, collect, grade, and record homework assignments via the web. This proven homework system has been enhanced to include links to the textbook sections, video examples, and problem-specific tutorials. Enhanced WebAssign is more than a homework system; it is a complete learning system for students.

ThomsonNOW™ Homework (0-495-39230-8)

ThomsonNOW’s Personalized Study plans allow students to study smarter by diagnosing their weak areas, and helping them focus on what they need to learn. Based on responses to chapter specific pre-tests, the plans suggest a course of study for students, including many multimedia and interactive exercises to help students better learn the material. After completing the study plan, they can take a post-test to measure their progress and understanding.

Create, deliver, and customize tests and study guides (both print and online) in minutes with this easy-to-use assessment and tutorial system, which contains all questions from the Test Bank in electronic format.

Finger Lakes Community College
Holly Ashton, Pikes Peak Community College
Barb Barnet, University of Wisconsin at Platteville
Eddie Bevilacqua, State University of New York College of Environmental Science & Forestry
Piotr Bialas, Borough of Manhattan Community College
Kelly Black, Union College
Gabriel Chandler, Connecticut College
Andy Chang, Youngstown State University
Jerry Chen, Suffolk Community College
Richard Chilcoat, Wartburg College
Marvin Creech, Chapman University
Ron Degges, North Dakota State University
Hemangini Deshmukh, Mercyhurst College
Ann Evans, University of Massachusetts at Boston
Central Carolina Community College
Guangxiong Fang, Daniel Webster College
Sharon B. Finger, Nicholls State University
Steven Garren, James Madison University
Tyler Haynes, Saginaw Valley State University
Sonja Hensler, St. Petersburg College
Trish Hutchinson, Angelo State University
Bessie Kirkwood, Sweet Briar College
Jeff Kollath, Oregon State University

Christopher Lacke, Rowan University
Michael Leitner, Louisiana State University
Zia Mahmood, College of DuPage
Art Mark, Georgia Military College
David Mathiason, Rochester Institute of Technology
Bob Mattson, Eureka College
C. Mark Miller, York College
Megan Mocko, University of Florida
Kane Nashimoto, James Madison University
Helen Noble, San Diego State University
Broderick Oluyede, Georgia Southern University
Elaine Paris, Mercy College
Shelly Ray Parsons, Aims Community College
Judy Pennington-Price, Midway College
Hazard Community College
Jackson County High School
Michael I. Ratliff, Northern Arizona University
David R. Rauth, Duquesne University
Kevin J. Reeves, East Texas Baptist University
Robb Sinn, North Georgia College & State University
Greg Sliwa, Broome Community College
Angela Stabley, Portland Community College
Jeffery D. Sykes, Ouachita Baptist University
Yolande Tra, Rochester Institute of Technology
Nathan Wetzel, University of Wisconsin Stevens Point
Dr. Mark Wilson, West Virginia University Institute of Technology
Yong Yu, Ohio State University
Toshiyuki Yuasa, University of Houston

Jim Bohan, Manheim Township High School
Pat Buchanan, Pennsylvania State University
Mary Christman, American University
Iowa State University
Mark Glickman, Boston University
John Imbrie, University of Virginia
Pam Martin, Northeast Louisiana University
Paul Myers, Woodward Academy
Deanna Payton, Oklahoma State University
Michael Phelan, Chapman University
Alan Polansky, Northern Illinois University
Lawrence D. Ries, University of Missouri Columbia
Joe Ward, Health Careers High School

Additionally, we would like to express our thanks and gratitude to all who helped to make this book possible:

■ Carolyn Crockett, our editor and friend, for her unflagging support and thoughtful advice for more than a decade.

■ Danielle Derbenti, Beth Gershman, and Colin Blake at Thomson Brooks/Cole, for the development of all of the ancillary materials details and for keeping us on track.

■ Jennifer Risden, our project manager at Thomson Brooks/Cole, and Anne Seitz at Hearthside Publishing Services, for artfully managing the myriad of details associated with the production process.

■ Nancy Dickson for her careful copyediting.

■ Brian Kotz for all his hard work producing the video solutions.

■ Mary Mortlock for her diligence and care in producing the student and instructor solutions manuals for this book.

■ Josh Tabor and Peter Flannagan-Hyde for their contributions to the test bank that accompanies the book.

■ Beth Chance and Francisco Garcia for producing the applet used in the confidence interval activities.

■ Gary McClelland for producing the applets from Seeing Statistics used in the regression activities.

■ Bittner Development Group for checking the accuracy of the manuscript.

■ Rachel Dagdagan, a student at Cal Poly, for her help in the preparation of the manuscript.

And, as always, we thank our families, friends, and colleagues for their continued support.

Roxy Peck Chris Olsen Jay Devore


Context Driven Applications

Real data examples and exercises throughout the text are drawn from the popular press, as well as journal articles.

Focus on Interpreting and Communicating

Chapter sections on interpreting and communicating results are designed to emphasize the importance of being able to interpret statistical output and communicate its meaning to non-statisticians. A subsection entitled “A Word to the Wise” reminds students of things that must be considered in order to ensure that statistical methods are used in reasonable and appropriate ways.

of interest (in this case, students at the university). Numerical measures of center and spread and boxplots help to enlighten us, and they also allow us to communicate to others what we have learned from the data.

■ A Word to the Wise: Cautions and Limitations

When computing or interpreting numerical descriptive measures, you need to keep in mind the following:

1. Measures of center don’t tell all. Although measures of center, such as the mean and the median, do give us a sense of what might be considered a typical value for a variable, this is only one characteristic of a data set. Without additional information about variability and distribution shape, we don’t really know much about the behavior of the variable.

2. Data distributions with different shapes can have the same mean and standard deviation. For example, consider the following two histograms:

2.1 ▼ The article “Television’s Value to Kids: It’s All in How They Use It” (Seattle Times, July 6, 2005) described a study in which researchers analyzed standardized test results and television viewing habits of 1700 children. They found that children who averaged more than two hours of television viewing per day when they were younger than 3 tended to score lower on measures of reading ability and short-term memory.

a. Is the study described an observational study or an experiment?

b. Is it reasonable to conclude that watching two or more hours of television is the cause of lower reading scores? Explain.

Example 3.22 Education Level and Income: Stay in School!

The time-series plot shown in Figure 3.34 appears on the U.S. Census Bureau web site. It shows the average earnings of workers by educational level as a proportion of the average earnings of a high school graduate over time. For example, we can see from this plot that in 1993 the average earnings for people with bachelor’s degrees was about 1.5 times the average for high school graduates. In that same year, the average earnings for those who were not high school graduates was only about 75%


Peck, Olsen, Devore’s

Introduction to Statistics and Data Analysis, Third Edition

Hands-on Activities in Every Chapter

Thirty-three hands-on activities in the text, and additional activities in the accompanying instructor resources, can be used to encourage active learning inside or outside the classroom.

Graphing Calculator Explorations

Found at the end of most chapters, these explorations allow students to actively experience technology and promote statistical thinking.

Exploration 3.3 Scaling the Histogram

When we constructed a histogram in the previous Exploration, there were some numbers that we temporarily ignored in the view screen. We would like to return to those numbers now because they can seriously affect the look of a histogram. When we left the histogram, the numbers in our view window were set as shown in Figure 3.43. These settings place the view window over the calculator’s Cartesian system for effective viewing of the histogram from the data of Example 3.15.

We would now like to experiment a bit with the “Xscale.” In all statistical graphs produced by the calculator, the Xscale and Yscale choices will control the placement of the little “tick” marks on the x and y axes. In Exploration 3.2, the Xscale and Yscale were set at 5 and 1, respectively. The little tick marks on the x-axis were at multiples of 5. (Because of the data, the x-axis tick marks were at multiples of 5 and the y-axis didn’t appear.) Change the Xscale value to 2 and redraw the histogram. You should see a graph similar to Figure 3.44. The y-axis tick marks now appear at multiples of 2,

Note that changing the Xscale has altered not only the tick marks but also the class intervals for the histogram. The choice of class intervals can significantly change the look and feel of the histogram. The choice of Xscale can affect judgments about the shape of the histogram. Because of this possibility, it is wise to look at a histogram with varying choices of the Xscale value. If the shape appears very similar for different choices of Xscale, you can interpret and describe the shape with more confidence.

However, if different Xscale choices alter the look of the histogram, you should probably be more tentative.

Figure 3.43

Figure 3.44
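For readers working without a graphing calculator, the effect of the class width can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data values are hypothetical (they are not the data of Example 3.15), and the helper names `class_counts` and `text_histogram` are our own, not calculator commands.

```python
from collections import Counter


def class_counts(values, width, start=0.0):
    """Count the values in each class interval [lo, lo + width).

    Classes that contain no values are simply omitted from the result.
    """
    counts = Counter()
    for v in values:
        index = int((v - start) // width)  # which class the value falls in
        lo = start + index * width         # lower endpoint of that class
        counts[lo] += 1
    return dict(sorted(counts.items()))


def text_histogram(values, width, start=0.0):
    """Return a rough text histogram, one row of asterisks per class."""
    rows = []
    for lo, n in class_counts(values, width, start).items():
        rows.append(f"[{lo:g}, {lo + width:g}): " + "*" * n)
    return "\n".join(rows)


# Hypothetical data, chosen only for illustration.
sample = [12, 13, 13, 14, 16, 17, 17, 18, 18, 19, 21, 22, 24, 27]

# Same data, two class widths (playing the role of Xscale = 5 and Xscale = 2).
for width in (5, 2):
    print(f"class width = {width}")
    print(text_histogram(sample, width))
    print()
```

Comparing the two printed displays shows the same observations grouped into different class intervals, which is the reason for looking at more than one choice before describing the shape.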

Activity 2.4 Video Games and Pain Management

Background: Video games have been used for pain management by doctors and therapists who believe that the attention required to play a video game can distract the player and thereby decrease the sensation of pain. The paper “Video Games and Health” (British Medical Journal [2005]:122–123) states:

“However, there has been no long term follow-up and no robust randomized controlled trials of such interventions. Whether patients eventually tire of such games is also unclear. Furthermore, it is not known whether any distracting effect depends simply on concentrating on an interactive task or whether the content of games is also an important factor as there have been no controlled trials comparing video games with other distracters. Further research should examine factors within games such as novelty, users’ preferences, and relative levels of challenge and should compare video games with other potentially distracting activities.”

1. Working with a partner, select one of the areas of potential research suggested in the passage from the paper and formulate a specific question that could be addressed by performing an experiment.

2. Propose an experiment that would provide data to address the question from Step 1. Be specific about how subjects might be selected, what the experimental conditions (treatments) would be, and what response would be measured.

3. At the end of Section 2.3 there are 10 questions that can be used to evaluate an experimental design. Answer these 10 questions for the design proposed in Step 2.

4. After evaluating your proposed design, are there any changes you would like to make to your design? Explain.


Develop Conceptual Understanding

Applets Allow Students to See the Concepts

Within the Activities, applets are used to illustrate and promote a deeper understanding of the key statistical concepts.

And Analyze Data

Real Data Sets

Real data sets promote statistical analysis, as well as technology use. They are formatted for MINITAB, Excel, SPSS, SAS, JMP, TI-83, and ASCII and are indicated by the icon throughout the text.

Continue generating intervals until you have seen at least 1000 intervals, and then answer the following question:

a. How does the proportion of intervals constructed that contain μ = 100 compare to the stated confidence level of

3.22 ● Medicare’s new medical plans offer a wide range of variations and choices for seniors when picking a drug plan (San Luis Obispo Tribune, November 25, 2005). The monthly cost for a stand-alone drug plan varies from plan to plan and from state to state. The accompanying table gives the premium for the plan with the lowest cost for

Oklahoma 10.07
Pennsylvania 10.14
Rhode Island 7.32
South Carolina 16.57
South Dakota 1.87
Tennessee 14.08

Exercises 3.22–3.34

27.0 would then fall in the class 27.0 to 27.5.

Example 3.14 Enrollments at Public Universities

● States differ widely in the percentage of college students who are enrolled in public institutions. The National Center for Education Statistics provided the accompanying data on this percentage for the 50 U.S. states for fall 2002.

Percentage of College Students Enrolled in Public Institutions

is reasonable to start the first class interval at 40 and let each interval have a width

Example 1.9 Revisiting Motorcycle Helmets

Example 1.8 used data on helmet use from a sample of 1700 motorcyclists to construct a frequency distribution (Table 1.1). Figure 1.5 shows the bar chart corresponding to this frequency distribution.

Complete online step-by-step instructions for MINITAB, Excel, TI-83 Graphing Calculator, JMP, and SPSS are indicated by the icon throughout the text.

Page 101


Evaluate as You Teach Using Clickers

Using clicker content authored by Roxy Peck, evaluate your students’ understanding immediately, in class, after teaching a concept. Whether it’s a quick quiz, a poll to be used for in-class data, or just checking in to see if it is time to move on, our quality, tested content creates truly interactive classrooms with students’ responses shaping the lecture as you teach.

Video Solutions Motivate Student Understanding

More than 90 exercises will have video solutions, presented by Brian Kotz of Montgomery College, which can be viewed online or downloaded for later viewing. These exercises will be designated by an icon in the text.

Get Feedback from Roxy Peck on What You Need to Learn

ThomsonNOW allows students to assess their understanding and develop a personalized learning plan based on this assessment for each chapter. Pre- and post-tests include feedback authored by Roxy Peck.

3.25 USA Today (July 2, 2001) gave the following information regarding cell phone use for men and women:

a. Construct a relative frequency histogram for average number of minutes used per month for men. How would you describe the shape of this histogram?

b. Construct a relative frequency histogram for average number of minutes used per month for women. Is the distribution for average number of minutes used per month similar for men and women? Explain.

c. What proportion of men average less than 400 minutes per month?

d. Estimate the proportion of men that average less than 500 minutes per month.


PRINT Annotated Instructor’s Edition

■ Step-by-step instructions for MINITAB, Excel, TI-83 Graphing Calculator, JMP, and SPSS.

■ Data sets formatted for MINITAB, Excel, SPSS, SAS, JMP, TI-83, and ASCII.


The Role of Statistics and the Data Analysis Process

Improve your understanding and save time! Visit http://www.thomsonedu.com/login where you will find:

■ Step-by-step instructions for MINITAB, Excel, TI-83, SPSS, and JMP

■ Video solutions to selected exercises

■ Data sets available for selected examples and exercises

■ Exam-prep pre-tests that build a Personalized Learning Plan based on your results so that you know exactly what to study

■ Help from a live statistics tutor 24 hours a day

We encounter data and conclusions based on data every day. Statistics is the scientific discipline that provides methods to help us make sense of data. Some people are suspicious of conclusions based on statistical analyses. Extreme skeptics, usually speaking out of ignorance, characterize the discipline as a subcategory of lying, something used for deception rather than for positive ends. However, we believe that statistical methods, used intelligently, offer a set of powerful tools for gaining insight into the world around us. Statistical methods are used in business, medicine, agriculture, social sciences, natural sciences, and applied sciences, such as engineering. The widespread use of statistical analyses in diverse fields has led to increased recognition that statistical literacy, a familiarity with the goals and methods of statistics, should be a basic component of a well-rounded educational program.

The field of statistics teaches us how to make intelligent judgments and informed decisions in the presence of uncertainty and variation. In this chapter, we consider the nature and role of variability in statistical settings, introduce some basic terminology, and look at some simple graphical displays for summarizing data.

Because statistical methods are used to organize, summarize, and draw conclusions from data, a familiarity with statistical techniques and statistical literacy is vital in today’s society. Everyone needs to have a basic understanding of statistics, and many college majors require at least one course in statistics. There are three important reasons why statistical literacy is important: (1) to be informed, (2) to understand issues and be able to make sound decisions based on data, and (3) to be able to evaluate decisions that affect your life. Let’s explore each reason in detail.

■ The First Reason: Being Informed

How do we decide whether claims based on numerical information are reasonable? We are bombarded daily with numerical information in news, in advertisements, and even in conversation. For example, here are a few of the items employing statistical methods that were part of just two weeks’ news.

■ The increasing popularity of online shopping has many consumers using Internet access at work to browse and shop online. In fact, the Monday after Thanksgiving has been nicknamed “Cyber Monday” because of the large increase in online purchases that occurs on that day. Data from a large-scale survey conducted in early November, 2005, by a market research firm was used to compute estimates of the percent of men and women who shop online while at work. The resulting estimates probably won’t make most employers happy: 42% of the men and 32% of the women in the sample were shopping online at work! (Detroit Free Press and San Luis Obispo Tribune, November 26, 2005)

■ A story in the New York Times titled “Students Ace State Tests, but Earn D’s From U.S.” investigated discrepancies between state and federal standardized test results. When researchers compared state test results to the most recent results on the National Assessment of Educational Progress (NAEP), they found that large differences were common. For example, one state reported 89% of fourth graders were proficient in reading based on the state test, while only 18% of fourth graders in that state were considered proficient in reading on the federal test! An explanation of these large discrepancies and potential consequences was discussed. (New York Times, November 26, 2005)

■ Can dogs help patients with heart failure by reducing stress and anxiety? One of the first scientific studies of the effect of therapeutic dogs found that a measure of anxiety decreased by 24% for heart patients visited by a volunteer and dog, but only by 10% for patients visited by just the volunteer. Decreases were also noted in measures of stress and heart and lung pressure, leading researchers to conclude that the use of therapeutic dogs is beneficial in the treatment of heart patients. (San Luis Obispo Tribune, November 16, 2005)

■ Late in 2005, those eligible for Medicare had to decide which, if any, of the many complex new prescription medication plans was right for them. To assist with this decision, a program called PlanFinder that compares available options was made available online. But are seniors online? Based on a survey conducted by the Los Angeles Times, it was estimated that the percentage of senior citizens that go online is only between 23% and 30%, causing concern over whether providing only an online comparison is an effective way to assist seniors with this important decision. (Los Angeles Times, November 27, 2005)

■ Are kids ruder today than in the past? An article titled “Kids Gone Wild” summarized data from a survey conducted by the Associated Press. Nearly 70% of those who participated in the survey said that people were ruder now than 20 years ago, with kids being the biggest offenders. As evidence that this is a serious problem, the author of the article also referenced a 2004 study conducted by Public Agenda, a public opinion research group. That study indicated that more than one third of teachers had either seriously considered leaving teaching or knew a colleague who left because of intolerable student behavior. (New York Times, November 27, 2005)

■ When people take a vacation, do they really leave work behind? Data from a poll conducted by Travelocity led to the following estimates: Approximately 40% of travelers check work email while on vacation, about 33% take cell phones on vacation in order to stay connected with work, and about 25% bring a laptop computer on vacation. The travel industry is paying attention: hotels, resorts, and even cruise ships are now making it easier for “vacationers” to stay connected to work. (San Luis Obispo Tribune, December 1, 2005)

■ How common is domestic violence? Based on interviews with 24,000 women in 10 different countries, a study conducted by the World Health Organization found that the percentage of women who have been abused by a partner varied widely, from 15% of women in Japan to 71% of women in Ethiopia. Even though the domestic violence rate differed dramatically from country to country, in all of the countries studied women who were victims of domestic violence were about twice as likely as other women to be in poor health, even long after the violence had stopped. (San Francisco Chronicle, November 25, 2005)

■ Does it matter how long children are bottle-fed? Based on a study of 2121 children between the ages of 1 and 4, researchers at the Medical College of Wisconsin concluded that there was an association between iron deficiency and the length of time that a child is bottle-fed. They found that children who were bottle-fed between the ages of 2 and 4 were three times more likely to be iron deficient than those who stopped by the time they were 1 year old. (Milwaukee Journal Sentinel and San Luis Obispo Tribune, November 26, 2005)

■ Parental involvement in schools is often regarded as an important factor in student achievement. However, data from a study of low-income public schools in California led researchers to conclude that other factors, such as prioritizing student achievement, encouraging teacher collaboration and professional development, and using assessment data to improve instruction, had a much greater impact on the schools’ Academic Performance Index. (Washington Post and San Francisco Chronicle, November 26, 2005)

To be an informed consumer of reports such as those described above, you must be able to do the following:

1. Extract information from tables, charts, and graphs.

2. Follow numerical arguments.

3. Understand the basics of how data should be gathered, summarized, and analyzed to draw statistical conclusions.

Your statistics course will help prepare you to perform these tasks.

Throughout your personal and professional life, you will need to understand statistical information and make informed decisions using this information. To make these decisions, you must be able to do the following:

1. Decide whether existing information is adequate or whether additional information is required.

2. If necessary, collect more information in a reasonable and thoughtful way.

3. Summarize the available data in a useful and informative manner.

4. Analyze the available data.

5. Draw conclusions, make decisions, and assess the risk of an incorrect decision.

People informally use these steps to make everyday decisions. Should you go out for a sport that involves the risk of injury? Will your college club do better by trying to raise funds with a benefit concert or with a direct appeal for donations? If you choose a particular major, what are your chances of finding a job when you graduate? How should you select a graduate program based on guidebook ratings that include information on percentage of applicants accepted, time to obtain a degree, and so on? The study of statistics formalizes the process of making decisions based on data and provides the tools for accomplishing the steps listed.

While you will need to make informed decisions based on data, it is also the case that other people will use statistical methods to make decisions that affect you as an individual. An understanding of statistical techniques will allow you to question and evaluate decisions that affect your well-being. Some examples are:

■ Many companies now require drug screening as a condition of employment. With these screening tests there is a risk of a false-positive reading (incorrectly indicating drug use) or a false-negative reading (failure to detect drug use). What are the consequences of a false result? Given the consequences, is the risk of a false result acceptable?

■ Medical researchers use statistical methods to make recommendations regarding the choice between surgical and nonsurgical treatment of such diseases as coronary heart disease and cancer. How do they weigh the risks and benefits to reach such a recommendation?

■ University financial aid offices survey students on the cost of going to school and collect data on family income, savings, and expenses. The resulting data are used to set criteria for deciding who receives financial aid. Are the estimates they use accurate?

■ Insurance companies use statistical techniques to set auto insurance rates, although some states restrict the use of these techniques. Data suggest that young drivers have more accidents than older ones. Should laws or regulations limit how much more young drivers pay for insurance? What about the common practice of charging higher rates for people who live in urban areas?

An understanding of elementary statistical methods can help you to evaluate whether important decisions such as the ones just mentioned are being made in a reasonable way.

We hope that this textbook will help you to understand the logic behind statistical reasoning, prepare you to apply statistical methods appropriately, and enable you to recognize when statistical arguments are faulty.

Statistics is a science whose focus is on collecting, analyzing, and drawing conclusions from data. If we lived in a world where all measurements were identical for every individual, all three of these tasks would be simple. Imagine a population consisting of all students at a particular university. Suppose that every student took the same number of units, spent exactly the same amount of money on textbooks this semester, and favored increasing student fees to support expanding library services. For this population, there is no variability in the number of units, amount spent on books, or student opinion on the fee increase. A researcher studying a sample from this population to draw conclusions about these three variables would have a particularly easy task. It would not matter how many students the researcher included in the sample or how the sampled students were selected. In fact, the researcher could collect information on number of units, amount spent on books, and opinion on the fee increase by just stopping the next student who happened to walk by the library. Because there is no variability in the population, this one individual would provide complete and accurate information about the population, and the researcher could draw conclusions based on the sample with no risk of error.

The situation just described is obviously unrealistic. Populations with no variability are exceedingly rare, and they are of little statistical interest because they present no challenge! In fact, variability is almost universal. It is variability that makes life (and the life of a statistician, in particular) interesting. We need to understand variability to be able to collect, analyze, and draw conclusions from data in a sensible way. One of the primary uses of descriptive statistical methods is to increase our understanding of the nature of variability in a population.

Examples 1.1 and 1.2 illustrate how an understanding of variability is necessary to draw conclusions based on data.

Example 1.1 If the Shoe Fits

The graphs in Figure 1.1 are examples of a type of graph called a histogram. (The construction and interpretation of such graphs is discussed in Chapter 3.) Figure 1.1(a) shows the distribution of the heights of female basketball players who played at a particular university between 1990 and 1998. The height of each bar in the graph indicates how many players’ heights were in the corresponding interval. For example, 40 basketball players had heights between 72 in. and 74 in., whereas only 2 players had heights between 66 in. and 68 in. Figure 1.1(b) shows the distribution of heights for members of the women’s gymnastics team over the same period. Both histograms are based on the heights of 100 women.

Figure 1.1 Histograms of heights (in inches) of female athletes: (a) basketball players; (b) gymnasts. [Horizontal axis: Height; vertical axis: Frequency.]

The first histogram shows that the heights of female basketball players varied, with most heights falling between 68 in. and 76 in. In the second histogram we see that the heights of female gymnasts also varied, with most heights in the range of 60 in. to 72 in. It is also clear that there is more variation in the heights of the gymnasts than in the heights of the basketball players, because the gymnast histogram spreads out more about its center than does the basketball histogram.

Now suppose that a tall woman (5 ft 11 in.) tells you she is looking for her sister who is practicing with her team at the gym. Would you direct her to where the basketball team is practicing or to where the gymnastics team is practicing? What reasoning would you use to decide? If you found a pair of size 6 shoes left in the locker room, would you first try to return them by checking with members of the basketball team or the gymnastics team?

You probably answered that you would send the woman looking for her sister to the basketball practice and that you would try to return the shoes to a gymnastics team member. To reach these conclusions, you informally used statistical reasoning that combined your own knowledge of the relationship between heights of siblings and between shoe size and height with the information about the distributions of heights presented in Figure 1.1. You might have reasoned that heights of siblings tend to be similar and that a height as great as 5 ft 11 in., although not impossible, would be unusual for a gymnast. On the other hand, a height as tall as 5 ft 11 in. would be a common occurrence for a basketball player. Similarly, you might have reasoned that tall people tend to have bigger feet and that short people tend to have smaller feet. The shoes found were a small size, so it is more likely that they belong to a gymnast than to a basketball player, because small heights and small feet are usual for gymnasts and unusual for basketball players.
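The informal reasoning in this example can be made concrete with a short Python sketch. It rests on assumptions: only two of the counts below come from the example (40 basketball players between 72 in. and 74 in., and 2 between 66 in. and 68 in.); every other frequency is hypothetical, chosen so that each group totals 100 and roughly matches the description of Figure 1.1. The function name `proportion_in_class` is also our own.

```python
def proportion_in_class(freq, value, width=2):
    """Proportion of a group whose height falls in the same class as value.

    freq maps the lower endpoint of each class [lo, lo + width) to a count.
    """
    total = sum(freq.values())
    for lo, count in freq.items():
        if lo <= value < lo + width:
            return count / total
    return 0.0  # value lies outside every class


# Class lower endpoint (inches) -> number of athletes.
# Only the 66 and 72 entries for basketball are taken from the example;
# all other counts are hypothetical.
basketball = {66: 2, 68: 12, 70: 28, 72: 40, 74: 15, 76: 3}
gymnasts = {58: 4, 60: 12, 62: 20, 64: 26, 66: 20, 68: 11, 70: 5, 72: 2}

sister_height = 71  # 5 ft 11 in.

print("basketball:", proportion_in_class(basketball, sister_height))
print("gymnastics:", proportion_in_class(gymnasts, sister_height))
```

With these assumed counts, a height near 71 in. is far more common among the basketball players than among the gymnasts, which is the comparison the example asks you to make by eye from the two histograms.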

Example 1.2 Monitoring Water Quality

As part of its regular water quality monitoring efforts, an environmental control board selects five water specimens from a particular well each day. The concentration of contaminants in parts per million (ppm) is measured for each of the five specimens, and then the average of the five measurements is calculated. The histogram in Figure 1.2 summarizes the average contamination values for 200 days.

Now suppose that a chemical spill has occurred at a manufacturing plant 1 mile from the well. It is not known whether a spill of this nature would contaminate groundwater in the area of the spill and, if so, whether a spill this distance from the well would affect the quality of well water.

One month after the spill, five water specimens are collected from the well, and the average contamination is 15.5 ppm. Considering the variation before the spill, would you take this as convincing evidence that the well water was affected by the spill? What if the calculated average was 17.4 ppm? 22.0 ppm? How is your reasoning related to the graph in Figure 1.2?


Before the spill, the average contaminant concentration varied from day to day. An average of 15.5 ppm would not have been an unusual value, so seeing an average of 15.5 ppm after the spill isn’t necessarily an indication that contamination has increased. On the other hand, an average as large as 17.4 ppm is less common, and an average as large as 22.0 ppm is not at all typical of the prespill values. In this case, we would probably conclude that the well contamination level has increased.
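The reasoning in this example can be imitated with a short Python sketch. The actual prespill measurements are not listed in the text, so the sketch simulates 200 daily averages; the center of 14 ppm and the spread of 3 ppm for individual specimens are assumptions chosen only for illustration, and `fraction_at_least` is a hypothetical helper name. For each postspill average, the sketch reports the fraction of prespill days with an average at least that large.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the simulated history is reproducible

def daily_average(centre=14.0, spread=3.0, specimens=5):
    """Average of five simulated specimen readings (ppm) for one day.

    The centre and spread are assumed values, not data from the text.
    """
    return statistics.mean(rng.gauss(centre, spread) for _ in range(specimens))

# Simulated prespill history: one average per day for 200 days.
prespill = [daily_average() for _ in range(200)]

def fraction_at_least(history, value):
    """Fraction of historical daily averages that were at least `value`."""
    return sum(x >= value for x in history) / len(history)

for observed in (15.5, 17.4, 22.0):
    print(observed, fraction_at_least(prespill, observed))
```

A value matched or exceeded on a sizable fraction of prespill days would be considered usual; a value that was never reached before the spill would be considered unusual. With real monitoring records, the simulated list would simply be replaced by the recorded daily averages.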

In these two examples, reaching a conclusion required an understanding of variability. Understanding variability allows us to distinguish between usual and unusual values. The ability to recognize unusual values in the presence of variability is the key to most statistical procedures and is also what enables us to quantify the chance of being incorrect when a conclusion is based on sample data. These concepts will be developed further in subsequent chapters.

1.3 Statistics and the Data Analysis Process

Data and conclusions based on data appear regularly in a variety of settings: newspapers, television and radio advertisements, magazines, and professional publications. In business, industry, and government, informed decisions are often data driven. Statistical methods, used appropriately, allow us to draw reliable conclusions based on data.

Once data have been collected or once an appropriate data source has been identified, the next step in the data analysis process usually involves organizing and summarizing the information. Tables, graphs, and numerical summaries allow increased understanding and provide an effective way to present data. Methods for organizing and summarizing data make up the branch of statistics called descriptive statistics.

After the data have been summarized, we often wish to draw conclusions or make decisions based on the data. This usually involves generalizing from a small group of individuals or objects that we have studied to a much larger group.

For example, the admissions director at a large university might be interested in learning why some applicants who were accepted for the fall 2006 term failed to enroll at the university. The population of interest to the director consists of all accepted applicants who did not enroll in the fall 2006 term. Because this population is large and it may be difficult to contact all the individuals, the director might decide to collect data from only 300 selected students. These 300 students constitute a sample.
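The distinction between a population and a sample can be made concrete with a short Python sketch. The population size of 5,000, the ID numbers, and the use of simple random selection are all assumptions made for illustration; the text says only that the population is large and that 300 students were selected.

```python
import random

rng = random.Random(2006)  # fixed seed so the same sample is drawn each run

# Hypothetical population: ID numbers for 5,000 accepted applicants who did
# not enroll. The size is an assumption, not a figure from the text.
population = list(range(1, 5001))

# Select 300 distinct individuals. random.sample draws without replacement,
# so no applicant can be chosen twice.
sample = rng.sample(population, 300)

print(len(population), len(sample))
print(set(sample) <= set(population))  # every sampled ID belongs to the population
```

Random selection is only one possible way of choosing the 300 students; how a sample should be selected is a separate question from what a sample is.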

The second major branch of statistics, inferential statistics, involves generalizing from a sample to the population from which it was selected. When we generalize in this way, we run the risk of an incorrect conclusion, because a conclusion about the population is based on incomplete information. An important aspect in the development of inferential techniques involves quantifying the chance of an incorrect conclusion.
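One way to get a feel for the chance of an incorrect conclusion is simulation. The sketch below is not a method developed in this chapter. It assumes a hypothetical population of 10,000 individuals in which 40% have some characteristic, draws many samples of 300, and records how often the sample proportion differs from the population proportion by more than 0.05. Every one of these numbers is an assumption chosen for illustration.

```python
import random

rng = random.Random(7)  # fixed seed for reproducibility

# Hypothetical population: 1 marks an individual with the characteristic.
population = [1] * 4000 + [0] * 6000
true_proportion = sum(population) / len(population)

def sample_proportion(n):
    """Proportion with the characteristic in one random sample of size n."""
    return sum(rng.sample(population, n)) / n

# Repeat the sampling many times and count the samples whose proportion
# misses the population proportion by more than 0.05.
trials = 2000
misses = sum(
    abs(sample_proportion(300) - true_proportion) > 0.05
    for _ in range(trials)
)

print(true_proportion, misses / trials)
```

The printed fraction estimates how often a sample of this size would give a noticeably misleading picture of this particular population. In practice the population proportion is unknown, which is why inferential methods are needed.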

DEFINITION

Descriptive statistics is the branch of statistics that includes methods for organizing and summarizing data. Inferential statistics is the branch of statistics that involves generalizing from a sample to the population from which it was selected and assessing the reliability of such generalizations.

DEFINITION

The entire collection of individuals or objects about which information is desired is called the population of interest. A sample is a subset of the population, selected for study in some prescribed manner.

■ The Data Analysis Process

Statistics involves the collection and analysis of data. Both tasks are critical. Raw data without analysis are of little value, and even a sophisticated analysis cannot extract meaningful information from data that were not collected in a sensible way.

Planning and Conducting a Statistical Study. Scientific studies are undertaken to answer questions about our world. Is a new flu vaccine effective in preventing illness? Is the use of bicycle helmets on the rise? Are injuries that result from bicycle accidents less severe for riders who wear helmets than for those who do not? How many credit cards do college students have? Do engineering students pay more for textbooks than do psychology students? Data collection and analysis allow researchers to answer such questions.

The data analysis process can be viewed as a sequence of steps that lead from planning to data collection to informed conclusions based on the resulting data. The process can be organized into the following six steps:

1. Understanding the nature of the problem. Effective data analysis requires an understanding of the research problem. We must know the goal of the research and what questions we hope to answer. It is important to have a clear direction before gathering data to lessen the chance of being unable to answer the questions of interest using the data collected.

2. Deciding what to measure and how to measure it. The next step in the process is deciding what information is needed to answer the questions of interest. In some cases, the choice is obvious (e.g., in a study of the relationship between the weight of a Division I football player and position played, you would need to collect data on player weight and position), but in other cases the choice of information is not as straightforward (e.g., in a study of the relationship between preferred learning style and intelligence, how would you define learning style and measure it and what measure of intelligence would you use?). It is important to carefully define the variables to be studied and to develop appropriate methods for determining their values.

3. Data collection. The data collection step is crucial. The researcher must first decide whether an existing data source is adequate or whether new data must be collected. Even if a decision is made to use existing data, it is important to understand how the data were collected and for what purpose, so that any resulting limitations are also fully understood and judged to be acceptable. If new data are to be collected, a careful plan must be developed, because the type of analysis that is appropriate and the subsequent conclusions that can be drawn depend on how the data are collected.

4. Data summarization and preliminary analysis. After the data are collected, the next step usually involves a preliminary analysis that includes summarizing the data graphically and numerically. This initial analysis provides insight into important characteristics of the data and can provide guidance in selecting appropriate methods for further analysis.

5. Formal data analysis. The data analysis step requires the researcher to select and apply the appropriate inferential statistical methods. Much of this textbook is devoted to methods that can be used to carry out this step.

6. Interpretation of results. Several questions should be addressed in this final step, for example: What conclusions can be drawn from the analysis? How do the results of the analysis inform us about the stated research problem or question? How can our results guide future research? The interpretation step often leads to the formulation of new research questions, which, in turn, leads back to the first step. In this way, good data analysis is often an iterative process.

Example 1.3 illustrates the steps in the data analysis process.

Example 1.3 A Proposed New Treatment for Alzheimer’s Disease

The article “Brain Shunt Tested to Treat Alzheimer’s” (San Francisco Chronicle, October 23, 2002) summarizes the findings of a study that appeared in the journal Neurology. Doctors at Stanford Medical Center were interested in determining whether a new surgical approach to treating Alzheimer’s disease results in improved memory functioning. The surgical procedure involves implanting a thin tube, called a shunt, which is designed to drain toxins from the fluid-filled space that cushions the brain. Eleven patients had shunts implanted and were followed for a year, receiving quarterly tests of memory function. Another sample of Alzheimer’s patients was used as a comparison group. Those in the comparison group received the standard care for Alzheimer’s disease. After analyzing the data from this study, the investigators concluded that the “results suggested the treated patients essentially held their own in the cognitive tests while the patients in the control group steadily declined. However, the study was too small to produce conclusive statistical evidence.” Based on these results, a much larger 18-month study was planned. That study was to include 256 patients at 25 medical centers around the country.

This study illustrates the nature of the data analysis process. A clearly defined research question and an appropriate choice of how to measure the variable of interest (the cognitive tests used to measure memory function) preceded the data collection. Assuming that a reasonable method was used to collect the data (we will see how this can be evaluated in Chapter 2) and that appropriate methods of analysis were employed, the investigators reached the conclusion that the surgical procedure showed promise. However, they recognized the limitations of the study, especially those resulting from the small number of patients in the group that received surgical treatment, which in turn led to the design of a larger, more sophisticated study. As is often the case, the data analysis cycle led to further research, and the process began anew.

Evaluating a Research Study. The six data analysis steps can also be used as a guide for evaluating published research studies. The following questions should be addressed as part of a study evaluation:

■ What were the researchers trying to learn? What questions motivated their research?

■ Was relevant information collected? Were the right things measured?

■ Were the data collected in a sensible way?

■ Were the data summarized in an appropriate way?

■ Was an appropriate method of analysis used, given the type of data and how the data were collected?

■ Are the conclusions drawn by the researchers supported by the data analysis?

Example 1.4 illustrates how these questions can guide an evaluation of a research study.

Example 1.4 Spray Away the Flu

The newspaper article “Spray Away Flu” (Omaha World-Herald, June 8, 1998) reported on a study of the effectiveness of a new flu vaccine that is administered by nasal spray rather than by injection. The article states that the “researchers gave the spray to 1070 healthy children, 15 months to 6 years old, before the flu season two winters ago. One percent developed confirmed influenza, compared with 18 percent of the 532 children who received a placebo. And only one vaccinated child developed an ear infection after coming down with influenza. Typically 30 percent to 40 percent of children with influenza later develop an ear infection.” The researchers concluded that the nasal flu vaccine was effective in reducing the incidence of flu and also in reducing the number of children with flu who subsequently develop ear infections.
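The percentages quoted in the article can be turned into approximate counts, which makes the comparison easier to picture. The following Python sketch does only this arithmetic. Because the article reports rounded percentages, the counts are estimates rather than figures from the study, and the variable names are ours.

```python
# Group sizes and flu rates as quoted in the newspaper article.
n_vaccine, n_placebo = 1070, 532
rate_vaccine, rate_placebo = 0.01, 0.18

# Approximate number of confirmed flu cases in each group.
# The published percentages are rounded, so these are estimates.
cases_vaccine = round(n_vaccine * rate_vaccine)
cases_placebo = round(n_placebo * rate_placebo)

# One vaccinated child with flu went on to develop an ear infection.
ear_infection_rate = 1 / cases_vaccine

print(cases_vaccine, cases_placebo)
print(f"{ear_infection_rate:.0%}")
```

Under these assumptions, about 11 of the vaccinated children and about 96 of the children who received the placebo developed flu. The ear infection rate among vaccinated children with flu rests on only about 11 cases, so it should be read with caution when compared with the typical 30 to 40 percent.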

The researchers here were trying to find out whether the nasal flu vaccine was effective in reducing the number of flu cases and in reducing the number of ear infections in children who did get the flu. They recorded whether a child received the nasal vaccine or a placebo. (A placebo is a treatment that is identical in appearance to the treatment of interest but contains no active ingredients.) Whether or not the child developed the flu and a subsequent ear infection was also noted. These are appropriate determinations to make in order to answer the research question of interest.

We typically cannot tell much about the data collection process from a newspaper article. As we will see in Section 2.3, to fully evaluate this study, we would also want to know how the participating children were selected, how it was determined that a particular child received the vaccine or the placebo, and how the subsequent diagnoses of flu and ear infection were made.

We will also have to delay discussion of the data analysis and the appropriateness of the conclusions because we do not yet have the necessary tools to evaluate these aspects of the study.

Many other interesting examples of statistical studies can be found in Statistics: A Guide to the Unknown and in Forty Studies That Changed Psychology: Exploration into the History of Psychological Research (the complete references for these two books can be found in the back of the book).

Exercises 1.1–1.9

1.1 Give a brief definition of the terms descriptive statistics and inferential statistics.

1.2 Give a brief definition of the terms population and sample.

1.3 Data from a poll conducted by Travelocity led to the following estimates: Approximately 40% of travelers check work email while on vacation, about 33% take cell phones on vacation in order to stay connected with work, and about 25% bring a laptop computer on vacation (San Luis Obispo Tribune, December 1, 2005). Are the given percentages population values or were they computed from a sample?

1.4 Based on a study of 2121 children between the ages of one and four, researchers at the Medical College of Wisconsin concluded that there was an association between iron deficiency and the length of time that a child is bottle-fed (Milwaukee Journal Sentinel, November 26, 2005). Describe the sample and the population of interest for this study.

1.5 The student senate at a university with 15,000 students is interested in the proportion of students who favor a change in the grading system to allow for plus and minus grades (e.g., B+, B, B−, rather than just B). Two hundred students are interviewed to determine their attitude toward this proposed change. What is the population of interest? What group of students constitutes the sample in this problem?

1.6 The supervisors of a rural county are interested in the proportion of property owners who support the construction of a sewer system. Because it is too costly to contact all 7000 property owners, a survey of 500 owners (selected at random) is undertaken. Describe the population and sample for this problem.

1.7 ▼ Representatives of the insurance industry wished to investigate the monetary loss resulting from earthquake damage to single-family dwellings in Northridge, California, in January 1994. From the set of all single-family homes in Northridge, 100 homes were selected for inspection. Describe the population and sample for this problem.

1.8 A consumer group conducts crash tests of new model cars. To determine the severity of damage to 2006 Mazda6s resulting from a 10-mph crash into a concrete wall, the research group tests six cars of this type and assesses the amount of damage. Describe the population and sample for this problem.

1.9 A building contractor has a chance to buy an odd lot of 5000 used bricks at an auction. She is interested in determining the proportion of bricks in the lot that are cracked and therefore unusable for her current project, but she does not have enough time to inspect all 5000 bricks. Instead, she checks 100 bricks to determine whether each is cracked. Describe the population and sample for this problem.

Bold exercises answered in back ● Data set available online but not required ▼ Video solution available


Every discipline has its own particular way of using common words, and statistics is no exception. You will recognize some of the terminology from previous math and science courses, but much of the language of statistics will be new to you.

■ Describing Data

The individuals or objects in any particular population typically possess many characteristics that might be studied. Consider a group of students currently enrolled in a statistics course. One characteristic of the students in the population is the brand of calculator owned (Casio, Hewlett-Packard, Sharp, Texas Instruments, and so on). Another characteristic is the number of textbooks purchased that semester, and yet another is the distance from the university to each student’s permanent residence. A variable is any characteristic whose value may change from one individual or object to another. For example, calculator brand is a variable, and so are number of textbooks purchased and distance to the university. Data result from making observations either on a single variable or simultaneously on two or more variables.

A univariate data set consists of observations on a single variable made on individuals in a sample or population. There are two types of univariate data sets: categorical and numerical. In the previous example, calculator brand is a categorical variable, because each student’s response to the query, “What brand of calculator do you own?” is a category. The collection of responses from all these students forms a categorical data set. The other two attributes, number of textbooks purchased and distance to the university, are both numerical in nature. Determining the value of such a numerical variable (by counting or measuring) for each student results in a numerical data set.
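The difference between categorical and numerical data sets can be seen in a small Python sketch. The six students and all of their values are invented for illustration, and the field names (`calculator`, `textbooks`, `distance_miles`) are hypothetical. Each list extracted below is a univariate data set: one variable observed on every individual.

```python
from collections import Counter
from statistics import mean

# Hypothetical observations on three variables for six students.
students = [
    {"calculator": "Casio", "textbooks": 4, "distance_miles": 12.5},
    {"calculator": "Texas Instruments", "textbooks": 3, "distance_miles": 240.0},
    {"calculator": "Sharp", "textbooks": 5, "distance_miles": 3.2},
    {"calculator": "Texas Instruments", "textbooks": 2, "distance_miles": 75.0},
    {"calculator": "Casio", "textbooks": 4, "distance_miles": 18.3},
    {"calculator": "Hewlett-Packard", "textbooks": 6, "distance_miles": 410.0},
]

brands = [s["calculator"] for s in students]         # categorical
textbooks = [s["textbooks"] for s in students]       # numerical (counted)
distances = [s["distance_miles"] for s in students]  # numerical (measured)

# Categories can be tallied, but averaging them would be meaningless.
brand_counts = Counter(brands)
print(brand_counts.most_common())

# Numerical observations support arithmetic summaries such as the mean.
print(mean(textbooks), mean(distances))
```

The kind of summary that makes sense depends on the type of data: counts per category for the calculator brands, and arithmetic summaries for the two numerical variables.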

DEFINITION

A data set consisting of observations on a single attribute is a univariate data set. A univariate data set is categorical (or qualitative) if the individual observations are categorical responses. A univariate data set is numerical (or quantitative) if each observation is a number.

Example 1.5 Airline Safety Violations

The Federal Aviation Administration (FAA) monitors airlines and can take administrative actions for safety violations. Information about the fines assessed by the FAA appeared in the article “Just How Safe Is That Jet?” (USA Today, March 13, 2000). Violations that could lead to a fine were categorized as Security (S), Maintenance (M), Flight Operations (F), Hazardous Materials (H), or Other (O). Data for the variable type of violation for 20 administrative actions are given in the following list (these data are a subset of the data described in the article, but they are consistent with summary values given in the paper; for a description of the full data set, see Exercise 1.24):

Date posted: 08/08/2018, 16:52
