
Ebook Essential statistics - Exploring the world through data (2nd edition - Global edition): Part 1


Part 1 of Essential Statistics: Exploring the World through Data covers the following topics: introduction to data; picturing variation with graphs; numerical summaries of center and variation; regression analysis (exploring associations between variables); modeling variation with probability; and modeling random events (the Normal and binomial models).


This is a special edition of an established title widely used by colleges and universities throughout the world. Pearson published this exclusive edition for the benefit of students outside the United States and Canada. If you purchased this book within the United States or Canada, you should be aware that it has been imported without the approval of the Publisher or Author.


For these Global editions, the editorial team at Pearson has collaborated with educators across the world to address a wide range of subjects and requirements, equipping students with the best possible learning tools. This Global edition preserves the cutting-edge approach and pedagogy of the original, but also features alterations, customization, and adaptation from the North American version.


www.mystatlab.com

Introductory Statistics Courses

Leverage the Power of StatCrunch

MyStatLab leverages the power of StatCrunch, a powerful, web-based statistics software package. Because StatCrunch is integrated into MyStatLab, students can easily analyze data from their exercises and eText. In addition, access to the full online community allows users to take advantage of a wide variety of resources and applications at www.statcrunch.com.

Real-World Statistics

MyStatLab video resources help foster conceptual understanding. StatTalk Videos, hosted by fun-loving statistician Andrew Vickers, demonstrate important statistical concepts through interesting stories and real-life events. This series of 24 videos includes assignable questions built in MyStatLab and an instructor’s guide.

Bring Statistics to Life

Virtually flip coins, roll dice, draw cards, and interact with animations on your mobile device with the extensive menu of experiments and applets in StatCrunch. Offering a number of ways to practice resampling procedures, such as permutation tests and bootstrap confidence intervals, StatCrunch is a complete and modern solution.

MyStatLab is the market-leading online resource for learning and teaching statistics.


West Valley College

Boston Columbus Indianapolis New York San Francisco Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto Delhi Mexico City Sao Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo


Editor in Chief: Deirdre Lynch

Senior Acquisitions Editor: Suzanna Bainbridge

Editorial Assistant: Justin Billing

Field Marketing Manager: Andrew Noble

Product Marketing Manager: Tiffany Bitzel

Marketing Assistant: Jennifer Myers

Program Team Lead: Karen Wernholm

Program Manager: Chere Bemelmans

Project Team Lead: Peter Silvia

Project Manager: Peggy McMahon

Senior Author Support/Technology Specialist: Joe Vetere

Manager, Rights Management, Higher Education: Gina M. Cheselka

Media Producer: Aimee Thorne

Acquisitions Editor, Global Edition: Sourabh Maheshwari

Assistant Project Editor, Global Edition: Sulagna Dasgupta

Media Production Manager, Global Edition: Vikram Kumar

Senior Manufacturing Controller, Production, Global Edition: Trudy Kimber

Associate Director of Design, USHE EMSS/HSC/EDU: Andrea Nix

Program Design Lead: Beth Paquin

Design, Full-Service Project Management, Composition, and Illustration: Cenveo® Publisher Services

Senior Project Manager, MyStatLab: Robert Carroll

QA Manager, Assessment Content: Marty Wright

Procurement Manager: Carol Melville

Pearson Education Limited

Edinburgh Gate

Harlow

Essex CM20 2JE

England

and Associated Companies throughout the world

Visit us on the World Wide Web at:

www.pearsonglobaleditions.com

© Pearson Education Limited 2017

The rights of Robert Gould, Colleen Ryan, Rebecca Wong to be identified as the authors of this work have been asserted by them in accordance with the

Copyright, Designs and Patents Act 1988.

Authorized adaptation from the United States edition, entitled Essential Statistics, 2nd edition, ISBN 978-0-134-13440-6, by Robert Gould, Colleen Ryan, and

Rebecca Wong, published by Pearson Education © 2017.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without either the prior written permission of the publisher or a license permitting restricted copying in the United Kingdom issued by the Copyright Licensing Agency Ltd, Saffron House, 6–10 Kirby Street, London EC1N 8TS.

All trademarks used herein are the property of their respective owners. The use of any trademark in this text does not vest in the author or publisher any trademark ownership rights in such trademarks, nor does the use of such trademarks imply any affiliation with or endorsement of this book by such owners.

Acknowledgments of third-party content appear in Appendix D, which constitutes an extension of this copyright page.

TI-84+C screenshots courtesy of Texas Instruments. Data and screenshots from StatCrunch used by permission of StatCrunch. Screenshots from Minitab courtesy of Minitab Corporation. Screenshot from SOCR used by the permission of the Statistics Online Computational Resource, UCLA. XLSTAT screenshots courtesy of Addinsoft, Inc. Used with permission. All Rights Reserved. XLSTAT is a registered trademark of Addinsoft SARL.

MICROSOFT® AND WINDOWS® ARE REGISTERED TRADEMARKS OF THE MICROSOFT CORPORATION IN THE U.S.A. AND OTHER COUNTRIES. SCREEN SHOTS AND ICONS REPRINTED WITH PERMISSION FROM THE MICROSOFT CORPORATION. THIS BOOK IS NOT SPONSORED OR ENDORSED BY OR AFFILIATED WITH THE MICROSOFT CORPORATION. MICROSOFT AND/OR ITS RESPECTIVE SUPPLIERS MAKE NO REPRESENTATIONS ABOUT THE SUITABILITY OF THE INFORMATION CONTAINED IN THE DOCUMENTS AND RELATED GRAPHICS PUBLISHED AS PART OF THE SERVICES FOR ANY PURPOSE. ALL SUCH DOCUMENTS AND RELATED GRAPHICS ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND. MICROSOFT AND/OR ITS RESPECTIVE SUPPLIERS HEREBY DISCLAIM ALL WARRANTIES AND CONDITIONS WITH REGARD TO THIS INFORMATION, INCLUDING ALL WARRANTIES AND CONDITIONS OF MERCHANTABILITY, WHETHER EXPRESS, IMPLIED OR STATUTORY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL MICROSOFT AND/OR ITS RESPECTIVE SUPPLIERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF INFORMATION AVAILABLE FROM THE SERVICES. THE DOCUMENTS AND RELATED GRAPHICS CONTAINED HEREIN COULD INCLUDE TECHNICAL INACCURACIES OR TYPOGRAPHICAL ERRORS. CHANGES ARE PERIODICALLY ADDED TO THE INFORMATION HEREIN. MICROSOFT AND/OR ITS RESPECTIVE SUPPLIERS MAY MAKE IMPROVEMENTS AND/OR CHANGES IN THE PRODUCT(S) AND/OR THE PROGRAM(S) DESCRIBED HEREIN AT ANY TIME. PARTIAL SCREEN SHOTS MAY BE VIEWED IN FULL WITHIN THE SOFTWARE VERSION SPECIFIED.

ISBN 10: 1-292-16122-1

ISBN 13: 978-1-292-16122-8

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library

10 9 8 7 6 5 4 3 2 1

Printed and bound by Vivar in Malaysia


To my parents and family, my friends, and my colleagues who are also friends. Without their patience and support, this would not have been possible.


Robert L. Gould (Ph.D., University of California, San Diego) is a leader in the statistics education community. He has served as chair of the American Statistical Association’s Committee on Teacher Enhancement, has served as chair of the ASA’s Statistics Education Section, and served on a panel of co-authors for the Guidelines for Assessment and Instruction in Statistics Education (GAISE) College Report. While serving as the associate director of professional development for CAUSE (Consortium for the Advancement of Undergraduate Statistics Education), Rob worked closely with the American Mathematical Association of Two-Year Colleges (AMATYC) to provide traveling workshops and summer institutes in statistics. For over ten years, he has served as Vice-Chair of Undergraduate Studies at the UCLA Department of Statistics, and he is director of the UCLA Center for the Teaching of Statistics. In 2012, Rob was elected Fellow of the American Statistical Association.

In his free time, Rob plays the cello and is an ardent reader of fiction.

Colleen N. Ryan has taught statistics, chemistry, and physics to diverse community college students for decades. She taught at Oxnard College from 1975 to 2006, where she earned the Teacher of the Year Award. Colleen currently teaches statistics part-time at California Lutheran University. She often designs her own lab activities. Her passion is to discover new ways to make statistical theory practical, easy to understand, and sometimes even fun.

Colleen earned a B.A. in physics from Wellesley College, an M.A.T. in physics from Harvard University, and an M.A. in chemistry from Wellesley College. Her first exposure to statistics was with Frederick Mosteller at Harvard.

In her spare time, Colleen sings, has been an avid skier, and enjoys time with her family.

About the Authors

Robert Gould

Colleen Ryan

Rebecca K. Wong has taught mathematics and statistics at West Valley College for more than twenty years. She enjoys designing activities to help students actively explore statistical concepts and encouraging students to apply those concepts to areas of personal interest.

Rebecca earned a B.A. in mathematics and psychology from the University of California, Santa Barbara, an M.S.T. in mathematics from Santa Clara University, and an Ed.D. in Educational Leadership from San Francisco State University. She has been recognized for outstanding teaching by the National Institute of Staff and Organizational Development and the California Mathematics Council of Community Colleges.

When not teaching, Rebecca is an avid reader and enjoys hiking trails with friends.

Rebecca K Wong


Contents

Preface
Index of Applications

Chapter 1: Introduction to Data
CASE STUDY Deadly Cell Phones?
1.1 What Are Data?
1.2 Classifying and Storing Data
1.3 Organizing Categorical Data
1.4 Collecting Data to Understand Causality
EXPLORING STATISTICS Collecting a Table of Different Kinds of Data

Chapter 2: Picturing Variation with Graphs
CASE STUDY Student-to-Teacher Ratio at Colleges
2.1 Visualizing Variation in Numerical Data
2.2 Summarizing Important Features of a Numerical Distribution
2.3 Visualizing Variation in Categorical Variables
2.4 Summarizing Categorical Distributions
2.5 Interpreting Graphs
EXPLORING STATISTICS Personal Distance

Chapter 3: Numerical Summaries of Center and Variation
CASE STUDY Living in a Risky World
3.1 Summaries for Symmetric Distributions
3.2 What’s Unusual? The Empirical Rule and z-Scores
3.3 Summaries for Skewed Distributions
3.4 Comparing Measures of Center
3.5 Using Boxplots for Displaying Summaries
EXPLORING STATISTICS Does Reaction Distance Depend on Gender?

Chapter 4: Regression Analysis: Exploring Associations between Variables
CASE STUDY Catching Meter Thieves
4.1 Visualizing Variability with a Scatterplot
4.2 Measuring Strength of Association with Correlation
4.3 Modeling Linear Trends
4.4 Evaluating the Linear Model
EXPLORING STATISTICS Guessing the Age of Famous People

Chapter 5: Modeling Variation with Probability
CASE STUDY SIDS or Murder?
5.1 What Is Randomness?
5.2 Finding Theoretical Probabilities
5.3 Associations in Categorical Variables
5.4 Finding Empirical Probabilities
EXPLORING STATISTICS Let’s Make a Deal: Stay or Switch?

Chapter 6: Modeling Random Events: The Normal and Binomial Models
CASE STUDY You Sometimes Get More Than You Pay For
6.1 Probability Distributions Are Models of Random Experiments
6.2 The Normal Model
6.3 The Binomial Model (optional)
EXPLORING STATISTICS ESP with Coin Flipping

Chapter 7
CASE STUDY Spring Break Fever: Just What the Doctors Ordered?
7.1 Learning about the World through Surveys
7.2 Measuring the Quality of a Survey
7.3 The Central Limit Theorem for Sample Proportions
7.4 Estimating the Population Proportion with Confidence Intervals
7.5 Comparing Two Population Proportions with Confidence
EXPLORING STATISTICS Simple Random Sampling Prevents Bias

Chapter 8
CASE STUDY Dodging the Question
8.1 The Essential Ingredients of Hypothesis Testing
8.2 Hypothesis Testing in Four Steps
8.3 Hypothesis Tests in Detail
8.4 Comparing Proportions from Two Populations
EXPLORING STATISTICS Identifying Flavors of Gum through Smell

Chapter 9: Inferring Population Means
CASE STUDY Epilepsy Drugs and Children
9.1 Sample Means of Random Samples
9.2 The Central Limit Theorem for Sample Means
9.3 Answering Questions about the Mean of a Population
9.4 Hypothesis Testing for Means
9.5 Comparing Two Population Means
9.6 Overview of Analyzing Means
EXPLORING STATISTICS Pulse Rates

Chapter 10: … Research
CASE STUDY Popping Better Popcorn
10.1 The Basic Ingredients for Testing with Categorical Variables
10.2 Chi-Square Tests for Associations between Categorical Variables
10.3 Reading Research Papers
EXPLORING STATISTICS Skittles

Appendix A Tables
Appendix B Check Your Tech Answers
Appendix C Answers to Odd-Numbered Exercises
Appendix D Credits
Index


Preface

About This Text

The primary focus of this text is still, as in the first edition, data. We live in a data-driven economy and, more and more, in a data-centered culture. We don’t choose whether we interact with data; the choice is made for us by websites that track our browsing patterns, membership cards that track our spending habits, cars that transmit our driving patterns, and smart phones that record our most personal moments.

The silver lining of what some have called the Data Deluge is that we all have access to rich and valuable data relevant in many important fields: environment, civics, social sciences, economics, health care, entertainment. This text teaches students to learn from such data and, we hope, to become cognizant of the role of the data that appear all around them. We want students to develop a data habit of mind in which, when faced with decisions, claims, or just plain curiosity, they know to reach for an appropriate data set to answer their questions. More important, we want them to have the skills to access these data and the understanding to analyze the data critically. Clearly, we’ve come a long way from the “mean median mode” days of rote calculation. To survive in the modern economy requires much more than knowing how to plug numbers into a formula. Today’s students must know which questions can be answered by applying which statistic, and how to get technology to compute these statistics from within complex data sets.

What’s New in the Second Edition

The second edition remains true to the goals of the first edition: to provide students with the tools they need to make sense of the world by teaching them to collect, visualize, analyze, and interpret data. With the help of several wise and careful readers and class testers, we have fine-tuned the second edition to better achieve this vision. In some sections, we have rewritten explanations or added new ones. In others, we have more substantially reordered content.

More precisely, in this new edition you will find

• Coverage of two-proportion confidence intervals in Chapters 7 and 8.

• An increase of more than 150 homework exercises in this edition, with more than 400 total new, revised, and updated exercises. We’ve added larger data sets to Chapters 2, 3, 4, and 9. We’ve also added exercises to Section 2.5 and more Chapter Review exercises throughout.

• New or updated examples in each chapter, with current topics such as views of stem cell research (Chapter 7) and online classes (Chapter 10).

• A more careful and thorough integration of technology in many examples.

• Two new case studies: Student-to-Teacher Ratios in Chapter 2 and Dodging the Question in Chapter 8.

• A more straightforward implementation of simulations to understand probability in Chapter 5.

• A more unified presentation of hypothesis testing in Chapter 8 that better joins conceptual understanding with application.

• A greater number of “Looking Back” and “Caution” marginal boxes to help direct students’ studies.

• Updated technology guides to match current hardware and software.


Approach

Our text is concept-based, as opposed to method-based. We teach useful statistical methods, but we emphasize that applying the method is secondary to understanding the concept.

In the real world, computers do most of the heavy lifting for statisticians. We therefore adopt an approach that frees the instructor from having to teach tedious procedures and leaves more time for teaching deeper understanding of concepts. Accordingly, we present formulas as an aid to understanding the concepts, rather than as the focus of study.

We believe students need to learn how to

• Determine which statistical procedures are appropriate

• Instruct the software to carry out the procedures

• Interpret the output

We understand that students will probably see only one type of statistical software in class. But we believe it is useful for students to compare output from several different sources, so in some examples we ask them to read output from two or more software packages.

One of the authors (Rob Gould) served on a panel of co-authors for the first edition of the collegiate version of the American Statistical Association–endorsed Guidelines for Assessment and Instruction in Statistics Education (GAISE). We firmly believe in its main goals and have adopted them in the preparation of this book.

• We emphasize understanding over rote performance of procedures

• We use real data whenever possible

• We encourage the use of technology both to develop conceptual understanding and

to analyze data

• We believe strongly that students learn by doing. For this reason, the homework problems offer students both practice in basic procedures and challenges to build conceptual understanding.

Coverage

The first few chapters of this book are concept-driven and cover exploratory data analysis and inferential statistics, fundamental concepts that every introductory statistics student should learn. The last part of the book builds on that strong conceptual foundation and is more methods-based. It presents several popular statistical methods and more fully explores methods presented earlier.

Our ordering of topics is guided by the process through which students should analyze data. First, they explore and describe data, possibly deciding that graphics and numerical summaries provide sufficient insight. Then they make generalizations (inferences) about the larger world.

Chapters 1–4: Exploratory Data Analysis. The first four chapters cover data collection and summary. Chapter 1 introduces the important topic of data collection and compares and contrasts observational studies with controlled experiments. This chapter also teaches students how to handle raw data so that the data can be uploaded to their statistical software. Chapters 2 and 3 discuss graphical and numerical summaries of single variables based on samples. We emphasize that the purpose is not just to produce a graph or a number but, instead, to explain what those graphs and numbers say about the world. Chapter 4 introduces simple linear regression and presents it as a technique for providing graphical and numerical summaries of relationships between two numerical variables.


We feel strongly that introducing regression early in the text is beneficial in building student understanding of the applicability of statistics to real-world scenarios. After completing the chapters covering data collection and summary, students have acquired the skills and sophistication they need to describe two-variable associations and to generate informal hypotheses. Two-variable associations provide a rich context for class discussion and allow the course to move from fabricated problems (because one-variable analyses are relatively rare in the real world) to real problems that appear frequently in everyday life.

Chapters 5–8: Inference. These chapters teach the fundamental concepts of statistical inference. The main idea is that our data mirror the real world, but imperfectly; although our estimates are uncertain, under the right conditions we can quantify our uncertainty. Verifying that these conditions exist and understanding what happens if they are not satisfied are important themes of these chapters.

Chapters 9–10: Methods. Here we return to the themes covered earlier in the text and present them in a new context by introducing additional statistical methods, such as estimating population means and analyzing categorical variables. We also provide (in Section 10.3) guidance for reading scientific literature, to offer students the experience of critically examining real scientific papers.

Organization

Our preferred order of progressing through the text is reflected in the Contents, but there are some alternative pathways as well.

10-week Quarter. The first eight chapters provide a full, one-quarter course in introductory statistics. If time remains, cover Sections 9.1 and 9.2 as well, so that students can solidify their understanding of confidence intervals and hypothesis tests by revisiting the topic with a new parameter.

Proportions First. Ask two statisticians, and you will get three opinions on whether it is best to teach means or proportions first. We have come down on the side of proportions for a variety of reasons. Proportions are much easier to find in popular news media (particularly around election time), so they can more readily be tied to students’ everyday lives. Also, the mathematics and statistical theory are simpler; because there’s no need to provide a separate estimate for the population standard deviation, inference is based on the Normal distribution, and no further approximations (that is, the t-distribution) are required. Hence, we can quickly get to the heart of the matter with fewer technical diversions.
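The point about simplicity can be made concrete with a minimal sketch, assuming a simple random sample that is large enough for the Normal approximation to be reasonable. The function name `proportion_ci`, the sample counts, and the multiplier 1.96 (the usual Normal value for roughly 95% confidence) are illustrative assumptions, not taken from the text.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Approximate confidence interval for a population proportion.

    Hypothetical helper for illustration; z=1.96 corresponds to
    roughly 95% confidence under the Normal model.
    """
    p_hat = successes / n
    # The standard error is computed from p_hat and n alone, so no
    # separate estimate of a population standard deviation is needed.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

# Made-up survey: 520 of 1000 respondents answer "yes".
low, high = proportion_ci(successes=520, n=1000)
print(f"{low:.3f} to {high:.3f}")
```

For a mean, by contrast, the standard error would need a separately estimated standard deviation, which is where the t-distribution enters.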

The basic problem here is how to quantify the uncertainty involved in estimating a parameter and how to quantify the probability of making incorrect decisions when posing hypotheses. We cover these ideas in detail in the context of proportions. Students can then more easily learn how these same concepts are applied in the new context of means (and any other parameter they may need to estimate).

Means First. Conversely, many people feel that there is time for only one parameter and that this parameter should be the mean. For this alternative presentation, cover Chapters 6, 7, and 9, in that order. On this path, students learn about survey sampling and the terminology of inference (population vs. sample, parameter vs. statistic) and then tackle inference for the mean, including hypothesis testing.

To minimize the coverage of proportions, you might choose to cover Chapter 6, Section 7.1 (which treats the language and framework of statistical inference in detail), and then Chapter 9. Chapters 7 and 8 develop the concepts of statistical inference more slowly than Chapter 9, but essentially, Chapter 9 develops the same ideas in the context of the mean.


If you present Chapter 9 before Chapters 7 and 8, we recommend that you devote roughly twice as much time to Chapter 9 as you have devoted to previous chapters, because many challenging ideas are explored in this chapter. If you have already covered Chapters 7 and 8 thoroughly, Chapter 9 can be covered more quickly.

Using technology is important because it enables students to handle real data, and real data sets are often large and messy. The following features are designed to guide students.

• TechTips outline steps for performing calculations using TI-84® (including TI-84 + C®) graphing calculators, Excel®, Minitab®, and StatCrunch®. We do not want students to get stuck because they don’t know how to reproduce the results we show in the text, so whenever a new method or procedure is introduced, an icon, Tech, refers students to the TechTips section at the end of the chapter. Each set of TechTips contains at least one mini-example, so that students are not only learning to use the technology but also practicing data analysis and reinforcing ideas discussed in the text. Most of the provided TI-84 steps apply to all TI-84 calculators, but some are unique to the TI-84 + C calculator.

• Check Your Tech examples help students understand that statistical calculations done by technology do not happen in a vacuum and assure them that they can get the same numerical values by hand. Although we place a higher value on interpreting results and verifying conditions required to apply statistical models, the numerical values are important, too.

• All data sets used in the exposition and exercises are available on the companion

website at www.pearsonglobaleditions.com/gould

Guiding Students

• Each chapter opens with a Theme. Beginners have difficulty seeing the forest for the trees, so we use a theme to give an overview of the chapter content.

• Each chapter begins by posing a real-world Case Study. At the end of the chapter, we show how techniques covered in the chapter helped solve the problem presented in the Case Study.

• Margin Notes draw attention to details that enhance student learning and reading.


Looking Back reminders refer students to earlier coverage of a topic.

Details clarify or expand on a concept.

Key Points highlight essential concepts to draw special attention to them. Understanding these concepts is essential for progress.

Snapshots break down key statistical concepts introduced in the chapter, quickly summarizing each concept or procedure and indicating when and how it should be used.

• An abundance of worked-out examples model solutions to real-world problems relevant to students’ lives. Each example is tied to an end-of-chapter exercise so that students can practice solving a similar problem and test their understanding. Within the exercise sets, the icon TRY indicates which problems are tied to worked-out examples in that chapter, and the numbers of those examples are indicated.

• The Chapter Review that concludes each chapter provides a list of important new terms, student learning objectives, a summary of the concepts and methods discussed, and sources for data, articles, and graphics referred to in the chapter.

Active Learning

• For each chapter we’ve included an activity, Exploring Statistics, that students are intended to do in class as a group. We have used these activities ourselves, and we have found that they greatly increase student understanding and keep students engaged in class.

• All exercises are located at the end of the chapter. Section Exercises are designed to begin with a few basic problems that strengthen recall and assess basic knowledge, followed by mid-level exercises that ask more complex, open-ended questions. Chapter Review Exercises provide a comprehensive review of material covered throughout the chapter.

The exercises emphasize good statistical practice by requiring students to verify conditions, make suitable use of graphics, find numerical values, and interpret their findings in writing. All exercises are paired so that students can check their work on the odd-numbered exercise and then tackle the corresponding even-numbered exercise. The answers to all odd-numbered exercises appear in the back of the text.

Challenging exercises, identified with an asterisk (*), ask open-ended questions and sometimes require students to perform a complete statistical analysis. For exercises marked with an icon, accompanying data sets are available in MyStatLab and on the companion website.

• Most chapters include select exercises marked with an icon within the exercise set, to indicate that problem-solving help is available in the Guided Exercises section. If students need support while doing homework, they can turn to the Guided Exercises to see a step-by-step approach to solving the problem.

Acknowledgments

We are grateful for the attention and energy that a large number of people devoted to making this a better book. We extend our gratitude to Elaine Newman (Sonoma State University) and Ann Cannon (Cornell College), who checked the accuracy of this text and its many exercises. Thanks also to David Chelton, our developmental editor, to Carol Merrigan, who handled production, to Peggy McMahon, project manager, and to Connie Day, our copyeditor. Many thanks to John Norbutas for his technical advice and help with the TechTips. We thank Suzanna Bainbridge, our acquisitions editor, Justin Billing, editorial assistant, and Deirdre Lynch, editor-in-chief, for signing us up and sticking with us, and we are grateful to Dona Kenly and Erin Kelly for their market development efforts.

We extend our sincere thanks for the suggestions and contributions made by the following reviewers of this edition:

Lloyd Best, Pacific Union College
Mario Borha, Loyola University of Chicago
David Bosworth, Hutchinson Community College
Beth Burns, Bowling Green State University
Jim Johnston, Concord University
Manuel Lopez, Cerritos College
Ralph Padgett Jr., University of California –
Mahbobeh Vezvaei, Kent State

Arun Agarwal, Grambling State University
Anne Albert, University of Findlay
Michael Allen, Glendale Community College
Eugene Allevato, Woodbury University
Dr. Jerry Allison, Trident Technical College
Polly Amstutz, University of Nebraska
Patricia Anderson, Southern Adventist
Diana Asmus, Greenville Technical College
Kathy Autrey, Northwestern State University of Louisiana
Wayne Barber, Chemeketa Community College
Roxane Barrows, Hocking College
Jennifer Beineke, Western New England
K.B. Boomer, Bucknell University
David Bosworth, Hutchinson Community College
Diana Boyette, Seminole Community College
Elizabeth Paulus Brown, Waukesha County Technical College
Leslie Buck, Suffolk Community College
R.B. Campbell, University of Northern Iowa
Stephanie Campbell, Mineral Area College
Ann Cannon, Cornell College
Rao Chaganty, Old Dominion University
Carolyn Chapel, Western Technical College
Christine Cole, Moorpark College
Linda Brant Collins, University of Chicago
James A. Condor, Manatee Community
Nancy Eschen, Florida Community College at Jacksonville
Karen Estes, St. Petersburg College
Mariah Evans, University of Nevada, Reno
Harshini Fernando, Purdue University
College, East Campus
Kim Gilbert, University of Georgia
Stephen Gold, Cypress College
Nick Gomersall, Luther College
Mary Elizabeth Gore, Community College


Albert Groccia, Valencia Community College, Osceola Campus
David Gurney, Southeastern Louisiana University
Chris Hakenkamp, University of Maryland, College Park
Melodie Hallet, San Diego State University
Donnie Hallstone, Green River Community College
Cecil Hallum, Sam Houston State University
Josephine Hamer, Western Connecticut State University
Mark Harbison, Sacramento City College
Beverly J. Hartter, Oklahoma Wesleyan University
Laura Heath, Palm Beach State College
Greg Henderson, Hillsborough Community College
Susan Herring, Sonoma State University
Carla Hill, Marist College
Michael Huber, Muhlenberg College
Kelly Jackson, Camden County College
Bridgette Jacob, Onondaga Community College
Robert Jernigan, American University
Chun Jin, Central Connecticut State
Robert Keller, Loras College
Omar Keshk, Ohio State University
Raja Khoury, Collin County Community College
Brianna Killian, Daytona State College
Yoon G. Kim, Humboldt State University
Greg Knofczynski, Armstrong Atlantic University
Jeffrey Kollath, Oregon State University
Erica Kwiatkowski-Egizio, Joliet Junior
Deann Leoni, Edmonds Community College
Lenore Lerer, Bergen Community College
Quan Li, Texas A&M University
Doug Mace, Kirtland Community College
Walter H. Mackey, Owens Community
and Technical College
Mary Moyinhan, Cape Cod Community
Danya Smithers, Northeast State Technical Community College
Larry Southard, Florida Gulf Coast University
Dianna J. Spence, North Georgia College & State University
René Sporer, Diablo Valley College
Jeganathan Sriskandarajah, Madison Area
University
Mahbobeh Vezvaei, Kent State University
Joseph Villalobos, El Camino College
Barbara Wainwright, Salisbury University
Henry Wakhungu, Indiana University
Dottie Walton, Cuyahoga Community College
Jen-ting Wang, SUNY, Oneonta
Jane West, Trident Technical College
Michelle White, Terra Community College
Bonnie-Lou Wicklund, Mount Wachusett
Yan Zheng-Araujo, Springfield Community Technical College
Cathleen Zucco-Teveloff, Rider University
Mark A. Zuiker, Minnesota State University,


Acknowledgments for the Global Edition

Contributors

Vikas Arora, Statistician

Kiran Paul, Statistician

Reviewers

Santhosh Kumar, Christ University

Kiran Paul, Statistician

Chirag Trivedi, R J Tibrewal Commerce College


MyStatLab® Online Course for Essential Statistics: Exploring the World through Data, Second Edition, Gould/Ryan/Wong (access code required)

MyStatLab is available to accompany Pearson’s market-leading text offerings. To give students a consistent tone, voice, and teaching method, each text’s flavor and approach is tightly integrated throughout the accompanying MyStatLab course, making learning the material as seamless as possible.

Technology Tutorial Videos and Study Cards

Technology Tutorials provide brief video walkthroughs and step-by-step instructional study cards on common statistical procedures for Minitab®, Excel®, and the TI-83/84 graphing calculator.

Data Cycle of Everyday Things Videos

Data Cycle of Everyday Things Videos demonstrate for students that data can be, and is, a part of everyday life! Through a series of fun and engaging episodes, students learn to collect, analyze, and apply data to answer any range of real-world statistical questions.

Chapter Review Videos

Chapter Review Videos walk students through solving a selection of the more complex problems posed in each chapter, providing a review of the chapter’s key concepts and methods and offering students support where they most need it.

Resources for Success


Instructor’s Solutions Manual

Instructor’s Solutions Manual contains worked-out solutions to all the text exercises.

TestGen

TestGen® (www.pearsoned.com/testgen)

Updated to more closely mirror the 2nd Edition book, TestGen enables instructors to build, edit, print, and administer tests using a computerized bank of questions developed to cover all the objectives of the text.

Online Test Bank

Online Test Bank (download only) includes tests for each chapter with questions aimed to reinforce the text’s learning objectives.

PowerPoint® Lecture Slides

PowerPoint Lecture Slides, aligned with the text, provide an overview of each chapter, stressing important definitions and offering additional examples. Multiple-choice questions are included for class assessment.

Student Resources

Additional resources for student success via MyStatLab

Study Cards for Statistics Software

Study Cards for Statistics Software. This series of study cards, available for Excel, Minitab, JMP®, SPSS®, R, StatCrunch, and the TI-84 graphing calculators, provides students with easy, step-by-step guides to the most common statistics software.

PowerPoint Slides

PowerPoint Slides provide an overview of each chapter, stressing important definitions and offering additional examples. These slides are an excellent resource for both traditional and online students.

Chapter Review Videos

Chapter Review Videos walk students through solving a selection of the more complex problems posed in each chapter, thereby offering a review of the chapter’s key concepts and providing students support where they most need it.

The Data Cycle of Everyday Things Videos

The Data Cycle of Everyday Things Videos demonstrate for students that data collection and data analysis can be applied to answer questions about everyday life. Through a series of fun and engaging episodes, students learn to collect, analyze, and apply data to answer any range of real-world statistical questions.


cats’ birth weights, 314

elephants’ birth weights, 314

eye color and sex, 266, 268

finger length, 530

gestation periods for animals, 92

heights and armspans, 88, 209, 210–211

sleep time of animals, 153

temperature and frog’s jumping

performance, 534 tree heights, 220

twins, 314

white blood cells, 312

women’s heights, 310–311, 312–314

BUSINESS AND ECONOMICS

benefits of having rich people, 266

film budgets and grosses, 220

Foreign Direct Investment, 368

Navy commissary prices, 488

Occupy Wall Street, 533

oil leaders, 531

pay rate in different currencies, 146

price change in wheat, 93–94

prices at Target and Whole Foods, 219–220

retail car sales, 95 salaries/wages, 204, 208, 209, 215, 220 shrinking middle class, 80

soda production, 172–173 stressed moms, 368–369 tax regime, 423 textbook prices, 91, 486, 487 turkey costs, 215

used car values, 192–193 wealth distribution in United States, 94 women CEOs, 416

CRIME AND CORRECTIONS

“boot camp” and prevention, 413, 414 burglary, 148

car thefts, 266 counseling and criminology, 419 death row, 154

DWI convictions, 315 homicide clearance, 315 jury duty, 264

magistrate’s court judges, 259 parental training and criminal behavior of children, 537

Perry Preschool attendance and arrests, 537

Scared Straight prevention program, 56 SIDS or murder?, 229–230

stolen bicycles, 296–297, 314 therapy and criminology, 419 violent crime, 147, 158

EDUCATION

ACT scores, 311 alumni donations, 536 Audio-visual aids and grades, 54

BA percentage, 153 bar exam pass rates, 66, 72, 153, 221 BAs and median income, 204 changing multiple-choice answers, 261 cheating, 414

choosing science for higher studies, 422 college admission rates, 314–315, 481 college dropout rate, 414

college enrollment, 451 college graduation, 315, 481 college professors’ salaries, 204 confidence in public schools, 367 course enrollment rates, 54 debt after graduation, 481 education and marital status, 238, 239–240, 241, 245, 511–512 education and widows, 249 employment after law school, 98 exam scores, 121–122, 154, 157, 217,

222, 223, 478, 490 favorite subject, 52 gender and education, 532 gender gap in universities, 212–213 gender of teachers, 366

GPAs, 203, 205, 482, 483, 484 grades, 261

height and test scores, 219 high school graduation rates, 369, 370–371, 373–374

hours of study, 156, 221 law school tuition, 92–93 literacy rate, 265, 421 marriage and college degree, 261 math scores, 109–110

multiple-choice tests, 93, 100, 262, 414, 423

number of years of formal education, 89 Oregon bar exam, 366, 373

parental education, 530 parental educational level, 145, 216–217 percentage of students married or parents, 268

preschool attendance, 369, 370–371, 373–374

preschool attendance and high school graduation rates, 532–533 proportion of seniors in student population, 365

pursuing economics, 367 random answering, 414–415 random assignment of professors, 259 salary and education, 208, 220 SAT scores, 91, 148, 184–185, 265, 310,

312, 313, 316–317, 317–318 school drop-out rates, 536 shoe sizes, 99, 203, 218–219 student heights, 480 student-to-teacher ratio, 61–62, 84 teacher satisfaction, 251

teachers’ pay and costs of education, 215 true/false tests, 422, 423

tuition and fees, 88, 205, 444, 455, 470–471

value of college education, 260–261, 262 working and student grades, 215–216

EMPLOYMENT

age discrimination, 415 career goals, 372 cleanliness drive, 367 commuting times, 88 corporate organization and gender, 264 day at spa, 52

eating out and jobs, 91, 100, 146, 156


work from home, 421

working and student grades, 215–216

ENTERTAINMENT

animated movies, 148–149

film budgets and grosses, 220

hours of television viewing, 97, 490

condo rental prices, 88

land value prediction, 203

real estate prices, 146, 150, 157, 208, 218

fat in sliced turkey and ham compared, 126

fresh juice vs bottled juice, 413

fruit juice, 364–365 grocery prices, 490, 491 hungry monkeys, 512–513 mercury in freshwater fish, 366, 417 number of alcoholic drinks per week,

147, 149 organic food, 487 pizza size, 448 popcorn, 501–502, 525 protein intake, 312 size of ice cream cones, 273, 306 soda consumption, 155

soda production, 172–173, 267 soft drink serving size, 482 sugar-free diet and arthritis, 418 sugar in fruits, 93

vegan diets, 366 weight of carrots, 481 weight of colas, 489 weight of hamburgers, 93, 482 weight of ice cream cones, 489 weight of oranges, 481 weight of tomatoes, 483

GAMES

blackjack tips, 222 brain games, 44–45, 521–522 color of cubes, 268

dealing cards, 247–248 drawing cards, 260 flipping coins, 250–251, 260, 263, 264,

309, 314, 315 gambling, 264 lotto, 413 rolling dice, 242, 261, 414 running speed, 98 spinning coins, 390–391, 397, 417, 418 strike rate of batsman, 148

throwing dice, 235–236, 253–254, 263,

264, 265, 276–277, 309

GENERAL INTEREST

accuracy of shooting, 364 ages of students, 261 anniversaries and days of the week, 260, 265

book widths, 182–183 Cambridge Nobel laureates, 95 children’s ages, 146

dogs vs cats, 339, 343–344

eating and gender, 144 four and two wheelers, 268 gender and toys, 529 guitar chords, 483

hand folding, 262, 269 home and car ownership, 261 joint bank account, 314 Morse code, 346–347, 353, 364, 417 museum visit, 535

number of pairs of shoes owned, 88, 150–151, 488

offices with pantry, 95–96 offices with reception areas, 144 pets, 92

pocket money, 146 printing times, 88 rating hotels, 208 risky activities, 107–108, 140–141 seesaw height, 208, 211

skyscrapers, 136–137, 153 sleeping in, 460–461 tossing thumbtacks, 309, 261 weight of trash, 207, 216

HEALTH

age and sleep, 204, 219 age and weight, 211, 219 antibiotic or placebo, 530 antibiotics, 369–370, 413, 414 antiretrovirals to prevent HIV, 533 arthritis, 418

Atkins diet, 484 autism and MMR vaccine, 537 bariatric surgery for diabetes, 533 birth lengths, 146, 148

birth weights, 146, 148, 479 blood pressure, 46, 214 blood sugar, 536 blood thinners, 57 body mass index, 87, 483 body temperatures, 483, 492 breast cancer, 55

calcium, 473–474 calcium and death rate, 535–536 caloric restriction, 512–513 cancer survival, 413 causes of death, 80–81 cell phones and cancer, 27–28, 47–48 coffee and stroke, 535

college athletes’ weights, 486 colored vegetables and stroke, 536–537 copper bracelets, 55

Crohn’s disease, 47, 359–360 dancers’ heights, 484 death row and head trauma, 58 deep vein thrombosis, 422 depression treatment, 54 diarrhea vaccine in Africa, 419 diet drug, 370

dieting, 468–469 drug for asthma, 536, 539 drug for platelets, 535 drug for rheumatoid arthritis, 536 early tonsillectomy for children, 54


heights and ages for children, 222

heights and weights, 176–177, 205,

217–218 heights of bedridden patients, 187–189

heights of children, 128–130

heights of fathers and sons, 154, 217

heights of females, 148, 157

heights of males, 157, 483, 488

heights of students and parents, 491

HIV-1 and HIV-2, 57

hormone replacement therapy, 97

hospital rooms, 534–535

human cloning, 368

hypothermia for babies, 419

ideal weight, 99

iron and death rate, 535

jet lag drug, 530–531

life expectancy, 212, 215

light exposure in mice, 55–56

low birth weights, 316

number of AIDS cases, 53, 58–59

obesity and marital status, 531

pregnancy, 148, 429–430, 475

prostate cancer, 56, 538

protein intake, 312

pulse rates, 437, 484, 485, 487, 493

quantity of water drunk, 100

removal of healthy appendixes, 534

scorpion antivenom, 57

SIDS, 229–230

sleep, 87, 88, 96, 204, 488, 490

sleep deprivation, 55 sleep medicine for shift workers, 422 steroids and height, 535

strength training, 53 stroke, 57

stroke survival rate, 413 sugar-free diet and arthritis, 418 systolic blood pressures, 485–486 tight control of blood sugar, 536 transfusions for bleeding in the stomach, 370

treatment for CLL, 419, 424 triglycerides, 485, 486 vaccinations for diarrhea, 538 vegan diets, 366

video games and body mass index, 208 vitamin C and allergies, 55

vitamin D and osteoporosis, 57 weight of employees, 478 weight loss, 55, 145, 204, 421, 531 weights of soccer players and academic decathlon team members compared, 146

weights of vegetarians, 483

LAW

ages of prime ministers, 145 gun control, 417, 421, 423 Oregon bar exam, 366, 373 magistrate’s court judges, 259 three-strikes law, 421–422

POLITICS

dodging the question, 379–380, 409–410 European Union membership, 369 favorable neighboring country, 371 military coups, 145

party and right direction, 530 political party, 267–268 primary elections of 2012, 372 socialism, 265

PSYCHOLOGY

boys’ weight perception, 490–491 brain games, 44–45

complexion, 96 confederates and compliance, 55, 533–534 depression treatment, 54

dreaming, 372, 415 extrasensory perception, 293, 298–300, 364–365, 414, 420, 422–423 financial incentive effectiveness, 418 happiness and traditional views, 263 happiness and wealth, 266

IQs, 148 obesity and marital status, 56 opinion about music, 269 poverty and IQ, 41–42 sleep walking, 416 smiling, 419

smiling and age, 532

TV violence, 506–507, 530, 531–532

SOCIAL ISSUES

adoptions, 95 age by year, 95 ages of brides and grooms, 488 belief in UFOs, 269

body piercings, 74–75 cell phone calls, 479 death row and head trauma, 58 drunk walking, 315

education and marital status, 238, 239–240, 241, 245, 511–512 education and widows, 249 gender and opinion on same-sex mar- riage, 504–505

gender gap in universities, 212–213 guns in homes, 422

happiness, 147, 149, 220, 489 ideal family, 155

Iraq casualties and hometown populations, 215

marital status in India, 94 life expectancy and TV, 215 marital status, 56

marriage and college degree, 261 marriage rates, 53

number of births and population, 54 number of children, 482

number of siblings, 52, 89, 204 obesity and marital status, 531, 539 Odd-Even Formula, 533

population and number of billionaires, 213

population density, 53, 151, 156 population in 2007, 53

population increase, 156 population prediction, 53 probation and gender, 56–57 proportion of people who are married, 421 school drop-out rates, 536

smiling and age, 420 spring break fever, 325–326, 360

SPORTS

annual sports, 421 athlete’s age and speed, 212 baseball players, 482 basketball free-throw shots, 267, 304 basketball team heights, 492 batting and bowling, 261 marathon size, 70–71, 134, 155 NCAA soccer players, 74 race finishing times, 189 surfing, 145–146, 156, 488 T-20 cricket match, 146 weights of backpacks, 488 weights of baseball and soccer players compared, 91


weights of college athletes, 486

wins and strike-outs for baseball

economics in East Germany, 367

European Union membership, 369

favorable neighboring country, 371

gender and opinion on same-sex

millionaires with master’s degree, 366

most important problem, 531

musician survey, 96

news survey, 305

opinion about music, 269

opinion about nurses, 269

opinions on global warming, 98–99, 417 party and right direction, 530

political party affiliation, 94–95 presidential elections, 344–345, 368 salary deduction, 417

sexual harassment, 331–332 stem cell research, 345, 356–358, 405–406

tax benefits, 363 taxes, 417 tourists by month, 537 underwater mortgages, 352–353 use of helmets, 414

using Facebook, 367–368 value of college education, 260–261,

262, 269 wording of polls, 421

TECHNOLOGY

age and the Internet, 327 cell phone use, 96, 479 e-readers, 465–467 Internet access, 265, 315 Internet advertising, 382–383 teens and the Internet, 244 text messages, 93, 214, 216 using Facebook, 367

TRANSPORTATION

age and traffic rules, 529 air fares, 207–208

crash-test results, 31 distance and time, 207–208 driver’s exam, 262, 265, 315–316 drivers aged 84–89, 315

driving accidents, 156–157 DWI convictions, 315 gas mileage of cars, 220 gas prices, 110–111, 116–117, 125 KMPL for highway and city, 214 meter thieves, 167, 200

pedestrian fatalities, 54 plane crashes, 417 right of way, 406–408 seat belt use, 35–37, 263, 269, 414, 415–416

speed driven, 99 speeding tickets, 88, 155 stolen bicycles, 296–297, 314 stolen car rates, 38

SUVs, 414 texting while driving, 315, 422 time and distance of flights, 212, 222 traffic cameras, 99

traffic lights, 267 turn signal use, 370 use of helmets, 414 used car age and mileage, 171–172, 192–193, 480

used car values, 192–193 waiting for the bus, 278–279


CHAPTER 1

Introduction to Data

THEME

Statistics is the science of data, so we must learn the types of data we will encounter and the methods for collecting data. The method used to collect data is very important because it determines what types of conclusions we can reach and, as you’ll learn in later chapters, what types of analyses we can do. By organizing the data we’ve collected, we can often spot patterns that are not otherwise obvious.

This text will teach you to examine data to better understand the world around you. If you know how to sift data to find patterns, can communicate the results clearly, and understand whether you can generalize your results to other groups and contexts, you will be able to make better decisions, offer more convincing arguments, and learn things you did not know before. Data are everywhere, and making effective use of them is such a crucial task that one prominent economist has proclaimed statistics one of the most important professions of the decade (McKinsey Quarterly 2009).

The use of statistics to make decisions and convince others to take action is not new. Some statisticians date the current practice of statistics back to the mid-nineteenth century. One famous example occurred in 1854, when the British were fighting the Russians in the brutal Crimean War. A British newspaper had criticized the military medical facilities, and a young but well-connected nurse, Florence Nightingale, was appointed to study the situation and, if possible, to improve it.

Nightingale carefully recorded the numbers of deaths, the causes of the deaths, and the times and dates of the deaths. She organized these data graphically, and these graphs enabled her to see a very important pattern: A large percentage of deaths were due to contagious disease, and many deaths could be prevented by improving sanitary conditions. Within six months, Nightingale had reduced the death rate by half. Eventually she convinced Parliament and military authorities to completely reorganize the medical care they provided. Accordingly, she is credited with inventing modern hospital management.

In modern times, we have equally important questions to answer. Do cell phones cause brain tumors? Are alcoholic drinks healthful in moderation? Which diet works best for losing weight? What percentage of the public is concerned about job security? Statistics, the science (and art!) of collecting and analyzing observations to learn about ourselves, our surroundings, and our universe, helps answer questions such as these.

Data are the building blocks of statistics. This chapter introduces some of the basic types of data and explains how we collect them, store them, and organize them. These ideas and skills will provide a basic foundation for your study of the rest of the text.

CASE STUDY

Deadly Cell Phones?

In September 2002, Dr. Christopher Newman, a resident of Maryland, sued Motorola, Verizon, and other wireless carriers, accusing them of causing a cancerous brain tumor behind his right ear. As evidence, his lawyers cited a study by Dr. Lennart Hardell. Hardell had studied a large number of people with brain tumors and had found that a greater percentage of them used cell phones than of those who did not have brain tumors (CNN 2002; Brody 2002).

Speculation that cell phones might cause brain cancer began as early as 1993, when (as CNN reports) the interview show Larry King Live featured a man who claimed that his wife died because of cancer caused by her heavy cell phone use. However, more recent studies have contradicted Hardell’s results, as well as earlier reports about the health risks of heavy cell phone use.

The judge in Dr. Newman’s trial was asked to determine whether Hardell’s study was compelling enough to support allowing the trial to proceed. Part of this determination involved evaluating the method that Hardell used to collect data. If you were the judge, how would you rule? You will learn the judge’s ruling at the end of the chapter. You will also see how the methods used to collect data about important cause-and-effect relationships, such as that which Dr. Newman alleged to exist between cell phone use and brain cancer, can affect the conclusions we can draw.

SECTION 1.1

What Are Data?

The study of statistics rests on two major concepts: variation and data. Variation is the more fundamental of these concepts. To illustrate this idea, draw a circle on a piece of paper. Now draw another one, and try to make it look just the same. Now another. Are all three exactly the same? We bet they’re not. They might be slightly different sizes, for instance, or slightly different versions of round. This is an example of variation. How can you reduce this variation? Maybe you can get a penny and outline the penny. Try this three times. Does variation still appear? Probably it does, even if you need a magnifying glass to see, say, slight variations in the thickness of the penciled line.

Data are observations that you or someone else records. The drawings in Figure 1.1 are data that record our attempts to draw three circles that look the same. Analyzing pictorial data such as these is not easy, so we often try to quantify such observations, that is, to turn them into numbers. How would you measure whether these three circles are the same? Perhaps you would compare diameters or circumferences, or somehow try to measure how and where these circles depart from being perfect circles. Whatever technique you chose, these measurements could also be considered data.

Data are more than just numbers, though. David Moore, a well-known statistician, defined data as “numbers in context.” By this he meant that data consist not only of the numbers we record, but also of the story behind the numbers. For example,

10.00, 9.88, 9.81, 9.81, 9.75, 9.69, 9.5, 9.44, 9.31

are just numbers. But in fact these numbers represent “Weight in pounds of the ten heaviest babies in a sample of babies born in North Carolina in 2004.” Now these numbers have a context and have been elevated into data. See how much more interesting data are than numbers?

FIGURE 1.1 (a) Three circles drawn by hand. (b) Three circles drawn using a coin. It is clear that the circles drawn by hand show more variability than the circles drawn with the aid of a coin.

Details

Data Are What Data Is

If you want to be “old school” grammatically correct, then the word data is plural. So we say “data are” and not “data is.” The singular form is datum. However, this usage is changing over time, and some dictionaries now say that data can be used as both a singular and a plural noun.


These data were collected by the state of North Carolina in part to help researchers understand the factors that contribute to low-weight and premature births. If doctors understand the causes of premature birth, they can work to prevent it, perhaps by helping expectant mothers change their behavior, perhaps by medical intervention, and perhaps by a combination of both.
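The idea that data are numbers plus their context can be sketched in a few lines of Python. This is only an illustration: the class name `LabeledData` and its field names are our own hypothetical choices, not part of any statistics package, and the list holds the birth weights exactly as they are printed in the text.

```python
from dataclasses import dataclass


@dataclass
class LabeledData:
    """A list of numbers together with the context that turns them into data.

    The class and field names are hypothetical, chosen only for this sketch.
    """
    values: list
    variable: str
    units: str
    source: str

    def describe(self) -> str:
        # Combine the numbers with their story in one readable sentence.
        return (f"{len(self.values)} values of {self.variable} in {self.units} "
                f"({self.source}); largest = {max(self.values)}")


# Just numbers: nothing here says what they measure.
numbers = [10.00, 9.88, 9.81, 9.81, 9.75, 9.69, 9.5, 9.44, 9.31]

# Numbers in context: the same values, now with a variable, units, and source.
births = LabeledData(
    values=numbers,
    variable="birth weight",
    units="pounds",
    source="heaviest babies in a sample of babies born in North Carolina in 2004",
)

print(births.describe())
```

The bare list and the labeled object hold identical values. Only the second tells a reader what was measured, in what units, and on whom.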

KEY POINT Data are “numbers in context.”

In the last few years, our culture and economy have been inundated with data. The magazine The Economist has called this surge of data the “data deluge.” One reason for the rising tide of data is the application of automated data collection devices. These range from automatic sensors that simply record everything they see and store the data on a computer, to websites and smart phone apps that record every transaction their users make. Google, for example, saves every search you make and combines this with data on which links you click in order to improve the way it presents information (and also, of course, to determine which advertisements will appear on your search results).

Thanks to small, portable sensors, you can now join the “Personal Data Movement.” Members of this movement record data about their daily lives and analyze it in order to improve their health, to run faster, or just to make keepsakes, a kind of modern-day scrapbook. Maybe you or a friend uses a Nike Fuel Band to keep track of regular runs. One of the authors of this text carries a FitBit in his pocket to record his daily activity. From this he learned that on days he lectures, he typically takes 7600 steps, and on days that he does not lecture, he typically only takes 4900 steps. Some websites, such as your.flowingdata.com, make use of Twitter to help users collect, organize, and understand whatever personal data they choose to record.

Of course, it is not only machines that collect data. Humans still actively collect data with the intent of better understanding some phenomenon or making a discovery. Marketers prepare focus groups and surveys to describe the market for a new product. Sports analysts collect data to help their teams’ coaches win games, or to help fantasy football league players. Scientists perform experiments to test theories and to measure changes in the economy or the climate. In this text you’ll learn about the many ways in which data are used.

The point is that we have reached a historical moment where almost everything can be thought of as data. And once you find a way of capturing data about something in your world, you can organize, sort, visualize, and analyze those data to gain deeper understanding about the world around you.

What Is Data Analysis?

In this text you will study the science of data. Most important, you will learn to analyze data. What does this mean? You are analyzing data when you examine data of some sort and explain what they tell us about the real world. In order to do this, you must first learn about the different types of data, how data are stored and structured, and how they are summarized. The process of summarizing data takes up a big part of this text; indeed, we could argue that the entire text is about summarizing data, either through creating a visualization of the data or distilling them down to a few numbers that we hope capture their essence.

KEY POINT Data analysis involves creating summaries of data and explaining what these summaries tell us about the real world.


SECTION 1.2

Classifying and Storing Data

The first step in understanding data is to understand the different types of data you will encounter. As you’ve seen, data are numbers in context. But that’s only part of the story; data are also recorded observations. Your photo from your vacation to Carhenge in Nebraska is data (Figure 1.2). The ultraviolet images streaming from the Earth Observer Satellite system are data (Figure 1.3). These are just two examples of data that are not numbers. Statisticians work hard to help us analyze complex data, such as images and sound files, just as easily as we study numbers. Most of the methods involve recoding the data into numbers. For example, your photos can be digitized in a scanner, converted into a very large set of numbers, and then analyzed. You might have a digital camera that gives you feedback about the quality of a photo you’ve taken. If so, your camera is not only collecting data but also analyzing it!

FIGURE 1.2 A photo of Carhenge, Nebraska.

FIGURE 1.3 Satellites in NASA’s Earth Observing Mission record ultraviolet reflections and transmit these data back to Earth. Such data are used to construct images of our planet. Earth Observer (http://eos.gsfc.nasa.gov/).

Almost always, our data sets will consist of characteristics of people or things (such as gender and weight). These characteristics are called variables. Variables are not “unknowns” like those you studied in algebra. We call these characteristics variables because they have variability: The values of the variable can be different from person to person.

KEY POINT Variables in statistics are different from variables in algebra. In statistics, variables record characteristics of people or things.

When we work with data, they are grouped into a collection, which we call either a data set or a sample. The word sample is important, because it implies that the data we see are just one part of a bigger picture. This “bigger picture” is called a population. Think of a population as the Data Set of Everything: it is the data set that contains all of the information about everyone or everything with respect to whatever variable we are studying. Quite often, the population is really what we want to learn about, and we learn about it by studying the data in our sample. However, many times it is enough just to understand and describe the sample. For example, you might collect data from students in your class simply because you want to know about the students in your class, and not because you wish to use this information to learn about all students at your school. Sometimes, data sets are so large that they effectively are the population, as you’ll soon see in the data reflecting births in North Carolina.

Details

More Grammar

We’re using the word sample as a noun: it is an object, a collection of data that we study. Later we’ll also use the word sample as a verb, that is, to describe an action. For example, we’ll sample ice cream cones to measure their weight.

Two Types of Variables

The variables you’ll find in your data set come in two basic types, which can themselves be broken into smaller divisions, as we’ll discuss later.

Numerical variables describe quantities of the objects of interest. The values will be numbers. The weight of an infant is an example of a numerical variable.

Categorical variables describe qualities of the objects of interest. These values will be categories. The sex of an infant is an example of a categorical variable. The possible values are the categories “male” and “female.” Eye color of an infant is another example; the categories might be brown, blue, black, and so on. You can often identify categorical variables because their values are usually words, phrases, or letters. (We say “usually” because we sometimes use numbers to represent a word or phrase. Stay tuned.)

Details

Quantitative and Qualitative Data

Some statisticians use the word quantitative to refer to numerical variables (think “quantity”) and qualitative to refer to categorical variables (think “quality”). We prefer numerical and categorical. Both sets of terms are commonly used, and you should be prepared to hear and see both.


EXAMPLE 1 Crash-Test Results

The data in Table 1.1 are an excerpt from crash-test dummy studies in which cars are crashed into a wall at 35 miles per hour. Each row of the data set represents the observed characteristics of a single car. This is a small sample of the database, which is available from the National Transportation Safety Administration. The head injury variable reflects the risk to the passengers’ heads. The higher the number, the greater

doors a categorical variable, because nearly all cars have either 2 doors or 4 doors, and for many people, the number of doors designates a certain type of car (small or larger). There’s nothing wrong with that.

Coding Categorical Data with Numbers

Sometimes categorical variables are “disguised” as numerical. The smoke variable in the North Carolina data set (Table 1.2) has numbers for its values (0 and 1), but in fact those numbers simply indicate whether or not the mother smoked. Mothers were asked, “Did you smoke?” and if they answered “Yes,” the researchers coded this categorical response with a 1. If they answered “No,” the response was coded with a 0. These particular numbers represent categories, not quantities. Smoke is a categorical variable.

Coding is used to help both humans and computers understand what the values of a variable represent. For example, a human would understand that a “yes” under the “Smoke” column would mean that the person was a smoker, but to the computer, “yes” is just a string of symbols. If instead we follow a convention where a 1 means “yes” and a 0 means “no,” then a human understands that the 1’s represent smokers, and a computer can easily add the values together to determine, for example, how many smokers are in the sample.

Sometimes, researchers code categorical variables with numerical values.

For each variable, state whether it is numerical or categorical.

QUESTION

Their values are descriptive names. The units of doors are, quite simply, the number of doors. The units of weight are pounds. The variables doors and weight are numerical because their values are measured quantities. The units for head injury are unclear; head injury is measured using some scale that the researchers developed.


This approach for coding categorical variables is quite common and useful. If a categorical variable has only two categories, as do gender and smoke, then it is almost always helpful to code the values with 0 and 1. To help readers know what a “1” means, rename the variable with either one of its category names. A “1” then means the person belongs to that category, and a 0 means the person belongs to the other category. For example, instead of calling a variable gender, we rename it female. And then if the baby is a boy we enter the code 0, and if it’s a girl we enter the code 1.

Sometimes your computer does the coding for you without your needing to know anything about it. So even if you see the words female and male on your computer, the computer has probably coded these with values of 0 and 1 (or vice versa).
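The 0/1 convention described above can be sketched in a few lines of Python. This is only an illustration: the five gender values below are invented for the example and are not taken from Table 1.2, and the names `genders` and `female` are our own.

```python
# Hypothetical records: these values are invented for illustration and
# are not taken from Table 1.2.
genders = ["female", "male", "female", "female", "male"]

# Rename the variable after one of its categories:
# 1 means the baby is a girl, 0 means the baby is a boy.
female = [1 if g == "female" else 0 for g in genders]

# Because the codes are 0 and 1, the sum is a count
# and the mean is a proportion.
n_female = sum(female)
prop_female = n_female / len(female)

print(female)       # [1, 0, 1, 1, 0]
print(n_female)     # 3
print(prop_female)  # 0.6
```

Naming the coded variable `female` rather than `gender` means a reader does not need a separate key to know what a 1 stands for.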

Storing Your Data

The format in which you record and store your data is very important. Computer programs will require particular formats, and by following a consistent convention, you can be confident that you’ll better remember the qualities of your own data set if you need to revisit it months or even years later. Data are often stored in a spreadsheet-like format in which each row represents the object (or person) of interest. Each column represents a variable. In Table 1.2, each row represents a baby. The column heads are variables: Weight, Female, and Smoke. This format is sometimes referred to as the stacked data format.

When you collect your own data, the stacked format is almost always the best way to record and store your data. One reason is that it allows you to easily record several different variables for each subject. Another reason is that it is the format that most software packages will assume you are using for most analyses. (The exceptions are TI-84 and Excel.)

Some technologies, such as the TI calculators, require, or at least accommodate, data stored in a different format, called unstacked data. Unstacked data tables are also common in some books and media publications. In this format, each column represents a variable from a different group. For example, one column could represent men’s heights, and another column could represent women’s heights. The data set, then, is a single variable (height) broken into two groups. The groups are determined by a categorical variable. Table 1.3 shows an example of unstacked data, and Figure 1.4 shows the same data in TI-84 input format.

By way of contrast, Table 1.4 shows the same data in stacked format.

The great disadvantage of the unstacked format is that it can store only two variables at a time: the variable of interest (for example, height), and a categorical variable that tells us which group the observation belongs in (for example, gender). However, most of the time, we record many variables for each observation. For example, we record a baby’s weight, gender, and whether or not the mother smoked. The stacked format enables us to display as many variables as we wish.

EXAMPLE 2 Personal Data Collection

Using a sensor worn around her wrist, Safaa recorded the amount of sleep she got on several nights. She also recorded whether it was a weekend or a weeknight. For the weekends, she recorded (in hours): 8.1, 8.3. For the weeknights she recorded 7.9, 6.5, 8.2, 7.0, 7.3.

Details

Numerical Categories

Categories might be numbers. Sometimes, numerical variables are coded as categories, even though we wish to use them as numbers. For example, number of siblings might be coded as “none,” “one,” “two,” “three,” etc. Although words are used, this is really a numerical variable since it is counting something.

TABLE 1.2 Data for newborns with coded categorical variables.

Weight Female Smoke

FIGURE 1.4 TI-84 data input screen (unstacked data).

each column measures a characteristic of that observation. For Safaa, the unit of observation was a night of sleep, and she measured two characteristics: time and whether or not it was a weekend. In stacked format, her data would look like this:

(Note that you might have coded the “Weekend” variable differently. For example, instead of entering “Yes” or “No,” you might have written either “Weekend” or “Weeknight” in each row.)

In the unstacked format, the numerical observations appear in separate columns,

depending on the value of the categorical variable:

TABLE 1.4 The same data as in Table 1.3, shown here in stacked format.

Look at the Data Set!

The fact that different people use different formats to store data means that the first step in any data investigation is to look at the data set. In most real-life situations, stacked data are the more useful format, because this format permits you to work with several variables at the same time.
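To make the two formats concrete, here is a minimal Python sketch that holds Safaa’s sleep data from Example 2 in unstacked form and then converts it to stacked form. The hours are the ones given in Example 2; the names `unstacked`, `stacked`, `sleep_hours`, and `weekend` are assumptions chosen for this illustration.

```python
# Safaa's sleep data from Example 2 in unstacked format:
# one column (here, one list) per group.
unstacked = {
    "Weekend": [8.1, 8.3],
    "Weeknight": [7.9, 6.5, 8.2, 7.0, 7.3],
}

# Stacked format: one row per night of sleep, one column per variable.
stacked = [
    {"sleep_hours": hours, "weekend": "Yes" if group == "Weekend" else "No"}
    for group, values in unstacked.items()
    for hours in values
]

for row in stacked:
    print(row)
```

In the stacked version, adding another variable (for instance, the date of each night) would only require one more key in each row, which is the flexibility the text attributes to this format.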

The context is the most important aspect of data, although it is frequently overlooked. Table 1.5 shows a few lines from the data set of births in 2004 in North Carolina (Holcomb 2006).

To understand these data, we need to ask and try to answer some questions in

order to better understand the context: Who, or what, was observed? What variables

were measured? How were they measured? What are the units of measurement? Who

collected the data? How did they collect the data? Where were the data collected? Why

were the data collected? When were the data collected?

Many, but not all, of these questions can be answered for these data by reading the information provided on the website that hosts the data. Other times we are not so lucky and must rely on very flimsy supporting documentation. If you collect the data yourself, you should be careful to record this extra supporting information. Or, if you get a chance to talk with the people who collected the data, then you should ask them these questions.

• Who, or what, was observed? In this data set, we observed babies. Each line in the table represents a newborn baby born in North Carolina in 2004. If we were to see the whole table, we would see a record of every baby born in 2004 in North Carolina.

• What variables were measured? For each baby, the state records the weight, the gender, and whether the mother smoked.

• How were the variables measured? Unknown. Presumably, most measurements on the baby were taken by a medical caregiver at the time of the birth, but we don’t know how or when information about the mother was collected.

• What are the units of measurement? Units of measurement are important. The same variable can have different units of measurement. For example, weight could be measured in pounds, in ounces, or in kilograms. For Table 1.5,

Weight: reported in pounds

Gender: reported as M for boys and F for girls

Smoke: reported as a 1 if the mother smoked during the pregnancy, as a 0 if she did not

• Who collected the data? The government of the state of North Carolina.

• How did they collect the data? Data were recorded for all births that occurred in hospitals in North Carolina. Later in the chapter you’ll see that data can be collected by drawing a random sample of subjects, or by assigning subjects to receive different treatments, as well as through other methods. The exact method used for Table 1.5 is not clear, but the data were probably compiled from publicly available medical records and from reports by the physicians and caregivers.

• Where were the data collected? The location where the data were collected often gives us information about who (or what) the study is about. These data were collected in North Carolina and consist of babies born in that state. We should therefore be very wary about generalizing our findings to other states or other countries.

• Why were the data collected? Sometimes, data are collected to learn about a larger population. At other times, the goals are limited to learning more about the sample itself. In this case the data consist of all births in North Carolina, and it is most likely that researchers wanted to learn how the health of infants was related to the smoking habits of mothers within this sample.

• When were the data collected? The world is always changing, and so conclusions based on a data set from 1980 might be different from conclusions based on data collected for a similar study in 2015. These data were collected in 2004.

KEY POINT The first time you see a data set, ask yourself these questions:

• Who, or what, was observed?

• What variables were measured?

• How were the variables measured?

• What are the units of measurement?

• Who collected the data?

• How did they collect the data?

• Where were the data collected?

• Why did they collect the data?

• When were the data collected?

Once we have a data set, we next need to organize and display the data in a way that helps us see patterns. This task of organization and display is not easy, and we discuss it throughout the entire text. In this section we introduce the topic for the first time, in the context of categorical variables.

SECTION 1.3

Organizing Categorical Data


With categorical variables, we are usually concerned with knowing how often a particular category occurs in our sample. We then (usually) want to compare how often a category occurs for one group with how often it occurs for another (liberal/conservative, man/woman). To do these comparisons, you need to understand how to calculate percentages and other rates.

A common method for summarizing two potentially related categorical variables is to use a two-way table. Two-way tables show how many times each combination of categories occurs. For example, Table 1.6 is a two-way table from the Youth Behavior Risk Survey that shows gender and whether or not the respondent always (or almost always) wears a seat belt when riding in or driving a car. The actual Youth Behavior Risk Survey has over 10,000 respondents, but we are practicing on a small sample from this much larger data set.

The table tells us that 2 people were male and did not always wear a seat belt. Three people were female and did not always wear a seat belt. These counts are also called frequencies. A frequency is simply the number of times a value is observed in a data set.

Some books and publications discuss two-way tables as if they displayed the original data collected by the investigators. However, two-way tables do not consist of “raw” data but, rather, are summaries of data sets. For example, the data set that produced Table 1.6 is shown in Table 1.7.

To summarize this table, we simply count how many of the males (a 1 in the Male column) also do not always wear seat belts (a 1 in the Not Always column). We then count how many both are male and always wear seat belts (a 1 in the Male column, a 0 in the Not Always column); how many both are female and don’t always wear seat belts (a 0 in the Male column, a 1 in the Not Always column); and finally, how many both are female and always wear a seat belt (a 0 in the Male column, a 0 in the Not Always column).
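The counting procedure just described can be sketched in Python. The 15 records below are rebuilt from the counts reported for Table 1.6 (2 males and 3 females who do not always wear a seat belt, 3 males and 7 females who always do); the row order is arbitrary and is not meant to reproduce the original Table 1.7. The names `records`, `counts`, and `two_way` are our own.

```python
from collections import Counter

# Each record is (male, not_always), coded 0/1 as in the text.
# Rebuilt from the counts in Table 1.6; the row order is arbitrary.
records = [(1, 1)] * 2 + [(1, 0)] * 3 + [(0, 1)] * 3 + [(0, 0)] * 7

# Frequency of each (male, not_always) combination.
counts = Counter(records)

# Arrange the four frequencies as a two-way table.
two_way = {
    "Not Always": {"Male": counts[(1, 1)], "Female": counts[(0, 1)]},
    "Always": {"Male": counts[(1, 0)], "Female": counts[(0, 0)]},
}

for row_label, row in two_way.items():
    print(row_label, row)
```

The raw records hold one row per respondent, while `two_way` holds only four frequencies, which is why a two-way table is a summary rather than the original data.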

Example 3 illustrates that summarizing the data in a two-way table can make it easy to compare groups.

EXAMPLE 3 Percentages of Seat Belt Wearers

The 2011 Youth Behavior Risk Survey is a national study that asks American youths about potentially risky behaviors. We show the two-way summary again. All of the people in the table were between 14 and 17 years old. The participants were asked whether they wear a seat belt while driving or riding in a car. The people who said always or almost always were put in the Always group. The people who said sometimes or rarely were put in the Not Always group.

TABLE 1.6 This two-way table shows counts for 15 youths who responded to a survey about wearing seat belts.

in red those who did not always wear a seat belt (the risk takers).

Male Not Always

a How many men are in this sample? How many women? How many people do not

always wear seat belts? How many always wear seat belts?

b What percent of the sample are men? What percent are women? What percent don’t

always wear seat belts? What percent always wear seat belts?

c Are the men in the sample more likely than the women in the sample to take the

risk of not wearing a seat belt?


a We can count the men by adding the first column: 2 + 3 = 5 men. Adding the second column gives us the number of women: 3 + 7 = 10.

We get the number who do not always wear seat belts by adding the first row: 2 + 3 = 5 people don’t always wear seat belts. Adding the second row gives us the number who always wear seat belts: 3 + 7 = 10.

b This question asks us to convert the numbers we found in part (a) to percentages. To do this, we divide the numbers by 15, because there were 15 people in the sample. To convert to percentages, we multiply this proportion by 100%.

The proportion of men is 5/15 = 0.333. The percentage is 0.333 × 100% = 33.3%.

The percentage of women must be 100% − 33.3% = 66.7% (10/15 × 100% = 66.7%).

The proportion who do not always wear seat belts is 5/15 = 0.333, or 33.3%.

The percentage who always wear seat belts is 100% − 33.3% = 66.7%.

c You might be tempted to answer this question by counting the number of males who don’t always wear seat belts (2 people) and comparing that to the number of females who don’t always wear seat belts (3 people). However, this is not a fair comparison because there are more females than males in the sample. Instead, we should look at the percentage of those who don’t always wear seat belts in each group. This question should be reworded as follows:

Is the percentage of males who don’t always wear seat belts greater than the percentage of females who don’t always wear seat belts?

Because 2 out of 5 males don’t always wear seat belts, the percent of males who don’t always wear seat belts is (2/5) × 100% = 40%.

Because 3 out of 10 females don’t always wear seat belts, the percent of females who don’t always wear seat belts is (3/10) × 100% = 30%.

In fact, females in this sample engage in this risky behavior less often than males. Among all U.S. youth, it is estimated that about 28% of males do not always wear their seat belt, compared to 23% of females.
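The within-group comparison in part (c) of Example 3 can be checked with a short Python sketch. It uses the four counts from Table 1.6; the dictionary layout and the function name `percent_not_always` are assumptions made for this illustration.

```python
# Counts from Table 1.6, grouped by gender.
two_way = {
    "Male": {"Not Always": 2, "Always": 3},
    "Female": {"Not Always": 3, "Always": 7},
}

def percent_not_always(group):
    """Percentage of one group that does not always wear a seat belt."""
    counts = two_way[group]
    group_size = sum(counts.values())
    return 100 * counts["Not Always"] / group_size

print(percent_not_always("Male"))    # 40.0
print(percent_not_always("Female"))  # 30.0
```

Dividing by each group’s own size (5 males, 10 females) rather than by the whole sample of 15 is what makes the two percentages comparable.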

SOLUTIONS

The calculations in Example 3 took us from frequencies to percentages. Sometimes, we want to go in the other direction. If you know the total number of people in a group, and are given the percentage that meets some qualification, you can figure out how many people in the group meet that qualification.

EXAMPLE 4 Numbers of Seat Belt Wearers

A statistics class has 300 students, and they are asked whether they always ride or drive with a seat belt.

a We need to find 30% of 300. When working with percentages, first convert the percentage to its decimal equivalent:

30% of 300 = 0.30 × 300 = 90

Therefore, 90 students don’t always wear seat belts.

b The question tells us that 20% of some unknown larger number (call it y) must be equal to 43.

0.20y = 43

Divide both sides by 0.20 and you get

y = 215

There are 215 total students in the class, and 43 of them don’t always wear seat belts.
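Both directions of the percentage calculation in Example 4 can be written as small Python helpers. This is a sketch only; the function names `count_from_percent` and `total_from_percent` are hypothetical, and the percentage is passed as a whole number (30 for 30%) so that the arithmetic stays exact for these inputs.

```python
def count_from_percent(percent, total):
    """How many members of a group of size `total` make up `percent` percent."""
    return percent * total / 100

def total_from_percent(percent, count):
    """Group size, given that `count` members make up `percent` percent."""
    return count * 100 / percent

# Part (a): 30% of 300 students.
print(count_from_percent(30, 300))  # 90.0

# Part (b): 43 students are 20% of the unknown total y.
print(total_from_percent(20, 43))   # 215.0
```

The second function is just the equation 0.20y = 43 solved for y.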

SOLUTIONS

Sometimes, you may come across data summaries that are missing crucial information. Suppose we wanted to know which team sports are the most dangerous to play. Table 1.8 shows the number of sports-related injuries that were treated in U.S. emergency rooms in 2009 (National Safety Council 2011). (Note that this table is not the table of original data but is, instead, a summary of the original data.)

Wow! It’s a dangerous world out there Which would you conclude is the most

dangerous sport? Which is the least dangerous?

Did you answer that basketball was the most dangerous sport? It did have the most injuries (501,251), about 50,000 more than football (451,961). Ice hockey is known for its violence (you’ve heard the old joke, “I went to a fight and suddenly a hockey match broke out”), but here, it seems to have caused relatively few injuries and looks safe.

The problem with comparing the numbers of injuries for these sports is that the sports have different numbers of participants. Injuries might be more common in basketball simply because more people play basketball. Also, there might be relatively few injuries in ice hockey merely because fewer people play. One important component is missing in Table 1.8, and the lack of this component makes our analysis impossible.

Table 1.9 includes the component missing from Table 1.8: the number of participants in each sport. We can’t directly compare the number of injuries from sport to sport, because the numbers of members of the various groups are not the same. This improved table shows us the total membership of each group.

TABLE 1.8 Summary of counts of sports injuries.

Sport        Injuries
Baseball     165,842
Basketball   501,251
Bowling      20,878
Football     451,961
Ice hockey   19,035
Softball     121,175
Volleyball   60,159

TABLE 1.9 Summary of counts of sports injuries and numbers of participants.

Sport   Participants   Injuries


Which sport is the most dangerous? We now have the information we need to answer this question. Specifically, we can find the percentage of participants injured in each sport. For example, what percent of basketball players were injured? There were 24,400,000 participants and 501,251 were injured, so the percent injured is (501,251/24,400,000) × 100% = 2.05%.

Sometimes, with percentages as small as this, we understand the numbers more easily if we report not a percentage, but “number of events per 1000 objects” or maybe even “per 10,000 objects.” We call such numbers rates. To get the injury rate per 1000 people, instead of multiplying (501,251/24,400,000) by 100 we multiply by 1000:

(501,251/24,400,000) × 1000 = 20.54 injuries per 1000 people

These results are shown in Table 1.10.

TABLE 1.10 Summary of rates of sports injuries.

Sport   Participants   Injuries   Rate of Injury per Participant   Rate of Injury per Thousand Participants

We see now that football is the most dangerous sport: 50.78 players are injured out of every 1000 players. Basketball is less risky, with 20.54 injuries per 1000 players.
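The rate calculation for basketball can be reproduced with a short Python sketch. The injury and participant counts are the ones quoted in the text; the function name `rate_per` and its `per` parameter are assumptions introduced for this illustration.

```python
def rate_per(events, participants, per=1000):
    """Number of events per `per` participants."""
    return events / participants * per

# Basketball figures quoted in the text.
basketball_injuries = 501_251
basketball_participants = 24_400_000

# With per=100 the rate is an ordinary percentage.
percent_injured = rate_per(basketball_injuries, basketball_participants, per=100)
injuries_per_1000 = rate_per(basketball_injuries, basketball_participants)

print(round(percent_injured, 2))    # 2.05
print(round(injuries_per_1000, 2))  # 20.54
```

Changing `per` only rescales the same proportion, so the ranking of sports by risk is the same whether the result is reported per 100, per 1000, or per 10,000 participants.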

EXAMPLE 5 Comparing Rates of Stolen Cars

Which model of car has the greatest risk of being stolen? The Highway Loss Data Institute reports that the Ford F-250 pickup truck is the most stolen car; 7 F-250’s are reported stolen out of every 1000 that are insured. By way of contrast, the Jeep Compass is the least stolen; only 0.5 Jeep Compass is reported stolen for every 1000 insured (Insurance Institute for Highway Safety 2013).

the number of each type of car stolen?

than others. Suppose there were many more Jeep Compasses than Ford F-250’s. In that case, we might see a greater number of stolen Jeeps, simply because there are more of them to steal. By looking at the theft rate, we adjust for the total number of cars of that particular kind on the road.

KEY POINT In order for us to compare groups, the groups need to be similar. When the data consist of counts, then percentages or rates are often better for comparisons because they take into account possible differences among the sizes of the groups.


Often, the most important questions in science, business, and everyday life are questions about causality. These are usually phrased in the form of “what if” questions. What if I take this medicine; will I get better? What if I change my Facebook profile; will my profile get more hits?

Questions about causality are often in the news. The Los Angeles Times reported that many people believe a drink called peanut milk can cure gum disease and slow the onslaught of baldness. The BBC News (2010) reported that “Happiness wards off heart disease.” Statements such as these are everywhere we turn these days. How do we know whether to believe these claims?

The methods we use to collect data determine what types of conclusions we can make. Only one method of data collection is suitable for making conclusions about causal relationships, but as you’ll see, that doesn’t stop people from making such conclusions anyway. In this section we talk about three methods commonly used to collect data in an effort to answer questions about causality: anecdotes, observational studies, and controlled experiments.

Most questions about causality can be understood in terms of two variables: the treatment variable and the outcome variable. (The outcome variable is also sometimes called the response variable, because it responds to changes in the treatment.) We are essentially asking whether the treatment variable causes changes in the outcome variable. For example, the treatment variable might record whether or not a person drinks Peanut Milk, and the outcome variable might record whether or not that person’s gum disease improved. Or the treatment variable might record whether or not a person is generally happy, and the outcome variable might record whether or not that person suffered from heart disease in a ten-year period.

People who receive the treatment of interest (or have the characteristic of interest) are said to be in the treatment group. Those who do not receive that treatment (or do not have that characteristic) are in the comparison group, which is also called the control group.

Anecdotes

Peanut milk is a drink invented by Jack Chang, an entrepreneur in San Francisco, California. He noticed that after he drank peanut milk for a few months, he stopped losing hair and his gum disease went away. According to the Los Angeles Times (Glionna 2006), another regular drinker of peanut milk says that the beverage caused his cancer to go into remission. Others have reported that drinking the beverage has reduced the severity of their colds, has helped them sleep, and has helped them wake up.

This is exciting stuff! Peanut milk could very well be something we should all be drinking. But can peanut milk really solve such a wide variety of problems? On the face of it, it seems that there’s evidence that peanut milk has cured people of illness. The Los Angeles Times reports the names of people who claim that it has. However, the truth is that this is simply not enough evidence to justify any conclusion about whether the beverage is helpful, harmful, or without any effect at all.

These testimonials are examples of anecdotes. An anecdote is essentially a story that someone tells about her or his own (or a friend’s or relative’s) experience. Anecdotes are an important type of evidence in criminal justice because eyewitness testimony can carry a great deal of weight in a criminal investigation. However, for answering questions about groups of people with great variability or diversity, anecdotes are essentially worthless.

The primary reason why anecdotes are not useful for reaching conclusions about

cause-and-effect relationships is that the most interesting things that we study have so

SECTION 1.4

Collecting Data to Understand Causality
