(BQ) Part 1 book Essential statistics - Exploring the world through data has contents: Introduction to data, picturing variation with graphs, numerical summaries of center and variation, regression analysis - Exploring associations between variables, modeling variation with probability, modeling random events - The normal and binomial models.
This is a special edition of an established title widely
used by colleges and universities throughout the world.
Pearson published this exclusive edition for the benefit
of students outside the United States and Canada. If you
purchased this book within the United States or Canada,
you should be aware that it has been imported without
the approval of the Publisher or Author.
For these Global Editions, the editorial team at Pearson has
collaborated with educators across the world to address a wide range
of subjects and requirements, equipping students with the best possible
learning tools. This Global Edition preserves the cutting-edge approach
and pedagogy of the original, but also features alterations, customization,
and adaptation from the North American version.
www.mystatlab.com
Introductory Statistics Courses
Leverage the Power of StatCrunch
MyStatLab leverages the power of StatCrunch, powerful web-based statistics software. Integrated into MyStatLab, StatCrunch lets students easily analyze data from their exercises and etext.
In addition, access to the full online community allows users to take advantage
of a wide variety of resources and applications at www.statcrunch.com.
Real-World Statistics
MyStatLab video resources help foster conceptual understanding. StatTalk Videos, hosted by fun-loving statistician Andrew Vickers, demonstrate important statistical concepts through interesting stories and real-life events. This series of 24 videos
includes assignable questions built in MyStatLab and an instructor’s guide.
Bring Statistics to Life
Virtually flip coins, roll dice, draw cards, and interact with animations on your mobile device with the extensive menu of experiments and applets in StatCrunch. Offering
a number of ways to practice resampling procedures, such as permutation tests and bootstrap confidence intervals, StatCrunch
is a complete and modern solution.
MyStatLab is the market-leading online resource for learning and teaching statistics.
West Valley College
Boston Columbus Indianapolis New York San Francisco Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto Delhi Mexico City Sao Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo
Editor in Chief: Deirdre Lynch
Senior Acquisitions Editor: Suzanna Bainbridge
Editorial Assistant: Justin Billing
Field Marketing Manager: Andrew Noble
Product Marketing Manager: Tiffany Bitzel
Marketing Assistant: Jennifer Myers
Program Team Lead: Karen Wernholm
Program Manager: Chere Bemelmans
Project Team Lead: Peter Silvia
Project Manager: Peggy McMahon
Senior Author Support/Technology Specialist: Joe Vetere
Manager, Rights Management, Higher Education: Gina M. Cheselka
Media Producer: Aimee Thorne
Acquisitions Editor, Global Edition: Sourabh Maheshwari
Assistant Project Editor, Global Edition: Sulagna Dasgupta
Media Production Manager, Global Edition: Vikram Kumar
Senior Manufacturing Controller, Production, Global Edition: Trudy Kimber
Associate Director of Design, USHE EMSS/HSC/EDU: Andrea Nix
Program Design Lead: Beth Paquin
Design, Full-Service Project Management, Composition, and Illustration: Cenveo® Publisher Services
Senior Project Manager, MyStatLab: Robert Carroll
QA Manager, Assessment Content: Marty Wright
Procurement Manager: Carol Melville
Pearson Education Limited
Edinburgh Gate
Harlow
Essex CM20 2JE
England
and Associated Companies throughout the world
Visit us on the World Wide Web at:
www.pearsonglobaleditions.com
© Pearson Education Limited 2017
The rights of Robert Gould, Colleen Ryan, Rebecca Wong to be identified as the authors of this work have been asserted by them in accordance with the
Copyright, Designs and Patents Act 1988.
Authorized adaptation from the United States edition, entitled Essential Statistics, 2nd edition, ISBN 978-0-134-13440-6, by Robert Gould, Colleen Ryan, and
Rebecca Wong, published by Pearson Education © 2017.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic,
mechanical, photocopying, recording or otherwise, without either the prior written permission of the publisher or a license permitting restricted copying in the
United Kingdom issued by the Copyright Licensing Agency Ltd, Saffron House, 6–10 Kirby Street, London EC1N 8TS.
All trademarks used herein are the property of their respective owners. The use of any trademark in this text does not vest in the author or publisher any trademark
ownership rights in such trademarks, nor does the use of such trademarks imply any affiliation with or endorsement of this book by such owners.
Acknowledgments of third-party content appear in Appendix D, which constitutes an extension of this copyright page.
TI-84+C screenshots courtesy of Texas Instruments. Data and screenshots from StatCrunch used by permission of StatCrunch. Screenshots from Minitab courtesy
of Minitab Corporation. Screenshot from SOCR used by the permission of the Statistics Online Computational Resource, UCLA. XLSTAT screenshots courtesy of
Addinsoft, Inc. Used with permission. All Rights Reserved. XLSTAT is a registered trademark of Addinsoft SARL.
MICROSOFT® AND WINDOWS® ARE REGISTERED TRADEMARKS OF THE MICROSOFT CORPORATION IN THE U.S.A. AND OTHER COUNTRIES. SCREEN SHOTS
AND ICONS REPRINTED WITH PERMISSION FROM THE MICROSOFT CORPORATION. THIS BOOK IS NOT SPONSORED OR ENDORSED BY OR AFFILIATED WITH
THE MICROSOFT CORPORATION. MICROSOFT AND/OR ITS RESPECTIVE SUPPLIERS MAKE NO REPRESENTATIONS ABOUT THE SUITABILITY OF THE
INFORMATION CONTAINED IN THE DOCUMENTS AND RELATED GRAPHICS PUBLISHED AS PART OF THE SERVICES FOR ANY PURPOSE. ALL SUCH DOCUMENTS AND
RELATED GRAPHICS ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND. MICROSOFT AND/OR ITS RESPECTIVE SUPPLIERS HEREBY DISCLAIM ALL
WARRANTIES AND CONDITIONS WITH REGARD TO THIS INFORMATION, INCLUDING ALL WARRANTIES AND CONDITIONS OF MERCHANTABILITY, WHETHER
EXPRESS, IMPLIED OR STATUTORY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL MICROSOFT AND/OR ITS
RESPECTIVE SUPPLIERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH
THE USE OR PERFORMANCE OF INFORMATION AVAILABLE FROM THE SERVICES. THE DOCUMENTS AND RELATED GRAPHICS CONTAINED HEREIN COULD
INCLUDE TECHNICAL INACCURACIES OR TYPOGRAPHICAL ERRORS. CHANGES ARE PERIODICALLY ADDED TO THE INFORMATION HEREIN. MICROSOFT
AND/OR ITS RESPECTIVE SUPPLIERS MAY MAKE IMPROVEMENTS AND/OR CHANGES IN THE PRODUCT(S) AND/OR THE PROGRAM(S) DESCRIBED HEREIN
AT ANY TIME. PARTIAL SCREEN SHOTS MAY BE VIEWED IN FULL WITHIN THE SOFTWARE VERSION SPECIFIED.
ISBN 10: 1-292-16122-1
ISBN 13: 978-1-292-16122-8
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
10 9 8 7 6 5 4 3 2 1
Printed and bound by Vivar in Malaysia
To my parents and family, my friends, and my colleagues who are also friends. Without their patience and support, this would not have been possible.
About the Authors
Robert L. Gould (Ph.D., University of California, San Diego) is a leader in the statistics education community. He has served as chair of the American Statistical Association’s Committee on Teacher Enhancement, has served as chair of the ASA’s Statistics Education Section, and served on a panel of co-authors for the Guidelines for Assessment and Instruction in Statistics Education (GAISE) College Report. While serving as the associate director of professional development for CAUSE (Consortium for the Advancement of Undergraduate Statistics Education), Rob worked closely with the American Mathematical Association of Two-Year Colleges (AMATYC) to provide traveling workshops and summer institutes in statistics. For over ten years, he has served as Vice-Chair of Undergraduate Studies at the UCLA Department of Statistics, and he is director of the UCLA Center for the Teaching of Statistics. In 2012, Rob was elected Fellow of the American Statistical Association.
In his free time, Rob plays the cello and is an ardent reader of fiction.
Colleen N. Ryan has taught statistics, chemistry, and physics to diverse community college students for decades. She taught at Oxnard College from 1975 to 2006, where she earned the Teacher of the Year Award. Colleen currently teaches statistics part-time at California Lutheran University. She often designs her own lab activities. Her passion is to discover new ways to make statistical theory practical, easy to understand, and sometimes even fun.
Colleen earned a B.A. in physics from Wellesley College, an M.A.T. in physics from Harvard University, and an M.A. in chemistry from Wellesley College. Her first exposure to statistics was with Frederick Mosteller at Harvard.
In her spare time, Colleen sings, has been an avid skier, and enjoys time with her family.
Rebecca K. Wong has taught mathematics and statistics at West Valley College for more than twenty years. She enjoys designing activities to help students actively explore statistical concepts and encouraging students to apply those concepts to areas of personal interest.
Rebecca earned a B.A. in mathematics and psychology from the University of California, Santa Barbara, an M.S.T. in mathematics from Santa Clara University, and an Ed.D. in Educational Leadership from San Francisco State University. She has been recognized for outstanding teaching by the National Institute of Staff and Organizational Development and the California Mathematics Council of Community Colleges.
When not teaching, Rebecca is an avid reader and enjoys hiking trails with friends.
Contents
Preface
Index of Applications
CHAPTER 1 Introduction to Data
CASE STUDY Deadly Cell Phones?
1.1 What Are Data?
1.2 Classifying and Storing Data
1.3 Organizing Categorical Data
1.4 Collecting Data to Understand Causality
EXPLORING STATISTICS Collecting a Table of Different Kinds of Data
CHAPTER 2 Picturing Variation with Graphs
CASE STUDY Student-to-Teacher Ratio at Colleges
2.1 Visualizing Variation in Numerical Data
2.2 Summarizing Important Features of a Numerical Distribution
2.3 Visualizing Variation in Categorical Variables
2.4 Summarizing Categorical Distributions
2.5 Interpreting Graphs
EXPLORING STATISTICS Personal Distance
CHAPTER 3 Numerical Summaries of Center and Variation
CASE STUDY Living in a Risky World
3.1 Summaries for Symmetric Distributions
3.2 What’s Unusual? The Empirical Rule and z-Scores
3.3 Summaries for Skewed Distributions
3.4 Comparing Measures of Center
3.5 Using Boxplots for Displaying Summaries
EXPLORING STATISTICS Does Reaction Distance Depend on Gender?
CHAPTER 4 Regression Analysis: Exploring Associations between Variables
CASE STUDY Catching Meter Thieves
4.1 Visualizing Variability with a Scatterplot
4.2 Measuring Strength of Association with Correlation
4.3 Modeling Linear Trends
4.4 Evaluating the Linear Model
EXPLORING STATISTICS Guessing the Age of Famous People
CHAPTER 5 Modeling Variation with Probability
CASE STUDY SIDS or Murder?
5.1 What Is Randomness?
5.2 Finding Theoretical Probabilities
5.3 Associations in Categorical Variables
5.4 Finding Empirical Probabilities
EXPLORING STATISTICS Let’s Make a Deal: Stay or Switch?
CHAPTER 6 Modeling Random Events: The Normal and Binomial Models
CASE STUDY You Sometimes Get More Than You Pay For
6.1 Probability Distributions Are Models of Random Experiments
6.2 The Normal Model
6.3 The Binomial Model (optional)
EXPLORING STATISTICS ESP with Coin Flipping
CHAPTER 7
CASE STUDY Spring Break Fever: Just What the Doctors Ordered?
7.1 Learning about the World through Surveys
7.2 Measuring the Quality of a Survey
7.3 The Central Limit Theorem for Sample Proportions
7.4 Estimating the Population Proportion with Confidence Intervals
7.5 Comparing Two Population Proportions with Confidence
EXPLORING STATISTICS Simple Random Sampling Prevents Bias
CHAPTER 8
CASE STUDY Dodging the Question
8.1 The Essential Ingredients of Hypothesis Testing
8.2 Hypothesis Testing in Four Steps
8.3 Hypothesis Tests in Detail
8.4 Comparing Proportions from Two Populations
EXPLORING STATISTICS Identifying Flavors of Gum through Smell
CHAPTER 9 Inferring Population Means
CASE STUDY Epilepsy Drugs and Children
9.1 Sample Means of Random Samples
9.2 The Central Limit Theorem for Sample Means
9.3 Answering Questions about the Mean of a Population
9.4 Hypothesis Testing for Means
9.5 Comparing Two Population Means
9.6 Overview of Analyzing Means
EXPLORING STATISTICS Pulse Rates
CHAPTER 10 ... Research
CASE STUDY Popping Better Popcorn
10.1 The Basic Ingredients for Testing with Categorical Variables
10.2 Chi-Square Tests for Associations between Categorical Variables
10.3 Reading Research Papers
EXPLORING STATISTICS Skittles
Appendix A Tables
Appendix B Check Your Tech Answers
Appendix C Answers to Odd-Numbered Exercises
Appendix D Credits
Index
Preface
About This Text
The primary focus of this text is still, as in the first edition, data. We live in a
data-driven economy and, more and more, in a data-centered culture. We don’t choose
whether we interact with data; the choice is made for us by websites that track our
browsing patterns, membership cards that track our spending habits, cars that transmit
our driving patterns, and smart phones that record our most personal moments.
The silver lining of what some have called the Data Deluge is that we all have access
to rich and valuable data relevant in many important fields: environment, civics, social
sciences, economics, health care, entertainment. This text teaches students to learn from such
data and, we hope, to become cognizant of the role of the data that appear all around them.
We want students to develop a data habit of mind in which, when faced with decisions,
claims, or just plain curiosity, they know to reach for an appropriate data set to answer
their questions. More important, we want them to have the skills to access these data and
the understanding to analyze the data critically. Clearly, we’ve come a long way from the
“mean median mode” days of rote calculation. To survive in the modern economy requires
much more than knowing how to plug numbers into a formula. Today’s students must
know which questions can be answered by applying which statistic, and how to get
technology to compute these statistics from within complex data sets.
What’s New in the Second Edition
The second edition remains true to the goals of the first edition: to provide students
with the tools they need to make sense of the world by teaching them to collect,
visualize, analyze, and interpret data. With the help of several wise and careful readers
and class testers, we have fine-tuned the second edition to better achieve this vision. In
some sections, we have rewritten explanations or added new ones. In others, we have
more substantially reordered content.
More precisely, in this new edition you will find
• Coverage of two-proportion confidence intervals in Chapters 7 and 8
• An increase of more than 150 homework exercises in this edition, with more
than 400 total new, revised, and updated exercises. We’ve added larger data sets
to Chapters 2, 3, 4, and 9. We’ve also added exercises to Section 2.5 and more
Chapter Review exercises throughout.
• New or updated examples in each chapter, with current topics such as views of
stem cell research (Chapter 7) and online classes (Chapter 10)
• A more careful and thorough integration of technology in many examples
• Two new case studies: Student-to-Teacher Ratios in Chapter 2 and Dodging the
Question in Chapter 8
• A more straightforward implementation of simulations to understand probability in
Chapter 5
• A more unified presentation of hypothesis testing in Chapter 8 that better joins
conceptual understanding with application
• A greater number of “Looking Back” and “Caution” marginal boxes to help direct
students’ studies
• Updated technology guides to match current hardware and software
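As an illustration of the simulation approach to probability mentioned in the list above, the following is a minimal sketch, not taken from the text itself. It estimates an empirical probability by repeating a random experiment many times. The function names `empirical_probability` and `roll_two_dice`, the seed, and the number of repetitions are all hypothetical choices made for this example.

```python
import random


def empirical_probability(event, trial, repetitions=10_000, seed=1):
    """Estimate P(event) as the fraction of simulated trials in which it occurs.

    `trial` draws one random outcome; `event` returns True when that outcome
    counts as a success. Both are supplied by the caller (illustrative design).
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(event(trial(rng)) for _ in range(repetitions))
    return hits / repetitions


def roll_two_dice(rng):
    """Return the sum of two fair six-sided dice."""
    return rng.randint(1, 6) + rng.randint(1, 6)


# Compare the simulated estimate with the theoretical value 6/36 = 1/6.
estimate = empirical_probability(lambda total: total == 7, roll_two_dice)
print(f"Estimated P(sum = 7): {estimate:.3f}  (theoretical: {1 / 6:.3f})")
```

With more repetitions the estimate tends to settle closer to the theoretical probability, which is the point such simulations are meant to convey.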
Approach
Our text is concept-based, as opposed to method-based. We teach useful statistical methods, but we emphasize that applying the method is secondary to understanding the concept.
In the real world, computers do most of the heavy lifting for statisticians. We therefore adopt an approach that frees the instructor from having to teach tedious procedures and leaves more time for teaching deeper understanding of concepts.
Accordingly, we present formulas as an aid to understanding the concepts, rather than
as the focus of study.
We believe students need to learn how to
• Determine which statistical procedures are appropriate
• Instruct the software to carry out the procedures
• Interpret the output
We understand that students will probably see only one type of statistical software in class. But we believe it is useful for students to compare output from several different sources, so in some examples we ask them to read output from two or more software packages.
One of the authors (Rob Gould) served on a panel of co-authors for the first edition of the collegiate version of the American Statistical Association–endorsed
Guidelines for Assessment and Instruction in Statistics Education (GAISE). We firmly
believe in its main goals and have adopted them in the preparation of this book.
• We emphasize understanding over rote performance of procedures
• We use real data whenever possible
• We encourage the use of technology both to develop conceptual understanding and
to analyze data
• We believe strongly that students learn by doing. For this reason, the homework problems offer students both practice in basic procedures and challenges to build conceptual understanding.
Coverage
The first few chapters of this book are concept-driven and cover exploratory data analysis and inferential statistics, fundamental concepts that every introductory statistics student should learn. The last part of the book builds on that strong conceptual foundation and is more methods-based. It presents several popular statistical methods and more fully explores methods presented earlier.
Our ordering of topics is guided by the process through which students should analyze data. First, they explore and describe data, possibly deciding that graphics and numerical summaries provide sufficient insight. Then they make generalizations (inferences) about the larger world.
Chapters 1–4: Exploratory Data Analysis. The first four chapters cover data collection and summary. Chapter 1 introduces the important topic of data collection and compares and contrasts observational studies with controlled experiments. This chapter also teaches students how to handle raw data so that the data can be uploaded to their statistical software. Chapters 2 and 3 discuss graphical and numerical summaries of single variables based on samples. We emphasize that the purpose is not just to produce a graph or a number but, instead, to explain what those graphs and numbers say about the world. Chapter 4 introduces simple linear regression and presents it as a technique for providing graphical and numerical summaries of relationships between two numerical variables.
We feel strongly that introducing regression early in the text is beneficial in
building student understanding of the applicability of statistics to real-world scenarios.
After completing the chapters covering data collection and summary, students have
acquired the skills and sophistication they need to describe two-variable associations
and to generate informal hypotheses. Two-variable associations provide a rich context
for class discussion and allow the course to move from fabricated problems (because
one-variable analyses are relatively rare in the real world) to real problems that appear
frequently in everyday life.
Chapters 5–8: Inference. These chapters teach the fundamental concepts of
statistical inference. The main idea is that our data mirror the real world, but imperfectly;
although our estimates are uncertain, under the right conditions we can quantify our
uncertainty. Verifying that these conditions exist and understanding what happens if
they are not satisfied are important themes of these chapters.
Chapters 9–10: Methods. Here we return to the themes covered earlier in the text and
present them in a new context by introducing additional statistical methods, such as
estimating population means and analyzing categorical variables. We also provide (in
Section 10.3) guidance for reading scientific literature, to offer students the experience
of critically examining real scientific papers.
Organization
Our preferred order of progressing through the text is reflected in the Contents, but
there are some alternative pathways as well.
10-week Quarter. The first eight chapters provide a full, one-quarter course in
introductory statistics. If time remains, cover Sections 9.1 and 9.2 as well, so that students
can solidify their understanding of confidence intervals and hypothesis tests by
revisiting the topic with a new parameter.
Proportions First. Ask two statisticians, and you will get three opinions on whether
it is best to teach means or proportions first. We have come down on the side of
proportions for a variety of reasons. Proportions are much easier to find in popular
news media (particularly around election time), so they can more readily be tied to
students’ everyday lives. Also, the mathematics and statistical theory are simpler;
because there’s no need to provide a separate estimate for the population standard
deviation, inference is based on the Normal distribution, and no further
approximations (that is, the t-distribution) are required. Hence, we can quickly get to the heart
of the matter with fewer technical diversions.
The basic problem here is how to quantify the uncertainty involved in estimating a
parameter and how to quantify the probability of making incorrect decisions when
posing hypotheses. We cover these ideas in detail in the context of proportions. Students
can then more easily learn how these same concepts are applied in the new context of
means (and any other parameter they may need to estimate).
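To make concrete why inference for proportions needs only the Normal distribution, here is a minimal sketch of a Normal-approximation confidence interval for a population proportion. It is an illustration under stated assumptions, not the text's own procedure: the function name `proportion_confidence_interval` and the survey counts (520 of 1,000) are hypothetical, and the approximation presumes a random sample large enough for the Normal model to apply.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Return (low, high) for a population proportion via the Normal model.

    The standard error is computed from the sample proportion alone, so no
    separate estimate of a population standard deviation is needed.
    """
    p_hat = successes / n
    # z* is the Normal quantile that leaves (1 - confidence) / 2 in each tail.
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin


# Hypothetical poll: 520 of 1,000 respondents answer "yes".
low, high = proportion_confidence_interval(successes=520, n=1000)
print(f"95% confidence interval: ({low:.3f}, {high:.3f})")
```

An interval for a mean would follow the same pattern but would also need an estimate of the standard deviation and, typically, a t-distribution multiplier in place of `z_star`.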
Means First. Conversely, many people feel that there is time for only one parameter
and that this parameter should be the mean. For this alternative presentation, cover
Chapters 6, 7, and 9, in that order. On this path, students learn about survey sampling
and the terminology of inference (population vs. sample, parameter vs. statistic) and
then tackle inference for the mean, including hypothesis testing.
To minimize the coverage of proportions, you might choose to cover Chapter 6,
Section 7.1 (which treats the language and framework of statistical inference in detail),
and then Chapter 9. Chapters 7 and 8 develop the concepts of statistical inference more
slowly than Chapter 9, but essentially, Chapter 9 develops the same ideas in the context
of the mean.
If you present Chapter 9 before Chapters 7 and 8, we recommend that you devote roughly twice as much time to Chapter 9 as you have devoted to previous chapters, because many challenging ideas are explored in this chapter. If you have already covered Chapters 7 and 8 thoroughly, Chapter 9 can be covered more quickly.
Using technology is important because it enables students to handle real data, and real data sets are often large and messy. The following features are designed to guide students.
• TechTips outline steps for performing calculations using TI-84® (including TI-84 + C®) graphing calculators, Excel®, Minitab®, and StatCrunch®. We do not want students to get stuck because they don’t know how to reproduce the results
we show in the text, so whenever a new method or procedure is introduced, an icon, Tech, refers students to the TechTips section at the end of the chapter. Each set of TechTips contains at least one mini-example, so that students are not only learning to use the technology but also practicing data analysis and reinforcing ideas discussed in the text. Most of the provided TI-84 steps apply to all TI-84 calculators, but some are unique to the TI-84 + C calculator.
• Check Your Tech examples help students understand that statistical calculations
done by technology do not happen in a vacuum and assure them that they can get the same numerical values by hand. Although we place a higher value on interpreting results and verifying conditions required to apply statistical models, the numerical values are important, too.
• All data sets used in the exposition and exercises are available on the companion
website at www.pearsonglobaleditions.com/gould.
Guiding Students
• Each chapter opens with a Theme. Beginners have difficulty seeing the forest for
the trees, so we use a theme to give an overview of the chapter content.
• Each chapter begins by posing a real-world Case Study. At the end of the chapter,
we show how techniques covered in the chapter helped solve the problem presented in the Case Study.
• Margin Notes draw attention to details that enhance student learning and reading
Looking Back reminders refer students to earlier coverage of a topic.
Details clarify or expand on a concept.
• Key Points highlight essential concepts to draw special attention to them.
Understanding these concepts is essential for progress.
• Snapshots break down key statistical concepts introduced in the chapter,
quickly summarizing each concept or procedure and indicating when and how
it should be used.
• An abundance of worked-out examples model solutions to real-world problems
relevant to students’ lives. Each example is tied to an end-of-chapter exercise so that
students can practice solving a similar problem and test their understanding. Within the
exercise sets, the icon TRY indicates which problems are tied to worked-out examples
in that chapter, and the numbers of those examples are indicated.
• The Chapter Review that concludes each chapter provides a list of important new
terms, student learning objectives, a summary of the concepts and methods
discussed, and sources for data, articles, and graphics referred to in the chapter.
Active Learning
• For each chapter we’ve included an activity, Exploring Statistics, that students
are intended to do in class as a group. We have used these activities ourselves, and
we have found that they greatly increase student understanding and keep students
engaged in class.
• All exercises are located at the end of the chapter. Section Exercises are designed
to begin with a few basic problems that strengthen recall and assess basic
knowledge, followed by mid-level exercises that ask more complex, open-ended
questions. Chapter Review Exercises provide a comprehensive review of material
covered throughout the chapter.
The exercises emphasize good statistical practice by requiring students to
verify conditions, make suitable use of graphics, find numerical values, and
interpret their findings in writing. All exercises are paired so that students can check
their work on the odd-numbered exercise and then tackle the corresponding
even-numbered exercise. The answers to all odd-numbered exercises appear in the back
of the text.
Challenging exercises, identified with an asterisk (*), ask open-ended questions
and sometimes require students to perform a complete statistical analysis. For
exercises marked with an icon, accompanying data sets are available in MyStatLab and
on the companion website.
• Most chapters include select exercises marked with an icon within the exercise set,
to indicate that problem-solving help is available in the Guided Exercises
section. If students need support while doing homework, they can turn to the
Guided Exercises to see a step-by-step approach to solving the problem.
Acknowledgments
We are grateful for the attention and energy that a large number of people devoted to
making this a better book. We extend our gratitude to Elaine Newman (Sonoma State
University) and Ann Cannon (Cornell College), who checked the accuracy of this
text and its many exercises. Thanks also to David Chelton, our developmental editor,
to Carol Merrigan, who handled production, to Peggy McMahon, project manager,
and to Connie Day, our copyeditor. Many thanks to John Norbutas for his technical
advice and help with the TechTips. We thank Suzanna Bainbridge, our acquisitions
editor, Justin Billing, editorial assistant, and Deirdre Lynch, editor-in-chief, for signing
us up and sticking with us, and we are grateful to Dona Kenly and Erin Kelly for their market development efforts.
We extend our sincere thanks for the suggestions and contributions made by the following reviewers of this edition:
Lloyd Best, Pacific Union College
Mario Borha, Loyola University of Chicago
David Bosworth, Hutchinson Community College
Beth Burns, Bowling Green State University
Jim Johnston, Concord University
Manuel Lopez, Cerritos College
Ralph Padgett Jr., University of California –
Mahbobeh Vezvaei, Kent State University
Arun Agarwal, Grambling State University
Anne Albert, University of Findlay
Michael Allen, Glendale Community College
Eugene Allevato, Woodbury University
Dr. Jerry Allison, Trident Technical College
Polly Amstutz, University of Nebraska
Patricia Anderson, Southern Adventist
Diana Asmus, Greenville Technical College
Kathy Autrey, Northwestern State University of Louisiana
Wayne Barber, Chemeketa Community College
Roxane Barrows, Hocking College
Jennifer Beineke, Western New England
K.B. Boomer, Bucknell University
David Bosworth, Hutchinson Community College
Diana Boyette, Seminole Community College
Elizabeth Paulus Brown, Waukesha County Technical College
Leslie Buck, Suffolk Community College
R.B. Campbell, University of Northern Iowa
Stephanie Campbell, Mineral Area College
Ann Cannon, Cornell College
Rao Chaganty, Old Dominion University
Carolyn Chapel, Western Technical College
Christine Cole, Moorpark College
Linda Brant Collins, University of Chicago
James A. Condor, Manatee Community
Nancy Eschen, Florida Community College at Jacksonville
Karen Estes, St. Petersburg College
Mariah Evans, University of Nevada, Reno
Harshini Fernando, Purdue University
College, East Campus
Kim Gilbert, University of Georgia
Stephen Gold, Cypress College
Nick Gomersall, Luther College
Mary Elizabeth Gore, Community College
Albert Groccia, Valencia Community
College, Osceola Campus
David Gurney, Southeastern Louisiana
University
Chris Hakenkamp, University of
Maryland, College Park
Melodie Hallet, San Diego State University
Donnie Hallstone, Green River
Community College
Cecil Hallum, Sam Houston State University
Josephine Hamer, Western Connecticut
State University
Mark Harbison, Sacramento City College
Beverly J Hartter, Oklahoma Wesleyan
University
Laura Heath, Palm Beach State College
Greg Henderson, Hillsborough
Community College
Susan Herring, Sonoma State University
Carla Hill, Marist College
Michael Huber, Muhlenberg College
Kelly Jackson, Camden County College
Bridgette Jacob, Onondaga Community
College
Robert Jernigan, American University
Chun Jin, Central Connecticut State
Robert Keller, Loras College
Omar Keshk, Ohio State University
Raja Khoury, Collin County Community
College
Brianna Killian, Daytona State College
Yoon G Kim, Humboldt State University
Greg Knofczynski, Armstrong Atlantic
University
Jeffrey Kollath, Oregon State University
Erica Kwiatkowski-Egizio, Joliet Junior
Deann Leoni, Edmonds Community College
Lenore Lerer, Bergen Community College
Quan Li, Texas A&M University
Doug Mace, Kirtland Community College
Walter H Mackey, Owens Community
and Technical College
Mary Moyinhan, Cape Cod Community
Danya Smithers, Northeast State
Technical Community College
Larry Southard, Florida Gulf Coast
University
Dianna J Spence, North Georgia College
& State University
René Sporer, Diablo Valley College
Jeganathan Sriskandarajah, Madison Area
University
Mahbobeh Vezvaei, Kent State University
Joseph Villalobos, El Camino College
Barbara Wainwright, Salisbury University
Henry Wakhungu, Indiana University
Dottie Walton, Cuyahoga Community
College
Jen-ting Wang, SUNY, Oneonta
Jane West, Trident Technical College
Michelle White, Terra Community College
Bonnie-Lou Wicklund, Mount Wachusett
Yan Zheng-Araujo, Springfield
Community Technical College
Cathleen Zucco-Teveloff, Rider
University
Mark A Zuiker, Minnesota State
University,
Acknowledgments for the Global Edition
Contributors
Vikas Arora, Statistician
Kiran Paul, Statistician
Reviewers
Santhosh Kumar, Christ University
Kiran Paul, Statistician
Chirag Trivedi, R J Tibrewal Commerce College
MyStatLab® Online Course for Essential Statistics: Exploring
the World through Data, Second Edition, Gould/Ryan/Wong
(access code required)
MyStatLab is available to accompany Pearson’s market-leading text offerings. To give students a consistent tone, voice, and teaching method, each text’s flavor and approach is tightly integrated throughout the accompanying MyStatLab course, making learning the material as seamless as possible.
Technology Tutorial Videos and Study Cards
Technology Tutorials provide brief video walkthroughs and step-by-step instructional study cards on common statistical procedures for Minitab®, Excel®, and the TI-83/84 graphing calculator.
Data Cycle of Everyday Things Videos
Data Cycle of Everyday Things Videos demonstrate for students that data can be, and is, a part of everyday life! Through a series of fun and engaging episodes, students learn to collect, analyze, and apply data to answer any range of real-world statistical questions.
Chapter Review Videos
Chapter Review Videos walk students through solving a selection of the more complex problems posed in each chapter, providing a review of the chapter’s key concepts and methods and offering students support where they most need it.
Resources for Success
Instructor’s Solutions Manual
Instructor’s Solutions Manual contains worked-out solutions to all the text exercises.
TestGen
Updated to more closely mirror the 2nd Edition book, TestGen® (www.pearsoned.com/testgen) enables instructors to build, edit, print, and administer tests using a computerized bank of questions developed to cover all the objectives of the text.
Online Test Bank
Online Test Bank (download only) includes tests for each chapter with questions aimed to reinforce the text’s learning objectives.
PowerPoint® Lecture Slides
PowerPoint Lecture Slides, aligned with the text, provide an overview of each chapter, stressing important definitions and offering additional examples. Multiple-choice questions are included for class assessment.
Student Resources
Additional resources for student success via MyStatLab
Study Cards for Statistics Software
This series of study cards, available for Excel, Minitab, JMP®, SPSS®, R, StatCrunch, and the TI-84 graphing calculators, provides students with easy, step-by-step guides to the most common statistics software.
PowerPoint Slides
PowerPoint Slides provide an overview of each chapter, stressing important definitions and offering additional examples. These slides are an excellent resource for both traditional and online students.
Chapter Review Videos
Chapter Review Videos walk students through solving a selection of the more complex problems posed in each chapter, thereby offering a review of the chapter’s key concepts and providing students support where they most need it.
The Data Cycle of Everyday Things Videos
The Data Cycle of Everyday Things Videos demonstrate for students that data collection and data analysis can be applied to answer questions about everyday life. Through a series of fun and engaging episodes, students learn to collect, analyze, and apply data to answer any range of real-world statistical questions.
cats’ birth weights, 314
elephants’ birth weights, 314
eye color and sex, 266, 268
finger length, 530
gestation periods for animals, 92
heights and armspans, 88, 209, 210–211
sleep time of animals, 153
temperature and frog’s jumping
performance, 534 tree heights, 220
twins, 314
white blood cells, 312
women’s heights, 310–311, 312–314
BUSINESS AND ECONOMICS
benefits of having rich people, 266
film budgets and grosses, 220
Foreign Direct Investment, 368
Navy commissary prices, 488
Occupy Wall Street, 533
oil leaders, 531
pay rate in different currencies, 146
price change in wheat, 93–94
prices at Target and Whole Foods, 219–220
retail car sales, 95 salaries/wages, 204, 208, 209, 215, 220 shrinking middle class, 80
soda production, 172–173 stressed moms, 368–369 tax regime, 423 textbook prices, 91, 486, 487 turkey costs, 215
used car values, 192–193 wealth distribution in United States, 94 women CEOs, 416
CRIME AND CORRECTIONS
“boot camp” and prevention, 413, 414 burglary, 148
car thefts, 266 counseling and criminology, 419 death row, 154
DWI convictions, 315 homicide clearance, 315 jury duty, 264
magistrate’s court judges, 259 parental training and criminal behavior of children, 537
Perry Preschool attendance and arrests, 537
Scared Straight prevention program, 56 SIDS or murder?, 229–230
stolen bicycles, 296–297, 314 therapy and criminology, 419 violent crime, 147, 158
EDUCATION
ACT scores, 311 alumni donations, 536 Audio-visual aids and grades, 54
BA percentage, 153 bar exam pass rates, 66, 72, 153, 221 BAs and median income, 204 changing multiple-choice answers, 261 cheating, 414
choosing science for higher studies, 422 college admission rates, 314–315, 481 college dropout rate, 414
college enrollment, 451 college graduation, 315, 481 college professors’ salaries, 204 confidence in public schools, 367 course enrollment rates, 54 debt after graduation, 481 education and marital status, 238, 239–240, 241, 245, 511–512 education and widows, 249 employment after law school, 98 exam scores, 121–122, 154, 157, 217,
222, 223, 478, 490 favorite subject, 52 gender and education, 532 gender gap in universities, 212–213 gender of teachers, 366
GPAs, 203, 205, 482, 483, 484 grades, 261
height and test scores, 219 high school graduation rates, 369, 370–371, 373–374
hours of study, 156, 221 law school tuition, 92–93 literacy rate, 265, 421 marriage and college degree, 261 math scores, 109–110
multiple-choice tests, 93, 100, 262, 414, 423
number of years of formal education, 89 Oregon bar exam, 366, 373
parental education, 530 parental educational level, 145, 216–217 percentage of students married or parents, 268
preschool attendance, 369, 370–371, 373–374
preschool attendance and high school graduation rates, 532–533 proportion of seniors in student population, 365
pursuing economics, 367 random answering, 414–415 random assignment of professors, 259 salary and education, 208, 220 SAT scores, 91, 148, 184–185, 265, 310,
312, 313, 316–317, 317–318 school drop-out rates, 536 shoe sizes, 99, 203, 218–219 student heights, 480 student-to-teacher ratio, 61–62, 84 teacher satisfaction, 251
teachers’ pay and costs of education, 215 true/false tests, 422, 423
tuition and fees, 88, 205, 444, 455, 470–471
value of college education, 260–261, 262 working and student grades, 215–216
EMPLOYMENT
age discrimination, 415 career goals, 372 cleanliness drive, 367 commuting times, 88 corporate organization and gender, 264 day at spa, 52
eating out and jobs, 91, 100, 146, 156
work from home, 421
working and student grades, 215–216
ENTERTAINMENT
animated movies, 148–149
film budgets and grosses, 220
hours of television viewing, 97, 490
condo rental prices, 88
land value prediction, 203
real estate prices, 146, 150, 157, 208, 218
fat in sliced turkey and ham compared, 126
fresh juice vs bottled juice, 413
fruit juice, 364–365 grocery prices, 490, 491 hungry monkeys, 512–513 mercury in freshwater fish, 366, 417 number of alcoholic drinks per week,
147, 149 organic food, 487 pizza size, 448 popcorn, 501–502, 525 protein intake, 312 size of ice cream cones, 273, 306 soda consumption, 155
soda production, 172–173, 267 soft drink serving size, 482 sugar-free diet and arthritis, 418 sugar in fruits, 93
vegan diets, 366 weight of carrots, 481 weight of colas, 489 weight of hamburgers, 93, 482 weight of ice cream cones, 489 weight of oranges, 481 weight of tomatoes, 483
GAMES
blackjack tips, 222 brain games, 44–45, 521–522 color of cubes, 268
dealing cards, 247–248 drawing cards, 260 flipping coins, 250–251, 260, 263, 264,
309, 314, 315 gambling, 264 lotto, 413 rolling dice, 242, 261, 414 running speed, 98 spinning coins, 390–391, 397, 417, 418 strike rate of batsman, 148
throwing dice, 235–236, 253–254, 263,
264, 265, 276–277, 309
GENERAL INTEREST
accuracy of shooting, 364 ages of students, 261 anniversaries and days of the week, 260, 265
book widths, 182–183
Cambridge Nobel laureates, 95
children’s ages, 146
dogs vs cats, 339, 343–344
eating and gender, 144 four and two wheelers, 268 gender and toys, 529 guitar chords, 483
hand folding, 262, 269
home and car ownership, 261
joint bank account, 314
Morse code, 346–347, 353, 364, 417
museum visit, 535
number of pairs of shoes owned, 88, 150–151, 488
offices with pantry, 95–96 offices with reception areas, 144 pets, 92
pocket money, 146 printing times, 88 rating hotels, 208 risky activities, 107–108, 140–141 seesaw height, 208, 211
skyscrapers, 136–137, 153 sleeping in, 460–461 tossing thumbtacks, 309, 261 weight of trash, 207, 216
HEALTH
age and sleep, 204, 219 age and weight, 211, 219 antibiotic or placebo, 530 antibiotics, 369–370, 413, 414 antiretrovirals to prevent HIV, 533 arthritis, 418
Atkins diet, 484 autism and MMR vaccine, 537 bariatric surgery for diabetes, 533 birth lengths, 146, 148
birth weights, 146, 148, 479 blood pressure, 46, 214 blood sugar, 536 blood thinners, 57 body mass index, 87, 483 body temperatures, 483, 492 breast cancer, 55
calcium, 473–474 calcium and death rate, 535–536 caloric restriction, 512–513 cancer survival, 413 causes of death, 80–81 cell phones and cancer, 27–28, 47–48 coffee and stroke, 535
college athletes’ weights, 486 colored vegetables and stroke, 536–537 copper bracelets, 55
Crohn’s disease, 47, 359–360 dancers’ heights, 484 death row and head trauma, 58 deep vein thrombosis, 422 depression treatment, 54 diarrhea vaccine in Africa, 419 diet drug, 370
dieting, 468–469 drug for asthma, 536, 539 drug for platelets, 535 drug for rheumatoid arthritis, 536 early tonsillectomy for children, 54
heights and ages for children, 222
heights and weights, 176–177, 205,
217–218 heights of bedridden patients, 187–189
heights of children, 128–130
heights of fathers and sons, 154, 217
heights of females, 148, 157
heights of males, 157, 483, 488
heights of students and parents, 491
HIV-1 and HIV-2, 57
hormone replacement therapy, 97
hospital rooms, 534–535
human cloning, 368
hypothermia for babies, 419
ideal weight, 99
iron and death rate, 535
jet lag drug, 530–531
life expectancy, 212, 215
light exposure in mice, 55–56
low birth weights, 316
number of AIDS cases, 53, 58–59
obesity and marital status, 531
pregnancy, 148, 429–430, 475
prostate cancer, 56, 538
protein intake, 312
pulse rates, 437, 484, 485, 487, 493
quantity of water drunk, 100
removal of healthy appendixes, 534
scorpion antivenom, 57
SIDS, 229–230
sleep, 87, 88, 96, 204, 488, 490
sleep deprivation, 55 sleep medicine for shift workers, 422 steroids and height, 535
strength training, 53 stroke, 57
stroke survival rate, 413 sugar-free diet and arthritis, 418 systolic blood pressures, 485–486 tight control of blood sugar, 536 transfusions for bleeding in the stomach, 370
treatment for CLL, 419, 424 triglycerides, 485, 486 vaccinations for diarrhea, 538 vegan diets, 366
video games and body mass index, 208 vitamin C and allergies, 55
vitamin D and osteoporosis, 57 weight of employees, 478 weight loss, 55, 145, 204, 421, 531 weights of soccer players and academic decathlon team members compared, 146
weights of vegetarians, 483
LAW
ages of prime ministers, 145 gun control, 417, 421, 423 Oregon bar exam, 366, 373 magistrate’s court judges, 259 three-strikes law, 421–422
POLITICS
dodging the question, 379–380, 409–410 European Union membership, 369 favorable neighboring country, 371 military coups, 145
party and right direction, 530 political party, 267–268 primary elections of 2012, 372 socialism, 265
PSYCHOLOGY
boys’ weight perception, 490–491 brain games, 44–45
complexion, 96 confederates and compliance, 55, 533–534 depression treatment, 54
dreaming, 372, 415 extrasensory perception, 293, 298–300, 364–365, 414, 420, 422–423 financial incentive effectiveness, 418 happiness and traditional views, 263 happiness and wealth, 266
IQs, 148 obesity and marital status, 56 opinion about music, 269 poverty and IQ, 41–42 sleep walking, 416 smiling, 419
smiling and age, 532
TV violence, 506–507, 530, 531–532
SOCIAL ISSUES
adoptions, 95 age by year, 95 ages of brides and grooms, 488 belief in UFOs, 269
body piercings, 74–75 cell phone calls, 479 death row and head trauma, 58 drunk walking, 315
education and marital status, 238, 239–240, 241, 245, 511–512 education and widows, 249 gender and opinion on same-sex mar- riage, 504–505
gender gap in universities, 212–213 guns in homes, 422
happiness, 147, 149, 220, 489 ideal family, 155
Iraq casualties and hometown populations, 215
marital status in India, 94 life expectancy and TV, 215 marital status, 56
marriage and college degree, 261 marriage rates, 53
number of births and population, 54 number of children, 482
number of siblings, 52, 89, 204 obesity and marital status, 531, 539 Odd-Even Formula, 533
population and number of billionaires, 213
population density, 53, 151, 156 population in 2007, 53
population increase, 156 population prediction, 53 probation and gender, 56–57 proportion of people who are married, 421 school drop-out rates, 536
smiling and age, 420 spring break fever, 325–326, 360
SPORTS
annual sports, 421 athlete’s age and speed, 212 baseball players, 482 basketball free-throw shots, 267, 304 basketball team heights, 492 batting and bowling, 261 marathon size, 70–71, 134, 155 NCAA soccer players, 74 race finishing times, 189 surfing, 145–146, 156, 488 T-20 cricket match, 146 weights of backpacks, 488 weights of baseball and soccer players compared, 91
weights of college athletes, 486
wins and strike-outs for baseball
economics in East Germany, 367
European Union membership, 369
favorable neighboring country, 371
gender and opinion on same-sex
millionaires with master’s degree, 366
most important problem, 531
musician survey, 96
news survey, 305
opinion about music, 269
opinion about nurses, 269
opinions on global warming, 98–99, 417 party and right direction, 530
political party affiliation, 94–95 presidential elections, 344–345, 368 salary deduction, 417
sexual harassment, 331–332 stem cell research, 345, 356–358, 405–406
tax benefits, 363 taxes, 417 tourists by month, 537 underwater mortgages, 352–353 use of helmets, 414
using Facebook, 367–368 value of college education, 260–261,
262, 269 wording of polls, 421
TECHNOLOGY
age and the Internet, 327 cell phone use, 96, 479 e-readers, 465–467 Internet access, 265, 315 Internet advertising, 382–383 teens and the Internet, 244 text messages, 93, 214, 216 using Facebook, 367
TRANSPORTATION
age and traffic rules, 529 air fares, 207–208
crash-test results, 31 distance and time, 207–208 driver’s exam, 262, 265, 315–316 drivers aged 84–89, 315
driving accidents, 156–157 DWI convictions, 315 gas mileage of cars, 220 gas prices, 110–111, 116–117, 125 KMPL for highway and city, 214 meter thieves, 167, 200
pedestrian fatalities, 54 plane crashes, 417 right of way, 406–408 seat belt use, 35–37, 263, 269, 414, 415–416
speed driven, 99 speeding tickets, 88, 155 stolen bicycles, 296–297, 314 stolen car rates, 38
SUVs, 414 texting while driving, 315, 422 time and distance of flights, 212, 222 traffic cameras, 99
traffic lights, 267 turn signal use, 370 use of helmets, 414 used car age and mileage, 171–172, 192–193, 480
used car values, 192–193 waiting for the bus, 278–279
CHAPTER 1
Introduction to Data
THEME
Statistics is the science of data, so we must learn the types of data we will encounter and the methods for collecting data. The method used to collect data is very important because it determines what types of conclusions we can reach and, as you’ll learn in later chapters, what types of analyses we can do. By organizing the data we’ve collected, we can often spot patterns that are not otherwise obvious.
This text will teach you to examine data to better understand the world around you. If you know how to sift data to find patterns, can communicate the results clearly, and understand whether you can generalize your results to other groups and contexts, you will be able to make better decisions, offer more convincing arguments, and learn things you did not know before. Data are everywhere, and making effective use of them is such a crucial task that one prominent economist has proclaimed statistics one of the most important professions of the decade (McKinsey Quarterly 2009).
The use of statistics to make decisions and convince others to take action is not new. Some statisticians date the current practice of statistics back to the mid-nineteenth century. One famous example occurred in 1854, when the British were fighting the Russians in the brutal Crimean War. A British newspaper had criticized the military medical facilities, and a young but well-connected nurse, Florence Nightingale, was appointed to study the situation and, if possible, to improve it.
Nightingale carefully recorded the numbers of deaths, the causes of the deaths, and the times and dates of the deaths. She organized these data graphically, and these graphs enabled her to see a very important pattern: A large percentage of deaths were due to contagious disease, and many deaths could be prevented by improving sanitary conditions. Within six months, Nightingale had reduced the death rate by half. Eventually she convinced Parliament and military authorities to completely reorganize the medical care they provided. Accordingly, she is credited with inventing modern hospital management.
In modern times, we have equally important questions to answer. Do cell phones cause brain tumors? Are alcoholic drinks healthful in moderation? Which diet works best for losing weight? What percentage of the public is concerned about job security? Statistics—the science (and art!) of collecting and analyzing observations to learn about ourselves, our surroundings, and our universe—helps answer questions such as these.
Data are the building blocks of statistics. This chapter introduces some of the basic types of data and explains how we collect them, store them, and organize them. These ideas and skills will provide a basic foundation for your study of the rest of the text.
CASE STUDY
Deadly Cell Phones?
In September 2002, Dr. Christopher Newman, a resident of Maryland, sued Motorola, Verizon, and other wireless carriers, accusing them of causing a cancerous brain tumor behind his right ear. As evidence, his lawyers cited a study by Dr. Lennart Hardell. Hardell had studied a large number of people with brain tumors and had found that a greater percentage of them used cell phones than of those who did not have brain tumors (CNN 2002; Brody 2002).
Speculation that cell phones might cause brain cancer began as early as 1993, when (as CNN reports) the interview show Larry King Live featured a man who claimed that his wife died because of cancer caused by her heavy cell phone use. However, more recent studies have contradicted Hardell’s results, as well as earlier reports about the health risks of heavy cell phone use.
The judge in Dr. Newman’s trial was asked to determine whether Hardell’s study was compelling enough to support allowing the trial to proceed. Part of this determination involved evaluating the method that Hardell used to collect data. If you were the judge, how would you rule? You will learn the judge’s ruling at the end of the chapter. You will also see how the methods used to collect data about important cause-and-effect relationships—such as that which Dr. Newman alleged to exist between cell phone use and brain cancer—can affect the conclusions we can draw.
SECTION 1.1
What Are Data?
The study of statistics rests on two major concepts: variation and data. Variation is the more fundamental of these concepts. To illustrate this idea, draw a circle on a piece of paper. Now draw another one, and try to make it look just the same. Now another. Are all three exactly the same? We bet they’re not. They might be slightly different sizes, for instance, or slightly different versions of round. This is an example of variation. How can you reduce this variation? Maybe you can get a penny and outline the penny. Try this three times. Does variation still appear? Probably it does, even if you need a magnifying glass to see, say, slight variations in the thickness of the penciled line.
Data are observations that you or someone else records. The drawings in Figure 1.1 are data that record our attempts to draw three circles that look the same. Analyzing pictorial data such as these is not easy, so we often try to quantify such observations—that is, to turn them into numbers. How would you measure whether these three circles are the same? Perhaps you would compare diameters or circumferences, or somehow try to measure how and where these circles depart from being perfect circles. Whatever technique you chose, these measurements could also be considered data.
Data are more than just numbers, though. David Moore, a well-known statistician, defined data as “numbers in context.” By this he meant that data consist not only of the numbers we record, but also of the story behind the numbers. For example,
10.00, 9.88, 9.81, 9.81, 9.75, 9.69, 9.5, 9.44, 9.31
are just numbers. But in fact these numbers represent “Weight in pounds of the ten heaviest babies in a sample of babies born in North Carolina in 2004.” Now these numbers have a context and have been elevated into data. See how much more interesting data are than numbers?
FIGURE 1.1 (a) Three circles drawn by hand. (b) Three circles drawn using a coin. It is clear that the circles drawn by hand show more variability than the circles drawn with the aid of a coin.
Details
Data Are What Data Is
If you want to be “old school”
grammatically correct, then
the word data is plural. So
we say “data are” and not
“data is.” The singular form is
datum. However, this usage is
changing over time, and some
dictionaries now say that
data can be used as both a
singular and a plural noun.
These data were collected by the state of North Carolina in part to help researchers
understand the factors that contribute to low-weight and premature births. If doctors
understand the causes of premature birth, they can work to prevent it—perhaps by
helping expectant mothers change their behavior, perhaps by medical intervention, and
perhaps by a combination of both.
KEY POINT Data are “numbers in context.”
In the last few years, our culture and economy have been inundated with data. The magazine The Economist has called this surge of data the “data deluge.” One reason for the rising tide of data is the application of automated data collection devices. These range from automatic sensors that simply record everything they see and store the data on a computer, to websites and smart phone apps that record every transaction their users make. Google, for example, saves every search you make and combines this with data on which links you click in order to improve the way it presents information (and also, of course, to determine which advertisements will appear on your search results).
Thanks to small, portable sensors, you can now join the “Personal Data Movement.” Members of this movement record data about their daily lives and analyze it in order to improve their health, to run faster, or just to make keepsakes—a modern-day scrapbook. Maybe you or a friend uses a Nike Fuel Band to keep track of regular runs. One of the authors of this text carries a FitBit in his pocket to record his daily activity. From this he learned that on days he lectures, he typically takes 7600 steps, and on days that he does not lecture, he typically only takes 4900 steps. Some websites, such as your.flowingdata.com, make use of Twitter to help users collect, organize, and understand whatever personal data they choose to record.
Of course, it is not only machines that collect data. Humans still actively collect data with the intent of better understanding some phenomenon or making a discovery. Marketers prepare focus groups and surveys to describe the market for a new product. Sports analysts collect data to help their teams’ coaches win games, or to help fantasy football league players. Scientists perform experiments to test theories and to measure changes in the economy or the climate. In this text you’ll learn about the many ways in which data are used.
The point is that we have reached a historical moment where almost everything can be thought of as data. And once you find a way of capturing data about something in your world, you can organize, sort, visualize, and analyze those data to gain deeper understanding about the world around you.
What Is Data Analysis?
In this text you will study the science of data. Most important, you will learn to analyze data. What does this mean? You are analyzing data when you examine data of some sort and explain what they tell us about the real world. In order to do this, you must first learn about the different types of data, how data are stored and structured, and how they are summarized. The process of summarizing data takes up a big part of this text; indeed, we could argue that the entire text is about summarizing data, either through creating a visualization of the data or distilling them down to a few numbers that we hope capture their essence.
KEY POINT Data analysis involves creating summaries of data and explaining what these summaries tell us about the real world.
SECTION 1.2
Classifying and Storing Data
The first step in understanding data is to understand the different types of data you will encounter. As you’ve seen, data are numbers in context. But that’s only part of the story; data are also recorded observations. Your photo from your vacation to Carhenge in Nebraska is data (Figure 1.2). The ultraviolet images streaming from the Earth Observer Satellite system are data (Figure 1.3). These are just two examples of data that are not numbers. Statisticians work hard to help us analyze complex data, such as images and sound files, just as easily as we study numbers. Most of the methods involve recoding the data into numbers. For example, your photos can be digitized in a scanner, converted into a very large set of numbers, and then analyzed. You might have a digital camera that gives you feedback about the quality of a photo you’ve taken. If so, your camera is not only collecting data but also analyzing it!
Almost always, our data sets will consist of characteristics of people or things (such as gender and weight). These characteristics are called variables. Variables are not “unknowns” like those you studied in algebra. We call these characteristics variables because they have variability: The values of the variable can be different from person to person.
When we work with data, they are grouped into a collection, which we call either a data set or a sample. The word sample is important, because it implies that the data we see are just one part of a bigger picture. This “bigger picture” is called a population. Think of a population as the Data Set of Everything—it is the data set that contains all of the information about everyone or everything with respect to whatever variable we are studying. Quite often, the population is really what we want to learn about, and we learn about it by studying the data in our sample. However, many times it is enough just to understand and describe the sample. For example, you might collect data from students in your class simply because you want to know about the students in your class, and not because you wish to use this information to learn about all students at your school. Sometimes, data sets are so large that they effectively are the population, as you’ll soon see in the data reflecting births in North Carolina.
Two Types of Variables
The variables you’ll find in your data set come in two basic types, which can themselves be broken into smaller divisions, as we’ll discuss later.
Numerical variables describe quantities of the objects of interest. The values will be numbers. The weight of an infant is an example of a numerical variable.
Categorical variables describe qualities of the objects of interest. These values will be categories. The sex of an infant is an example of a categorical variable. The possible values are the categories “male” and “female.” Eye color of an infant is another example; the categories might be brown, blue, black, and so on. You can often identify categorical variables because their values are usually words, phrases, or letters. (We say “usually” because we sometimes use numbers to represent a word or phrase. Stay tuned.)
FIGURE 1.2 A photo of Carhenge, Nebraska.
FIGURE 1.3 Satellites in NASA’s Earth Observing Mission record ultraviolet reflections and transmit these data back to Earth. Such data are used to construct images of our planet. Earth Observer (http://eos.gsfc.nasa.gov/).
KEY POINT Variables in statistics are different from variables in algebra. In statistics, variables
record characteristics of people or things.
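The distinction between numerical and categorical variables can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two infants, their values, the variable names, and the helper `variable_type` are all hypothetical, and the set `coded_categories` has to be filled in by a person, because a computer cannot tell from the values alone that a number stands for a category.

```python
# Hypothetical data set: each dictionary is one infant, each key is a variable.
# The values are made up for illustration.
infants = [
    {"weight": 7.69, "sex": "female", "eye_color": "brown", "smoke": 0},
    {"weight": 8.25, "sex": "male", "eye_color": "blue", "smoke": 1},
]

# Variables whose numbers are really category codes must be listed by hand;
# the values alone cannot reveal this.
coded_categories = {"smoke"}


def variable_type(name, rows):
    """Classify one variable as 'numerical' or 'categorical' (a rough heuristic)."""
    values = [row[name] for row in rows]
    if name in coded_categories:
        return "categorical"
    if all(isinstance(value, (int, float)) for value in values):
        return "numerical"
    return "categorical"


types = {name: variable_type(name, infants) for name in infants[0]}
for name, kind in types.items():
    print(name, "is", kind)
```

Here weight is classified as numerical, while sex, eye color, and the number-coded smoke variable are classified as categorical.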
Details
More Grammar
We’re using the word sample
as a noun—it is an object,
a collection of data that we
study. Later we’ll also use the
word sample as a verb—that
is, to describe an action. For
example, we’ll sample ice cream
cones to measure their weight.
Details
Quantitative and Qualitative Data
Some statisticians use the
word quantitative to refer to
numerical variables (think
“quantity”) and qualitative to
refer to categorical variables
(think “quality”). We prefer
numerical and categorical. Both
sets of terms are commonly
used, and you should be
prepared to hear and see both.
EXAMPLE 1 Crash-Test Results
The data in Table 1.1 are an excerpt from crash-test dummy studies in which cars
are crashed into a wall at 35 miles per hour. Each row of the data set represents the
observed characteristics of a single car. This is a small sample of the database, which
is available from the National Transportation Safety Administration. The head injury
variable reflects the risk to the passengers’ heads. The higher the number, the greater
doors a categorical variable,
because nearly all cars have either 2 doors or 4 doors, and for many people, the number
of doors designates a certain type of car (small or larger).
There’s nothing wrong with that.
Coding Categorical Data with Numbers
Sometimes categorical variables are “disguised” as numerical. The smoke variable in the North Carolina data set (Table 1.2) has numbers for its values (0 and 1), but in fact those numbers simply indicate whether or not the mother smoked. Mothers were asked, “Did you smoke?” and if they answered “Yes,” the researchers coded this categorical response with a 1. If they answered “No,” the response was coded with a 0. These particular numbers represent categories, not quantities. Smoke is a categorical variable.
Coding is used to help both humans and computers understand what the values of a variable represent. For example, a human would understand that a “yes” under the “Smoke” column would mean that the person was a smoker, but to the computer, “yes” is just a string of symbols. If instead we follow a convention where a 1 means “yes” and a 0 means “no,” then a human understands that the 1’s represent smokers, and a computer can easily add the values together to determine, for example, how many smokers are in the sample.
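As a minimal sketch of the coding convention described above, the following Python snippet recodes yes/no answers as 1/0 and then adds the codes to count the smokers. The five answers are made up for illustration and are not taken from the North Carolina data set; the names `responses` and `smoke` are likewise hypothetical.

```python
# Hypothetical answers to "Did you smoke?" (illustrative, not from the real data set).
responses = ["Yes", "No", "No", "Yes", "No"]

# Code the categorical answers with numbers: 1 means "yes", 0 means "no".
smoke = [1 if answer == "Yes" else 0 for answer in responses]

# Because each smoker is coded as 1, the sum of the codes is the number of smokers.
number_of_smokers = sum(smoke)

print(smoke)
print(number_of_smokers)
```

The codes are still categories, not quantities. Adding them works only because each 1 stands for one person in the “yes” category.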
Sometimes, researchers code categorical variables with numerical values.
QUESTION For each variable, state whether it is numerical or categorical.
Their values are descriptive names. The units of doors are, quite simply, the number of doors. The units of weight are pounds. The variables doors and weight are numerical because their values are measured quantities. The units for head injury are unclear; head injury is measured using some scale that the researchers developed.
This approach for coding categorical variables is quite common and useful. If a categorical variable has only two categories, as do gender and smoke, then it is almost always helpful to code the values with 0 and 1. To help readers know what a “1” means, rename the variable with either one of its category names. A “1” then means the person belongs to that category, and a “0” means the person belongs to the other category. For example, instead of calling a variable gender, we rename it female. Then if the baby is a boy we enter the code 0, and if it’s a girl we enter the code 1.
Sometimes your computer does the coding for you without your needing to know anything about it. So even if you see the words female and male on your computer, the computer has probably coded these with values of 0 and 1 (or vice versa).
Storing Your Data
The format in which you record and store your data is very important. Computer programs will require particular formats, and by following a consistent convention, you can be confident that you’ll better remember the qualities of your own data set if you need to revisit it months or even years later. Data are often stored in a spreadsheet-like format in which each row represents the object (or person) of interest. Each column represents a variable. In Table 1.2, each row represents a baby. The column heads are variables: Weight, Female, and Smoke. This format is sometimes referred to as the stacked data format.
When you collect your own data, the stacked format is almost always the best way to record and store your data. One reason is that it allows you to easily record several different variables for each subject. Another reason is that it is the format that most software packages will assume you are using for most analyses. (The exceptions are TI-84 and Excel.)
Some technologies, such as the TI calculators, require, or at least accommodate, data stored in a different format, called unstacked data. Unstacked data tables are also common in some books and media publications. In this format, each column represents a variable from a different group. For example, one column could represent men’s heights, and another column could represent women’s heights. The data set, then, is a single variable (height) broken into two groups. The groups are determined by a categorical variable. Table 1.3 shows an example of unstacked data, and Figure 1.4 shows the same data in TI-84 input format.
By way of contrast, Table 1.4 shows the same data in stacked format.
The great disadvantage of the unstacked format is that it can store only two variables at a time: the variable of interest (for example, height), and a categorical variable that tells us which group the observation belongs in (for example, gender). However, most of the time, we record many variables for each observation. For example, we record a baby’s weight, gender, and whether or not the mother smoked. The stacked format enables us to display as many variables as we wish.
EXAMPLE 2 Personal Data Collection
Using a sensor worn around her wrist, Safaa recorded the amount of sleep she got on several nights. She also recorded whether it was a weekend or a weeknight. For the weekends, she recorded (in hours): 8.1, 8.3. For the weeknights she recorded 7.9, 6.5, 8.2, 7.0, 7.3.
Details
Numerical Categories
Categories might be numbers. Sometimes, numerical variables are coded as categories, even though we wish to use them as numbers. For example, number of siblings might be coded as “none,” “one,” “two,” “three,” etc. Although words are used, this is really a numerical variable since it is counting something.
TABLE 1.2 Data for newborns with coded categorical variables. (Columns: Weight, Female, Smoke)
FIGURE 1.4 TI-84 data input screen (unstacked data).
each column measures a characteristic of that observation. For Safaa, the unit of observation was a night of sleep, and she measured two characteristics: time and whether or not it was a weekend. In stacked format, her data would look like this:
(Note that you might have coded the “Weekend” variable differently For example,
instead of entering “Yes” or “No,” you might have written either “Weekend” or
“Weeknight” in each row.)
In the unstacked format, the numerical observations appear in separate columns,
depending on the value of the categorical variable:
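The two layouts can be sketched in a few lines of Python. This is only an illustration built from the sleep times given in Example 2. The column labels, the order of the rows, and the variable names `stacked` and `unstacked` are our assumptions.

```python
# Safaa's sleep times in hours, as given in Example 2.
weekend_hours = [8.1, 8.3]
weeknight_hours = [7.9, 6.5, 8.2, 7.0, 7.3]

# Stacked format: one row per night, one column per variable (Time, Weekend).
stacked = [(hours, "Yes") for hours in weekend_hours]
stacked += [(hours, "No") for hours in weeknight_hours]

print("Time  Weekend")
for hours, weekend in stacked:
    print(f"{hours:<5} {weekend}")

# Unstacked format: one column of times for each group.
unstacked = {"Weekend": weekend_hours, "Weeknight": weeknight_hours}

print()
print("Weekend  Weeknight")
longest = max(len(column) for column in unstacked.values())
for i in range(longest):
    # The groups have different sizes, so the shorter column is padded with blanks.
    cells = [str(column[i]) if i < len(column) else "" for column in unstacked.values()]
    print(f"{cells[0]:<8} {cells[1]}")
```

In the stacked version, adding a third variable (say, the date) would mean adding one more item to each row. The unstacked version has no natural place for it.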
TABLE 1.4 The same data as in Table 1.3, shown here in stacked format.
Look at the Data Set!
The fact that different people use different formats to store data means that the first step in any data investigation is to look at the data set. In most real-life situations, stacked data are the more useful format, because this format permits you to work with several variables at the same time.
The context is the most important aspect of data, although it is frequently overlooked. Table 1.5 shows a few lines from the data set of births in 2004 in North Carolina (Holcomb 2006).
To understand these data, we need to ask and try to answer some questions in
order to better understand the context: Who, or what, was observed? What variables
were measured? How were they measured? What are the units of measurement? Who
collected the data? How did they collect the data? Where were the data collected? Why
were the data collected? When were the data collected?
Many, but not all, of these questions can be answered for these data by reading the information provided on the website that hosts the data. Other times we are not so lucky and must rely on very flimsy supporting documentation. If you collect the data yourself, you should be careful to record this extra supporting information. Or, if you get a chance to talk with the people who collected the data, then you should ask them these questions.
• Who, or what, was observed? In this data set, we observed babies. Each line in the table represents a newborn baby born in North Carolina in 2004. If we were to see the whole table, we would see a record of every baby born in 2004 in North Carolina.
• What variables were measured? For each baby, the state records the weight, the gender, and whether the mother smoked.
• How were the variables measured? Unknown. Presumably, most measurements on the baby were taken by a medical caregiver at the time of the birth, but we don’t know how or when information about the mother was collected.
• What are the units of measurement? Units of measurement are important. The same variable can have different units of measurement. For example, weight could be measured in pounds, in ounces, or in kilograms. For Table 1.5,
  Weight: reported in pounds
  Gender: reported as M for boys and F for girls
  Smoke: reported as a 1 if the mother smoked during the pregnancy, as a 0 if she did not
• Who collected the data? The government of the state of North Carolina.
• How did they collect the data? Data were recorded for all births that occurred in hospitals in North Carolina. Later in the chapter you’ll see that data can be collected by drawing a random sample of subjects, or by assigning subjects to receive different treatments, as well as through other methods. The exact method used for Table 1.5 is not clear, but the data were probably compiled from publicly available medical records and from reports by the physicians and caregivers.
• Where were the data collected? The location where the data were collected often gives us information about who (or what) the study is about. These data were collected in North Carolina and consist of babies born in that state. We should therefore be very wary about generalizing our findings to other states or other countries.
• Why were the data collected? Sometimes, data are collected to learn about a larger population. At other times, the goals are limited to learning more about the sample itself. In this case the data consist of all births in North Carolina, and it is most likely that researchers wanted to learn how the health of infants was related to the smoking habits of mothers within this sample.
• When were the data collected? The world is always changing, and so conclusions based on a data set from 1980 might be different from conclusions based on data collected for a similar study in 2015. These data were collected in 2004.
KEY POINT The first time you see a data set, ask yourself these questions:
• Who, or what, was observed?
• What variables were measured?
• How were the variables measured?
• What are the units of measurement?
• Who collected the data?
• How did they collect the data?
• Where were the data collected?
• Why did they collect the data?
• When were the data collected?
SECTION 1.3
Organizing Categorical Data
Once we have a data set, we next need to organize and display the data in a way that helps us see patterns. This task of organization and display is not easy, and we discuss it throughout the entire text. In this section we introduce the topic for the first time, in the context of categorical variables.
With categorical variables, we are usually concerned with knowing how often a particular category occurs in our sample. We then (usually) want to compare how often a category occurs for one group with how often it occurs for another (liberal/conservative, man/woman). To do these comparisons, you need to understand how to calculate percentages and other rates.
A common method for summarizing two potentially related categorical variables is to use a two-way table. Two-way tables show how many times each combination of categories occurs. For example, Table 1.6 is a two-way table from the Youth Behavior Risk Survey that shows gender and whether or not the respondent always (or almost always) wears a seat belt when riding in or driving a car. The actual Youth Behavior Risk Survey has over 10,000 respondents, but we are practicing on a small sample from this much larger data set.
The table tells us that 2 people were male and did not always wear a seat belt. Three people were female and did not always wear a seat belt. These counts are also called frequencies. A frequency is simply the number of times a value is observed in a data set.
Some books and publications discuss two-way tables as if they displayed the original data collected by the investigators. However, two-way tables do not consist of “raw” data but, rather, are summaries of data sets. For example, the data set that produced Table 1.6 is shown in Table 1.7.
To summarize this table, we simply count how many of the males (a 1 in the Male column) also do not always wear seat belts (a 1 in the Not Always column). We then count how many both are male and always wear seat belts (a 1 in the Male column, a 0 in the Not Always column); how many both are female and don’t always wear seat belts (a 0 in the Male column, a 1 in the Not Always column); and finally, how many both are female and always wear a seat belt (a 0 in the Male column, a 0 in the Not Always column).
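The counting procedure just described can be sketched in Python. The rows below are not copied from Table 1.7. They are rebuilt from the four counts reported in the text for Table 1.6, so the order of the rows is an assumption, as are the names `raw_data` and `frequencies`.

```python
from collections import Counter

# Raw data rebuilt from the counts reported for Table 1.6 (row order is an assumption).
# Each pair is (Male, Not Always): Male is 1 for males and 0 for females;
# Not Always is 1 if the person does not always wear a seat belt, 0 otherwise.
raw_data = (
    [(1, 1)] * 2    # male, does not always wear a seat belt
    + [(1, 0)] * 3  # male, always wears a seat belt
    + [(0, 1)] * 3  # female, does not always wear a seat belt
    + [(0, 0)] * 7  # female, always wears a seat belt
)

# Count how many times each combination of categories occurs.
frequencies = Counter(raw_data)

print("            Male  Female")
print(f"Not Always  {frequencies[(1, 1)]:<5} {frequencies[(0, 1)]}")
print(f"Always      {frequencies[(1, 0)]:<5} {frequencies[(0, 0)]}")
```

Each cell of the printed table is a frequency: the number of rows in the raw data that show that combination of codes.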
Example 3 illustrates that summarizing the data in a two-way table can make it easy to compare groups.
EXAMPLE 3 Percentages of Seat Belt Wearers
The 2011 Youth Behavior Risk Survey is a national study that asks American youths about potentially risky behaviors. We show the two-way summary again. All of the people in the table were between 14 and 17 years old. The participants were asked whether they wear a seat belt while driving or riding in a car. The people who said always or almost always were put in the Always group. The people who said sometimes or rarely were put in the Not Always group.
TABLE 1.6 This two-way table shows counts for 15 youths who responded to a survey about wearing seat belts.
             Male   Female
Not Always   2      3
Always       3      7
in red those who did not always wear a seat belt (the risk takers).
Male Not Always
a How many men are in this sample? How many women? How many people do not
always wear seat belts? How many always wear seat belts?
b What percent of the sample are men? What percent are women? What percent don’t
always wear seat belts? What percent always wear seat belts?
c Are the men in the sample more likely than the women in the sample to take the
risk of not wearing a seat belt?
SOLUTIONS
a We can count the men by adding the first column: 2 + 3 = 5 men. Adding the second column gives us the number of women: 3 + 7 = 10.
We get the number who do not always wear seat belts by adding the first row: 2 + 3 = 5 people don’t always wear seat belts. Adding the second row gives us the number who always wear seat belts: 3 + 7 = 10.
b This question asks us to convert the numbers we found in part (a) to percentages. To do this, we divide the numbers by 15, because there were 15 people in the sample. This gives a proportion. To convert to percentages, we multiply this proportion by 100%.
The proportion of men is 5/15 = 0.333. The percentage is 0.333 × 100% = 33.3%.
The percentage of women must be 100% - 33.3% = 66.7% (10/15 × 100% = 66.7%).
The proportion who do not always wear seat belts is 5/15 = 0.333, or 33.3%.
The percentage who always wear seat belts is 100% - 33.3% = 66.7%.
c You might be tempted to answer this question by counting the number of males who don’t always wear seat belts (2 people) and comparing that to the number of females who don’t always wear seat belts (3 people). However, this is not a fair comparison because there are more females than males in the sample. Instead, we should look at the percentage of those who don’t always wear seat belts in each group. This question should be reworded as follows:
Is the percentage of males who don’t always wear seat belts greater than the percentage of females who don’t always wear seat belts?
Because 2 out of 5 males don’t always wear seat belts, the percent of males who don’t always wear seat belts is (2/5) × 100% = 40%.
Because 3 out of 10 females don’t always wear seat belts, the percent of females who don’t always wear seat belts is (3/10) × 100% = 30%.
In fact, females in this sample engage in this risky behavior less often than males. Among all U.S. youth, it is estimated that about 28% of males do not always wear their seat belt, compared to 23% of females.
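The arithmetic in Example 3 can be checked with a short Python sketch. The counts come from the two-way table. The dictionary layout and the helper name `percent` are our own choices for illustration.

```python
# Counts from the two-way table of seat belt use (Table 1.6).
counts = {
    "Male": {"Not Always": 2, "Always": 3},
    "Female": {"Not Always": 3, "Always": 7},
}

def percent(part, whole):
    """Convert a count to a percentage of a total."""
    return part / whole * 100

# Total number of people in the sample.
sample_size = sum(sum(group.values()) for group in counts.values())

for gender, group in counts.items():
    group_size = sum(group.values())
    # Percentage of the whole sample that belongs to this group (part b).
    share_of_sample = percent(group_size, sample_size)
    # Percentage within this group that does not always wear a seat belt (part c).
    not_always = percent(group["Not Always"], group_size)
    print(f"{gender}: {share_of_sample:.1f}% of the sample, "
          f"{not_always:.0f}% do not always wear a seat belt")
```

Note that the two percentages use different denominators. The share of the sample divides by all 15 people, while the within-group percentage divides by the size of that group, which is what makes the comparison in part (c) fair.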
The calculations in Example 3 took us from frequencies to percentages. Sometimes, we want to go in the other direction. If you know the total number of people in a group, and are given the percentage that meets some qualification, you can figure out how many people in the group meet that qualification.
EXAMPLE 4 Numbers of Seat Belt Wearers
A statistics class has 300 students, and they are asked whether they always ride or drive with a seat belt.
SOLUTIONS
a We need to find 30% of 300. When working with percentages, first convert the percentage to its decimal equivalent:
30% of 300 = 0.30 × 300 = 90
Therefore, 90 students don’t always wear seat belts.
b The question tells us that 20% of some unknown larger number (call it y) must be equal to 43.
0.20y = 43
Divide both sides by 0.20 and you get
y = 215
There are 215 total students in the class, and 43 of them don’t always wear seat belts.
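Both directions of this calculation can be sketched in Python. The function names `count_from_percent` and `total_from_percent` are hypothetical helpers introduced only for this illustration.

```python
def count_from_percent(percent, total):
    """Number of people who make up `percent` percent of a group of size `total`."""
    return percent / 100 * total

def total_from_percent(percent, count):
    """Size of the whole group when `count` people make up `percent` percent of it."""
    return count / (percent / 100)

# Part (a): how many students are 30% of a class of 300?
students_not_always = round(count_from_percent(30, 300))

# Part (b): if 43 students are 20% of a class, how large is the class?
class_size = round(total_from_percent(20, 43))

print(students_not_always)
print(class_size)
```

The results are rounded because they count people, and because decimal fractions such as 0.20 are not stored exactly by the computer.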
Sometimes, you may come across data summaries that are missing crucial information. Suppose we wanted to know which team sports are the most dangerous to play. Table 1.8 shows the number of sports-related injuries that were treated in U.S. emergency rooms in 2009 (National Safety Council 2011). (Note that this table is not the table of original data but is, instead, a summary of the original data.)
Wow! It’s a dangerous world out there. Which would you conclude is the most dangerous sport? Which is the least dangerous?
Did you answer that basketball was the most dangerous sport? It did have the most injuries (501,251), almost 50,000 more than football (451,961). Ice hockey is known for its violence (you’ve heard the old joke, “I went to a fight and suddenly a hockey match broke out”), but here, it seems to have caused relatively few injuries and looks safe.
The problem with comparing the numbers of injuries for these sports is that the sports have different numbers of participants. Injuries might be more common in basketball simply because more people play basketball. Also, there might be relatively few injuries in ice hockey merely because fewer people play. One important component is missing in Table 1.8, and the lack of this component makes our analysis impossible.
Table 1.9 includes the component missing from Table 1.8: the number of participants in each sport. We can’t directly compare the number of injuries from sport to sport, because the numbers of members of the various groups are not the same. This improved table shows us the total membership of each group.
TABLE 1.8 Summary of counts of sports injuries.
Sport        Injuries
Baseball     165,842
Basketball   501,251
Bowling      20,878
Football     451,961
Ice hockey   19,035
Softball     121,175
Volleyball   60,159
TABLE 1.9 Summary of counts of sports injuries and numbers of participants.
Sport        Participants   Injuries
Which sport is the most dangerous? We now have the information we need to answer this question. Specifically, we can find the percentage of participants injured in each sport. For example, what percent of basketball players were injured? There were 24,400,000 participants and 501,251 were injured, so the percent injured is (501,251/24,400,000) × 100% = 2.05%.
Sometimes, with percentages as small as this, we understand the numbers more easily if we report not a percentage, but “number of events per 1000 objects” or maybe even “per 10,000 objects.” We call such numbers rates. To get the injury rate per 1000 people, instead of multiplying (501,251/24,400,000) by 100 we multiply by 1000:
(501,251/24,400,000) × 1000 = 20.54 injuries per 1000 people.
These results are shown in Table 1.10.
TABLE 1.10 Summary of rates of sports injuries.
Sport   Participants   Injuries   Rate of Injury per Participant   Rate of Injury per Thousand Participants
We see now that football is the most dangerous sport: 50.78 players are injured out of every 1000 players. Basketball is less risky, with 20.54 injuries per 1000 players.
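The rate calculation can be sketched as a small Python function. Only the basketball figures quoted in the text are used here. The function name `rate_per` and its `per` parameter are our own illustrative choices.

```python
def rate_per(events, group_size, per=1000):
    """Number of events per `per` members of a group."""
    return events / group_size * per

# Basketball figures quoted in the text.
injuries = 501_251
participants = 24_400_000

# With per=100 the rate is an ordinary percentage.
percent_injured = rate_per(injuries, participants, per=100)

# With the default per=1000 we get injuries per 1000 participants.
injuries_per_thousand = rate_per(injuries, participants)

print(f"{percent_injured:.2f}% of participants injured")
print(f"{injuries_per_thousand:.2f} injuries per 1000 participants")
```

Because a rate divides by the size of the group, it lets us compare sports with very different numbers of participants, which raw injury counts cannot do.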
EXAMPLE 5 Comparing Rates of Stolen Cars
Which model of car has the greatest risk of being stolen? The Highway Loss Data Institute reports that the Ford F-250 pickup truck is the most stolen car; 7 F-250’s are reported stolen out of every 1000 that are insured. By way of contrast, the Jeep Compass is the least stolen; only 0.5 Jeep Compass is reported stolen for every 1000 insured (Insurance Institute for Highway Safety 2013).
the number of each type of car stolen?
than others. Suppose there were many more Jeep Compasses than Ford F-250’s. In that case, we might see a greater number of stolen Jeeps, simply because there are more of them to steal. By looking at the theft rate, we adjust for the total number of cars of that particular kind on the road.
KEY POINT In order for us to compare groups, the groups need to be similar. When the data consist of counts, then percentages or rates are often better for comparisons because they take into account possible differences among the sizes of the groups.
Often, the most important questions in science, business, and everyday life are questions about causality. These are usually phrased in the form of “what if” questions. What if I take this medicine; will I get better? What if I change my Facebook profile; will my profile get more hits?
Questions about causality are often in the news. The Los Angeles Times reported that many people believe a drink called peanut milk can cure gum disease and slow the onslaught of baldness. The BBC News (2010) reported that “Happiness wards off heart disease.” Statements such as these are everywhere we turn these days. How do we know whether to believe these claims?
The methods we use to collect data determine what types of conclusions we can make. Only one method of data collection is suitable for making conclusions about causal relationships, but as you’ll see, that doesn’t stop people from making such conclusions anyway. In this section we talk about three methods commonly used to collect data in an effort to answer questions about causality: anecdotes, observational studies, and controlled experiments.
Most questions about causality can be understood in terms of two variables: the treatment variable and the outcome variable. (The outcome variable is also sometimes called the response variable, because it responds to changes in the treatment.) We are essentially asking whether the treatment variable causes changes in the outcome variable. For example, the treatment variable might record whether or not a person drinks peanut milk, and the outcome variable might record whether or not that person’s gum disease improved. Or the treatment variable might record whether or not a person is generally happy, and the outcome variable might record whether or not that person suffered from heart disease in a ten-year period.
People who receive the treatment of interest (or have the characteristic of interest) are said to be in the treatment group. Those who do not receive that treatment (or do not have that characteristic) are in the comparison group, which is also called the control group.
Anecdotes
Peanut milk is a drink invented by Jack Chang, an entrepreneur in San Francisco, California. He noticed that after he drank peanut milk for a few months, he stopped losing hair and his gum disease went away. According to the Los Angeles Times (Glionna 2006), another regular drinker of peanut milk says that the beverage caused his cancer to go into remission. Others have reported that drinking the beverage has reduced the severity of their colds, has helped them sleep, and has helped them wake up.
This is exciting stuff! Peanut milk could very well be something we should all be drinking. But can peanut milk really solve such a wide variety of problems? On the face of it, it seems that there’s evidence that peanut milk has cured people of illness. The Los Angeles Times reports the names of people who claim that it has. However, the truth is that this is simply not enough evidence to justify any conclusion about whether the beverage is helpful, harmful, or without any effect at all.
These testimonials are examples of anecdotes. An anecdote is essentially a story that someone tells about her or his own (or a friend’s or relative’s) experience. Anecdotes are an important type of evidence in criminal justice because eyewitness testimony can carry a great deal of weight in a criminal investigation. However, for answering questions about groups of people with great variability or diversity, anecdotes are essentially worthless.
The primary reason why anecdotes are not useful for reaching conclusions about
cause-and-effect relationships is that the most interesting things that we study have so
SECTION 1.4
Collecting Data to Understand Causality