
Elementary Statistics: Looking at the Big Picture


How do we know this text’s exercises are perfectly adapted for online learning with Enhanced WebAssign?

The text author wrote them.

Enhanced WebAssign for Elementary Statistics: Looking at the Big Picture is an easy-to-use online teaching and learning system that provides assignable homework, automatic grading, and interactive assistance for students. With more than 1,000 exercises pulled directly from the text, written and customized by Nancy Pfenning to be ideal for the online environment, students get problem-solving practice that clarifies statistics, builds skills, and boosts conceptual understanding. And when you choose Enhanced WebAssign, students also get access to a Multimedia eBook, a complete interactive version of the text.

Students Get Interactive Practice

As students work problems, they can link directly to:

Watch It: Videos of worked exercises and examples from the text
Read It: Relevant eBook selections from the text

You Save Time on Homework Management, Including Automatic Grading

Enhanced WebAssign’s simple, user-friendly interface lets you quickly master the essential functions, and help is always available if you need it. Create a course in two easy steps, enroll students quickly (or let them enroll themselves), and select problems for an assignment in fewer than five minutes. Enhanced WebAssign automatically grades the assignments and sends results to your gradebook. It’s that easy!

Find out more and see a sample assignment at www.webassign.net/brookscole

Screenshots shown here are for illustrative purposes only.

Statistics

Looking at the Big Picture

Nancy Pfenning

Publisher: Richard Stratton
Senior Sponsoring Editor: Molly Taylor
Associate Editor: Daniel Seibert
Editorial Assistant: Shaylin Walsh
Senior Marketing Manager: Greta Kleinert
Marketing Coordinator: Erica O’Connell
Marketing Communications Manager: Mary Anne Payumo
Content Project Manager: Susan Miscio
Art Director: Linda Helcher
Senior Print Buyer: Diane Gibbons
Senior Rights Acquisition Account Manager, Text: Katie Huha
Production Service: S4Carlisle Publishing Services
Rights Acquisition Account Manager, Images: Don Schlotman
Photo Researcher: Jennifer Lim
Interior and Cover Designer: KeDesign
Cover Image: © Veer Incorporated
Compositor: S4Carlisle Publishing Services

ALL RIGHTS RESERVED. No part of this work covered by the copyright herein may be reproduced, transmitted, stored, or used in any form or by any means graphic, electronic, or mechanical, including but not limited to photocopying, recording, scanning, digitizing, taping, Web distribution, information networks, or information storage and retrieval systems, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the publisher.

Library of Congress Control Number: 2009935400
ISBN-13: 978-0-495-01652-6
ISBN-10: 0-495-01652-7

Brooks/Cole
20 Channel Center Street
Boston, MA 02210
USA

Cengage Learning is a leading provider of customized learning solutions with office locations around the globe, including Singapore, the United Kingdom, Australia, Mexico, Brazil, and Japan. Locate your local office at:

Purchase any of our products at your local college store or at our preferred online store www.ichapters.com

Printed in the United States of America

1 2 3 4 5 6 7 12 11 10 09

For product information and technology assistance, contact us at Cengage Learning Customer & Sales Support, 1-800-354-9706

For permission to use material from this text or product, submit all requests online at www.cengage.com/permissions. Further permissions questions can be emailed to permissionrequest@cengage.com.


To Frank, Andreas & Mary, Marina, and Nils

Contents

Preface

1 Introduction: Variables and Processes in Statistics
Types of Variables: Categorical or Quantitative
Students Talk Stats: Identifying Types of Variables
Handling Data for Two Types of Variables
Roles of Variables: Explanatory or Response
Statistics as a Four-Stage Process
Summary / Exercises

PART I Data Production

2 Sampling: Which Individuals Are Studied
Sources of Bias in Sampling: When Selected Individuals Are Not Representative
Probability Sampling Plans: Relying on Randomness
The Role of Sample Size: Bigger Is Better If the Sample Is Representative
From Sample to Population: To What Extent Can We Generalize?
Students Talk Stats: Seeking a Representative Sample
Summary / Exercises

3 Design: How Individuals Are Studied
3.1 Various Designs for Studying Variables
Identifying Study Design
Observational Studies versus Experiments: Who Controls the Variables?
Errors in Studies’ Conclusions: The Imperfect Nature of Statistical Studies
3.2 Sample Surveys: When Individuals Report Their Own Values
Sources of Bias in Sample Surveys
3.3 Observational Studies: When Nature Takes Its Course
Confounding Variables and Causation
Paired or Two-Sample Studies
Prospective or Retrospective Studies: Forward or Backward in Time
3.4 Experiments: When Researchers Take Control
Randomized Controlled Experiments
Double-Blind Experiments
“Blind” Subjects
“Blind” Experimenters
Pitfalls in Experimentation
Modifications to Randomization
Students Talk Stats: Does Watching TV Cause ADHD? Considering Study Design
Summary / Exercises

PART II Displaying and Summarizing Data

4 Displaying and Summarizing Data for a Single Variable
4.1 Single Categorical Variable
Summaries and Pie Charts
The Role of Sample Size: Why Some Proportions Tell Us More Than Others Do
Bar Graphs: Another Way to Visualize Categorical Data
Mode and Majority: The Value That Dominates
Revisiting Two Types of Bias
Students Talk Stats: Biased Sample, Biased Assessment
4.2 Single Quantitative Variables and the Shape of a Distribution
Thinking about Quantitative Data
Stemplots: A Detailed Picture of Number Values
Histograms: A More General Picture of Number Values
4.3 Center and Spread: What’s Typical for Quantitative Values, and How They Vary
Five-Number Summary: Landmark Values for Center and Spread
Boxplots: Depicting the Key Number Values
Mean and Standard Deviation: Center and Spread in a Nutshell
4.4 Normal Distributions: The Shape of Things to Come
The 68-95-99.7 Rule for Samples: What’s “Normal” for a Data Set
From a Histogram to a Smooth Curve
Standardizing Values of Normal Variables: Storing Information in the Letter z
Students Talk Stats: When the 68-95-99.7 Rule Does Not Apply
“Unstandardizing” z-Scores: Back to Original Units
The Normal Table: A Precursor to Software
Summary / Exercises

5 Displaying and Summarizing Relationships
5.1 Relationships between One Categorical and One Quantitative Variable
Different Approaches for Different Study Designs
Displays
Summaries
Notation
Data from a Two-Sample Design
Data from a Several-Sample Design
Data from a Paired Design
Students Talk Stats: Displaying and Summarizing Paired Data
Generalizing from Samples to Populations: The Role of Spreads
The Role of Sample Size: When Differences Have More Impact
5.2 Relationships between Two Categorical Variables
Summaries and Displays: Two-Way Tables, Conditional Percentages, and Bar Graphs
The Role of Sample Size: Larger Samples Let Us Rule Out Chance
Comparing Observed and Expected Counts
Confounding Variables and Simpson’s Paradox: Is the Relationship Really There?
5.3 Relationships between Two Quantitative Variables
Displays and Summaries: Scatterplots, Form, Direction, and Strength
Correlation: One Number for Direction and Strength
When the Correlation Is 0, +1, or −1
Correlation as a Measure of Direction and Strength
A Closer Look at Correlation
Correlation Is Unaffected by the Roles of Explanatory and Response Variables
Correlation Is Unaffected by Units of Measurement
Least Squares Regression Line: What We See in a Linear Plot
A Closer Look at Least Squares Regression
Residuals: Prediction Errors in a Regression
Spread s about the Line versus Spread s_y about the Mean Response
The Effect of Explanatory and Response Roles on the Regression Line
Influential Observations and Outliers
Students Talk Stats: How Outliers and Influential Observations Affect a Relationship
Sample versus Population: Thinking Beyond the Data at Hand
The Role of Sample Size: Larger Samples Get Us Closer to the Truth
Time Series: When Time Explains a Response
Additional Variables: Confounding Variables, Multiple Regression
Students Talk Stats: Confounding in a Relationship between Two Quantitative Variables
Summary / Exercises

PART III Probability

6 Finding Probabilities
6.1 The Meaning of “Probability” and Basic Rules
Permissible Probabilities
Probabilities Summing to One
Probability of “Not” Happening
Probability of One “Or” the Other for Non-overlapping Events
Probability of One “And” the Other for Two Independent Events
6.2 More General Probability Rules and Conditional Probability
Probability of One “Or” the Other for Any Two Events
Probability of Both One “And” the Other Event Occurring
Students Talk Stats: Probability as a Weighted Average of Conditional Probabilities
Conditional Probability in Terms of Ordinary Probabilities
Checking for Independence
Counts Expected If Two Variables Are Independent
Summary / Exercises

7 Random Variables
7.1 Discrete Random Variables
Probability Distributions of Discrete Random Variables
The Mean of a Random Variable
The Standard Deviation of a Random Variable
Rules for the Mean and Standard Deviation of a Random Variable
7.2 Binomial Random Variables
What Makes a Random Variable “Binomial”?
The Mean and Standard Deviation of Sample Proportions
Students Talk Stats: Calculating and Interpreting the Mean and Standard Deviation of Count or Proportion
The Shape of the Distribution of Counts or Proportions: The Central Limit Theorem
7.3 Continuous Random Variables and the Normal Distribution
Discrete versus Continuous Distributions
When a Random Variable Is Normal
The 68-95-99.7 Rule for Normal Random Variables
Standardizing and Unstandardizing: From Original Values to z or Vice Versa
Estimating z Probabilities with a Sketch of the 68-95-99.7 Rule
Nonstandard Normal Probabilities
Tails of the Normal Curve: The 90-95-98-99 Rule
Students Talk Stats: Means, Standard Deviations, and Below-Average Heights
Summary / Exercises

8 Sampling Distributions
Categorical Variables: The Behavior of Sample Proportions
Quantitative Variables: The Behavior of Sample Means
8.1 The Behavior of Sample Proportion in Repeated Random Samples
Thinking about Proportions from Samples or Populations
Center, Spread, and Shape of the Distribution of Sample Proportion
8.2 The Behavior of Sample Mean in Repeated Random Samples
Thinking about Means from Samples or Populations
The Mean of the Distribution of Sample Mean
The Standard Deviation of the Distribution of Sample Mean
The Shape of the Distribution of Sample Mean: The Central Limit Theorem
Center, Spread, and Shape of the Distribution of Sample Mean
Normal Probabilities for Sample Means
Students Talk Stats: When Normal Approximations Are Appropriate
Summary / Exercises

PART IV Statistical Inference

9 Inference for a Single Categorical Variable
9.1 Point Estimate and Confidence Interval: A Best Guess and a Range of Plausible Values for Population Proportion
Probability versus Confidence: Talking about Random Variables or Parameters
95% Confidence Intervals: Building around Our Point Estimate
The Role of Sample Size: Closing In on the Truth
Confidence at Other Levels
Deciding If a Particular Value Is Plausible: An Informal Approach
The Meaning of a Confidence Interval: What Exactly Have We Found?
Students Talk Stats: Interpreting a Confidence Interval
9.2 Hypothesis Test: Is a Proposed Population Proportion Plausible?
Three Forms of Alternative Hypothesis: Different Ways to Disagree
One-Sided or Two-Sided Alternative Hypothesis
How Small Is a “Small” P-Value?
The Role of Sample Size in Conclusions for Hypothesis Tests
When to Reject the Null Hypothesis: Three Contributing Factors
Students Talk Stats: Interpreting a P-Value
Type I or II Error: What Kind of Mistakes Can We Make?
Students Talk Stats: What Type of Error Was Made?
Relating Results of Test with Confidence Interval: Two Sides of the Same Coin
The Language of Hypothesis Tests: What Exactly Do We Conclude?
Students Talk Stats: The Correct Interpretation of a Small P-Value
Students Talk Stats: The Correct Interpretation When a P-Value Is Not Small
The “Critical Value” Approach: Focusing on the Standard Score
Summary / Exercises

10 Inference for a Single Quantitative Variable
10.1 Inference for a Mean When Population Standard Deviation Is Known or Sample Size Is Large
A Confidence Interval for the Population Mean Based on z
95% Confidence Intervals with z
Students Talk Stats: Confidence Interval for a Mean: Width, Margin of Error, Standard Deviation, and Standard Error
Role of Sample Size: Larger Samples, Narrower Intervals
Intervals at Other Levels of Confidence with z
Interpreting a Confidence Interval for the Mean
Students Talk Stats: Correctly Interpreting a Confidence Interval for the Mean
A z Hypothesis Test about the Population Mean
10.2 Inference for a Mean When the Population Standard Deviation Is Unknown and the Sample Size Is Small
A t Confidence Interval for the Population Mean
95% Confidence Intervals with t
Intervals at Other Levels of Confidence with t
A t Hypothesis Test about the Population Mean
Students Talk Stats: Practical Application of a t Test
10.3 A Closer Look at Inference for Means
A One-Sided or Two-Sided Alternative Hypothesis about a Mean
The Role of Sample Size and Spread: What Leads to Small P-Values?
Type I and II Errors: Mistakes in Conclusions about Means
Relating Tests and Confidence Intervals for Means
Correct Language in Hypothesis Test Conclusions about a Mean
Robustness of Procedures
Summary / Exercises

11 Inference for Relationships between Categorical and Quantitative Variables
11.1 Inference for a Paired Design with t
Hypothesis Test in a Paired Design
Confidence Interval in a Paired Design
11.2 Inference for a Two-Sample Design with t
The Two-Sample t Distribution and Test Statistic
Hypothesis Test in a Two-Sample Design
Confidence Interval in a Two-Sample Design
The Pooled Two-Sample t Procedure
Students Talk Stats: Ordinary versus Pooled Two-Sample t
11.3 Inference for a Several-Sample Design with F: Analysis of Variance
The F Statistic
The F Distribution
Solving Several-Sample Problems
The ANOVA Table: Organizing What We Know about F
The ANOVA Alternative Hypothesis
Assumptions of ANOVA
Summary
Students Talk Stats: Reviewing Relationships between Categorical Explanatory and Quantitative Response Variables
Exercises

12 Inference for Relationships between Two Categorical Variables
12.1 Comparing Proportions with a z Test
12.2 Comparing Counts with a Chi-Square Test
Relating Chi-Square to z
The Table of Expected Counts
Comparing Observed to Expected Counts
The Chi-Square Distribution
The Chi-Square Test
Sample Size and Chi-Square Assumptions
Summary / Exercises

13 Inference for Relationships between Two Quantitative Variables
13.1 Inference for Regression: Focus on the Slope of the Regression Line
Setting the Stage: Summarizing a Relationship for Sampled Points
Distinguishing between Sample and Population Relationships
A Model for the Relationship between Two Quantitative Variables in a Population
The Distribution of Sample Slope b1
The Distribution of Standardized Sample Slope t
Hypothesis Test about the Population Slope with t: A Clue about the Relationship
Students Talk Stats: No Evidence of a Relationship
Confidence Interval for the Slope of the Population Regression Line
13.2 Interval Estimates for an Individual or Mean Response
Summary / Exercises

14 How Statistics Problems Fit into the Big Picture
14.1 The Big Picture in Problem Solving
Students Talk Stats: Choosing the Appropriate Statistical Tools: Question 1
Students Talk Stats: Choosing the Appropriate Statistical Tools: Question 2
Students Talk Stats: Choosing the Appropriate Statistical Tools: Question 3
Exercises

15.1 The Sign Test as an Alternative to the Paired t Test
15.2 The Rank-Sum Test as an Alternative to the Two-Sample t Test
Wilcoxon rank-sum test
15.3 Summary of Non-parametrics
Exercises

16 Two-Way ANOVA (available online)

Part I

Data Production

An Overview

In this part of the book, we focus on the two stages of data production:

1 Obtaining a sample
2 Designing a study to discover what we want to know about the variables of interest for the individuals in the sample

The principles of good data production play a vital role in what we aim to accomplish throughout the book. It is of the utmost importance at this stage to avoid any form of bias.

Bias Due to Sampling

In an interview, Larry Flynt (controversial publisher of Hustler and similar magazines) was asked, “How would you like women to remember you, as someone who helped or hurt their position?”¹ His reply was “... of the thousands of girls who have posed for my magazines, I’ve never had one who felt she had been exploited. I think it’s actually helped the women’s movement ...” Obviously, the

statistics, the most common quantities to be estimated are means and proportions.

Bias is the tendency of an estimate to deviate in one direction from a true value.

A biased sample results in over- or underestimates because the sample is not representative of the population of interest.

The design of a study is the plan for gathering information about the variables of interest. A biased study design results in over- or underestimates because of flaws in the way information about sampled individuals is gathered.

Part IV A study design that assesses sampled values without bias is a

in general, so we cannot infer anything about the attitude of the larger population of women based on his sample of models.

Thus, it is extremely important that the very first step in data production, sampling, be carried out in such a way that the sample really does represent the population of interest. Also, we must remember that our summaries of variables and their relationships reflect the true nature of the variables and relationships in the sample only if the design for gathering the information is sound.
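The effect of a non-representative sample can be illustrated with a small simulation. The sketch below is purely hypothetical: the population of opinion scores, the selection rule for the biased sample, and every name in the code are invented for illustration and do not come from the text.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 opinion scores on a 0-10 scale.
population = [rng.uniform(0, 10) for _ in range(10_000)]

# Representative sample: every individual has the same chance of selection.
random_sample = rng.sample(population, 500)

# Biased sample: only individuals with high scores are available to be chosen
# (assumption: a stand-in for asking only people who already opted in).
volunteers = [x for x in population if x > 6]
biased_sample = rng.sample(volunteers, 500)

true_mean = mean(population)
print(f"population mean:    {true_mean:.2f}")
print(f"random-sample mean: {mean(random_sample):.2f}")
print(f"biased-sample mean: {mean(biased_sample):.2f}")
```

The random sample lands near the population mean, while the biased sample overestimates it systematically; taking a larger biased sample would not repair this.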

Bias Due to Study Design

According to an article entitled “Exercise Does Good Things for Teens’ Moods,” “Boys who reported less than an hour of vigorous physical activity a week were more likely to be depressed and withdrawn than those who exercised regularly.” The design for assessing the boys’ physical activity and mood was to simply observe the values for these variables as they naturally occurred. For this reason, we can’t rule out a very different explanation for what the researchers observed in their sample of boys: Perhaps being in a good mood makes a teenager more likely to exercise.

Good data production is an essential part of the “big picture” of statistics. We must keep its principles in mind as we progress later on in the book to summarizing data, understanding probability, and performing statistical inference.

Throughout this part of the book, we will establish guidelines for ideal production of data. It is important for us to strive to achieve these standards. Realistically, however, it is rarely possible to carry out a study that is completely free of flaws. Therefore, we must frequently apply common sense to decide which imperfections we can “live with,” and which ones could completely undermine a study’s results.

1 Data Production: Take sample data from the population, with sampling and study designs that avoid bias
2 Displaying and Summarizing: Use appropriate displays and summaries of the sample data, according to variable types and roles
3 Probability: Assume we know what’s true for the population; how should random samples behave?
4 Statistical Inference: Assume we only know what’s true about sampled values of a single variable or relationship; what can we infer about the larger population?

Part II

Displaying and Summarizing Data

Displaying and Summarizing Data: An Overview

Before going into detail about the two steps in data production (sampling and design) we discussed the fact that the way we handle statistical problems depends on the number and type of variables involved. We either have a single categorical variable, a single quantitative variable, or a relationship between, respectively, a categorical and a quantitative variable, two categorical variables, or two quantitative variables. Categorical variables are summarized by telling counts, proportions, or percents in the category of interest, whereas quantitative variables are often summarized by reporting the mean. Whenever we are interested in the relationship between two variables, it is important to establish which (if any) plays the role of explanatory variable and which is the response. The roles played by the variables will determine which displays and summaries are appropriate.

Once we establish what is true about a variable or relationship in a random sample, we will be in a position to say something about what is true for the larger population. Throughout this book, we must take care to distinguish between samples and populations.

Definitions

A number that summarizes a sample is called a statistic.

A number that summarizes the population is called a parameter.

The most common statistics of interest are the sample proportion p̂ (called “p-hat”) and the sample mean x̄ (called “x-bar”), corresponding to the parameters population proportion p and population mean μ (called “mu”). These will be formally defined as we encounter them in Chapter 4.

Identifying Statistics and Parameters

Here are some situations featuring either statistics or parameters.

■ 19% of 2,366 surveyed Americans said they believed money can buy happiness.

Results of a survey taken by several hundred students in introductory statistics classes at a particular university provide a good source of real-life examples corresponding to each of the 5 variable situations, from one categorical variable to two quantitative variables. These students reported their age, whether or not they’d eaten breakfast that day, how many minutes they spent on the computer the day before, and so on. To gain experience in working with real data, we will often produce displays and summaries, and later perform statistical inference, using this data set. Because our summaries of the survey data correspond to a sample, we will treat those summaries as statistics, not parameters.

portion of all Americans who believe money can buy happiness is a parameter p.

■ A New York Times article entitled “The DNA 200” reports that the first 200 inmates to be cleared through DNA evidence, from January 1989 to April 2007, averaged 12 years in prison.¹

Here the number 12 is a parameter μ because it is talking about the mean years for the population of all 200 inmates exonerated thus far.
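The distinction between a parameter and a statistic can be made concrete in a few lines of code. This is a minimal sketch with made-up data: the 200 values below are randomly generated placeholders, not the actual years served by the exonerated inmates, and the variable names are chosen only to echo the notation μ, x̄, p, and p̂.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: years served by 200 individuals (invented values).
population_years = [rng.randint(1, 25) for _ in range(200)]

# Parameter: a number that summarizes the entire population.
mu = sum(population_years) / len(population_years)

# Statistic: a number that summarizes only a sample from that population.
sample = rng.sample(population_years, 30)
x_bar = sum(sample) / len(sample)

# The same distinction for a categorical summary,
# here the proportion who served more than 10 years.
p = sum(1 for y in population_years if y > 10) / len(population_years)
p_hat = sum(1 for y in sample if y > 10) / len(sample)

print(f"parameter mu = {mu:.2f}, statistic x-bar = {x_bar:.2f}")
print(f"parameter p  = {p:.2f}, statistic p-hat = {p_hat:.2f}")
```

Rerunning with a different seed changes the statistics from sample to sample, whereas the parameters stay fixed for a given population.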

Keeping the Big Picture in Perspective

In Part I, we learned about good sampling technique, to ensure that the sample truly represents the larger population about which we want to draw conclusions. We also learned how to design good studies so that the information obtained about the variables or relationships accurately reflects the truth about the sampled individuals. Adhering to good principles of sampling and design is vital for the theory developed in Part III, when we assume a population parameter is known, and learn how the corresponding sample statistic behaves. The behavior is predictable only if the statistic summarizes data values that are unbiased. The same principles continue to be essential for the more practical techniques learned in Part IV, when we use sample statistics to draw conclusions about unknown population parameters. Again, those conclusions will be correct only if the statistic is unbiased.

Keeping in mind that the sampling technique and study design could have an impact on the data that are produced, we undertake in Part II to summarize data gathered about single variables and about relationships. In other words, we will now learn how to find relevant sample statistics for the data at hand. The following diagram shows how summarizing data fits into the “big picture” of statistics.

1 Data Production: Take sample data from the population, with sampling and study designs that avoid bias
2 Displaying and Summarizing: Use appropriate displays and summaries of the sample data, according to variable types and roles
3 Probability: Assume we know what’s true for the population; how should random samples behave?
4 Statistical Inference: Assume we only know what’s true about sampled values of a single variable or relationship; what can we infer about the larger population?

Part III

Probability

Introduction to Probability

Our ultimate goal in this book is to perform statistical inference: Use a sample statistic (such as sample mean or sample proportion) to draw conclusions about an unknown population parameter (like population mean or population proportion).

Political polls provide a straightforward example of the kind of reasoning involved in performing statistical inference. First, keeping in mind principles established in Part I, researchers would design and implement a survey to poll people about their views before a presidential election. Methods of Part II would indicate that the results (categorical) could be summarized with a percentage. Suppose that 54% in the sample of 1,000 voters intend to vote for a particular candidate, and we would like to decide whether or not the majority (more than 50%) of all voters intend to vote for that candidate.

1 Data Production: Take sample data from the population, with sampling and study designs that avoid bias
2 Displaying and Summarizing: Use appropriate displays and summaries of the sample data, according to variable types and roles
3 Probability: Assume we know what’s true for the population; how should random samples behave?
4 Statistical Inference: Assume we only know what’s true about sampled values of a single variable or relationship; what can we infer about the larger population?

50% (no more) of all voters favor that candidate. Then we would determine how probable or improbable it would be to find as many as 54%, in a random sample of 1,000 voters, intending to vote for that candidate. If it turns out to be extremely unlikely to get a sample percentage as high as 54% when the population percentage is only 50%, then we’d conclude that the population percentage is not so low as 50%. It is almost certainly more.
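The probability described in the election example can be computed directly. The sketch below is one way to do it, under the assumption that the 1,000 voters form a simple random sample, so that the count favoring the candidate is binomial; it also shows the normal approximation based on the standard deviation of a sample proportion, sqrt(p(1 − p)/n). Variable names are illustrative only.

```python
from math import comb, erfc, sqrt

n = 1000        # sample size from the example
p0 = 0.5        # assumed population proportion ("50%, no more")
observed = 540  # 54% of 1,000 voters

# Exact binomial probability of 540 or more supporters when p = 0.5.
# With p = 0.5, each count k has probability comb(n, k) / 2**n.
exact_tail = sum(comb(n, k) for k in range(observed, n + 1)) / 2**n

# Normal approximation: standardize the sample proportion.
p_hat = observed / n
sd = sqrt(p0 * (1 - p0) / n)           # standard deviation of sample proportion
z = (p_hat - p0) / sd
normal_tail = 0.5 * erfc(z / sqrt(2))  # P(Z >= z) for a standard normal Z

print(f"z = {z:.2f}")
print(f"exact P(X >= {observed}) = {exact_tail:.4f}")
print(f"normal approximation   = {normal_tail:.4f}")
```

Both tail probabilities come out below 1%, which is the sense in which a sample percentage of 54% would be very unlikely if the population percentage were only 50%.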

The key to making a decision in our election example is finding the likelihood (or unlikelihood) of obtaining a certain sample percentage, given a claimed population percentage. Thus, it is a probability that brings about our final decision. Referring to our sketch of the “big picture,” we are ready now to tackle the third major step in the four-step process of learning to perform statistical inference.

By the end of Part III, we will have established the necessary theory to evaluate probabilities like the one needed to solve the election example above. This theory is by no means simple, and must be developed gradually. We will begin by learning basic and more general rules of probability (the science), which is the formal study of random behavior. Next, we learn about the behavior of random variables, which are a particular kind of quantitative variable whose values are a result of some random process (such as random sampling). This leads to the chapter on sampling distributions, which tell the behavior of two random variables of particular interest: sample proportion and sample mean. By this time we will be able to determine, for a given population parameter, how the corresponding statistic behaves in the long run for random samples. This sets the stage for inference in Part IV, when we turn this knowledge around, and for a given statistic (such as sample proportion), determine what should be true about the corresponding parameter (such as unknown population proportion).
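How a statistic "behaves in the long run for random samples" can be previewed by simulation. The following is a rough sketch, assuming a hypothetical population proportion of 0.5 and samples of size 100; it draws many random samples, records each sample proportion, and compares the center and spread of those proportions with the population proportion and with the theoretical standard deviation sqrt(p(1 − p)/n).

```python
import random
from math import sqrt

rng = random.Random(42)  # fixed seed so the run is repeatable

p = 0.5      # assumed population proportion (hypothetical)
n = 100      # size of each random sample
reps = 2000  # number of repeated samples

# Draw many random samples and record each sample proportion.
p_hats = []
for _ in range(reps):
    successes = sum(1 for _ in range(n) if rng.random() < p)
    p_hats.append(successes / n)

# Center and spread of the simulated sample proportions.
center = sum(p_hats) / reps
spread = sqrt(sum((x - center) ** 2 for x in p_hats) / (reps - 1))
theory_sd = sqrt(p * (1 - p) / n)

print(f"mean of sample proportions: {center:.3f} (population p = {p})")
print(f"SD of sample proportions:   {spread:.3f} (theory: {theory_sd:.3f})")
```

The simulated proportions cluster around the population proportion, with a spread close to the theoretical value; this is the pattern that the chapter on sampling distributions makes precise.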

Now that we are about to begin our formal study of random behavior (the science of probability) it is a good time to remind ourselves of the importance of techniques learned in Part I, on data production. Randomization was the key to producing unbiased samples for observational studies, and the key to establishing causation in experiments. Now we should take note of the fact that the entire theory of probability developed in Part III, on which the applications in Part IV depend, requires that selections or assignments have been made at random.

In Part II, when we learned various display and summary techniques, we compartmentalized the topics according to number and type of variables involved. There were five basic situations, as illustrated in the diagram below: one categorical variable, one quantitative variable, one each categorical and quantitative, two categorical variables, and two quantitative variables.

In Part IV, when we learn to draw conclusions about the larger population, based on sample data, we will again handle one situation at a time, depending on number and type of variables. Now, in Part III, there will occasionally be subtle shifts from one to two categorical variables, or from quantitative to categorical variables and vice versa. Instead of focusing on number and type of variables, we concentrate, especially in Chapter 6, on the general rules that govern random behavior in any of these five situations.

[Diagram of the five variable situations; surviving labels: two quantitative variables (Q→Q), two categorical variables (C→C), one categorical and one quantitative variable, one quantitative variable]

Statistical Inference: An Overview

Whether or not we state it explicitly, whenever information is gathered about a group of individuals, we almost always want to generalize to a larger group. A poll finds what proportion of surveyed voters favor a particular candidate, to get an idea of what proportion of all voters favor that candidate. An experimenter determines how much more weight is lost by some dieters who exercise, compared to some dieters who don’t, to draw conclusions about weight loss by all dieters who do or don’t exercise.

Most people, even if they have never taken a statistics course, are not so naive

as to believe that what is true for a sample must also be exactly true for the largerpopulation But unless they have a knowledge of statistical principles, people areunable to judge to what degree information about a sample can be extended to thegeneral population This book teaches you to be an educated consumer of statis-tical information, so that by the time you have finished this final (and most impor-tant) part, you will have the skills to make such generalizations carefully andcorrectly These skills will enable you to decide, given poll results, whether or not

a majority of all voters favor a candidate They will let you estimate, given results

of an experiment, how many more pounds any dieter stands to lose if he or she

exercises regularly

Inference in the Big Picture

Our diagram of the four processes should help remind you of how this fourth and final process fits into the “big picture” of statistics.

By now we have considered how to produce an unbiased sample, and how to display and summarize the sample data, depending on what types of variables are involved. We have established important principles of probability theory, and are ready to make practical use of these results: Now that we know how samples tend to behave relative to populations, we turn this knowledge around and discover what is likely to be true about a population, given what we have observed in a sample. Our knowledge about the population, based on the sample, will not be perfect, but methods about to be presented will enable us to quantify the uncertainty of our conclusions. This final step, inference, is highlighted in our diagram because it is the task at hand. The five variable situations are also shown because each situation calls for a different approach to inference.

Two Major Forms of Inference

No matter which of the five situations applies, our inference about the larger population, based on the sample, may take one of two forms: confidence intervals or hypothesis tests.

■ Setting up a confidence interval is a way of presenting a range of plausible values for the unknown population parameter. The interval tells us what values are, in a sense, believable.

■ Carrying out a hypothesis test is a way of deciding whether or not a particular proposed value for the unknown parameter is plausible. In the case of relationships between two variables, a hypothesis test is especially important because it helps us decide whether or not there is convincing evidence that those variables are related in the larger population, not just in the sample.

In the next five chapters we will systematically consider both forms of inference (confidence intervals and hypothesis tests) for each of the five variable situations. As you advance through these chapters, you may want to refer back to this overview occasionally, to help keep the “big picture” in perspective throughout.

[Diagram: the four processes of statistics, with statistical inference highlighted]
1. Data Production: Take sample data from the population, with sampling and study designs that avoid bias.
2. Displaying and Summarizing: Use appropriate displays and summaries of the sample data, according to variable types and roles.
3. Probability: Assume we know what’s true for the population; how should random samples behave?
4. Statistical Inference: Assume we only know what’s true about sampled values of a single variable or relationship; what can we infer about the larger population?

Before the semester starts, a statistics teacher wants to organize a box of

hundreds of newspaper clippings and Internet reports collected in the past

couple of years:

■ “Dark Chocolate Might Reduce Blood Pressure”

■ “Almost Half of U.S. Internet Users ‘Google’ Themselves”

■ “Vampire Bat Saliva Researched for Stroke”

■ “Environmental Mercury, Autism Linked by New Research”

There are several reports on smoking and on obesity, but for most of the topics (such as bat saliva) there is only one article. How can the teacher sort all of those articles in a way that will make them easy to access for future reference?

At the end of the semester, a group of statistics students are studying together,

trying to solve practice final exam problems such as these:

■ Suppose systolic blood pressures for 7 patients who ate dark chocolate daily for two weeks dropped an average of 5 points, whereas those of a control group of 6 patients who ate white chocolate remained unchanged. If the standardized difference between blood pressure decreases was 2.1, do we have convincing evidence that dark chocolate is beneficial?

■ According to a 2007 report, 47% of 1,623 U.S. Internet users surveyed by the Pew Internet & American Life Project had searched for information about themselves online. Give a 95% confidence interval for the percentage of all U.S. Internet users who searched online for information about themselves.

Introduction:

Variables and Processes in Statistics

What can you accomplish with this book, and how?


■ Researchers found that 9 out of 15 stroke patients receiving vampire bat saliva had an excellent recovery, compared with 4 out of 17 who were untreated. Does this provide evidence that bat saliva is effective in treating stroke patients?

■ Research in a large sample of Texas school districts found that for every 1,000 pounds of environmentally released mercury, there was a 17% increase in autism rates. If one district has 300 additional pounds of environmental mercury compared to another, how much higher do we predict its autism rate to be?
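As a hedged illustration of the Pew practice problem, the sketch below computes an approximate 95% confidence interval for a proportion. It assumes the standard large-sample (normal approximation) formula, sample proportion plus or minus 1.96 standard errors, which this introduction does not itself state. The function name `proportion_ci` is hypothetical.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Approximate confidence interval for a population proportion.

    Assumes a random sample large enough for the normal approximation;
    z=1.96 corresponds to roughly 95% confidence.
    """
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

# Pew survey from the practice problem: 47% of 1,623 users surveyed.
low, high = proportion_ci(0.47, 1623)
print(f"Approximate 95% CI: {low:.3f} to {high:.3f}")
```

Under these assumptions the interval runs from roughly 45% to 49%, so the headline "almost half" is consistent with the data.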

The students may feel overwhelmed in trying to find the right approach to each of the problems, after having learned a whole semester’s worth of various statistical procedures. How can the students figure out which procedure is the right one for each problem?

The answer for both teacher and students is a simple one, and it will also be the key for you to understand what this book is all about, from beginning to end.

The way we handle statistical problems depends on the number and types of variables involved.

A variable, as the name suggests, is something that varies for different individuals: Blood pressure is a variable because it takes different values for different people; recovery from a stroke is a variable because some patients have an excellent recovery and others do not. The individuals with variable traits in many cases are people, but individuals can be anything that we are interested in, from penguins to school districts to planets.

Types of Variables: Categorical or Quantitative

Virtually all of the situations encountered in this book will involve either a single variable or the relationship between two variables. A variable’s type is categorical if it takes qualitative values such as sex, race, or the response to a yes-or-no question. The type is quantitative if the variable takes number values for which arithmetic makes sense, such as age, number of siblings, or rating something on a scale of 1 to 10.

Definitions
A variable is a characteristic that differs for different individuals.
A categorical variable takes qualitative values that are not subject to the laws of arithmetic.
A quantitative variable takes number values for which arithmetic makes sense.
A relationship (also known as an association) exists between two variables if certain values of one tend to occur with certain values of the other.

The statistics teacher can divide the clippings into just five piles:

1. One categorical variable
2. One quantitative variable
3. One categorical variable and one quantitative variable
4. Two categorical variables
5. Two quantitative variables

Likewise, the statistics students just need to identify the number and type of variables involved in each problem, and this will suggest what statistical procedure should be applied.

This book features “Students Talk Stats” examples and exercises that are discussions by four prototypical students, highlighting many of the most important ideas in statistics. As you gradually rise to higher levels of understanding of statistical concepts and procedures, you may find you can relate to their struggles and discoveries. Our first such discussion will help you begin to develop the skill of identifying what types of variables are involved when you are presented with any report containing statistical information.

A CLOSER LOOK
Categorical variables are [...] variables, like ZIP codes, are categorical if the numbers are labels, not signifying an amount that can be quantified. For example, if half of a group of students have a ZIP code 15217 and the other half 15213, we can’t say that the typical ZIP code is the average of these, 15215.
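The ZIP code caution can be checked with a short sketch. The four-student group below is hypothetical, chosen only to match the "half and half" description. Averaging the codes runs without error but yields a value nobody has, whereas proportions per category are a meaningful summary.

```python
from collections import Counter
from statistics import mean

# Hypothetical group: half live in ZIP code 15217, half in 15213.
zip_codes = [15217, 15217, 15213, 15213]

# The arithmetic works, but the result is not a meaningful summary:
# no student in the group has ZIP code 15215.
misleading_average = mean(zip_codes)

# A categorical summary reports the proportion in each category instead.
counts = Counter(zip_codes)
proportions = {code: count / len(zip_codes) for code, count in counts.items()}

print(misleading_average)
print(proportions)
```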

Identifying Types of Variables

Four students who have recently enrolled in a statistics class are browsing through news articles on the Internet, thinking about what kind of variables are involved.

Adam: “I’m in the mood for chocolate, so I’m looking at this article that says ‘Dark Chocolate Might Reduce Blood Pressure.’ I’m pretty sure blood pressure is quantitative but couldn’t chocolate go either way?”

Brittany: “Realistically, I think they’d just compare people who do and don’t eat dark chocolate, which would make it categorical. Here’s one that says ‘Almost Half of U.S. Internet Users “Google” Themselves.’ Half is a number so it’s quantitative.”

Carlos: “Half is talking about the overall fraction, but for each person, they just recorded whether or not they Googled themselves, so it’s categorical. What about ‘Vampire Bat Saliva Researched for Stroke’? I picture they handled bat saliva like Brittany said they’d handle chocolate: some people get it and others don’t. I don’t think it would be easy to put a number on recovery from a stroke, so that variable’s probably categorical, too.”

Dominique: “I’m confused about this one: ‘Environmental Mercury, Autism Linked by New Research.’ Mercury would be quantitative, and I think of autism as being categorical, but the report says they looked at autism rates in different school districts depending on how much mercury was in the area. Would that make autism quantitative?”

Adam is correct that blood pressure is quantitative, and Brittany rightly guesses that chocolate consumption in this case would be categorical. Carlos has correctly identified Googling one’s self as a categorical variable in the second article, and is on the right track that both bat saliva and stroke recovery would be categorical. Finally, although autism for individual people would be categorical, if a study considers autism rates for a sample of school districts, then the variable is quantitative. Dominique is right about both mercury and autism rate being quantitative variables in this study.

Practice: Try Exercise 1.2 on page 11.

Although variable type is usually fairly straightforward to identify, some “crossover” from one type to the other may take place, such as in the autism/mercury study discussed above by the four students, as well as in the following pair of examples.


EXAMPLE 1.1 When a Categorical Variable Gives Rise to a Quantitative Variable

Background: Individual teenagers were surveyed as to whether they have used marijuana, and whether they have used harder drugs.

Researchers then looked at the percentage of teenagers using marijuana and the percentage using harder drugs in various countries around the world to see if those two variables are related.

Questions: What kinds of variables are involved in the first situation? What kinds of variables are involved in the second situation?

Responses: The first situation explores the relationship between two categorical variables. The second explores the relationship between two quantitative variables.

Practice: Try Exercise 1.6(a,b) on page 12.
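The shift in Example 1.1 can be sketched in code. The records below are invented purely for illustration (the country labels and responses are hypothetical, not data from the study). Each teenager contributes a categorical yes/no value, while each country contributes a quantitative percentage.

```python
# Hypothetical individual-level records: (country, used_marijuana).
# For each teenager the variable is categorical (True/False).
records = [
    ("Country A", True), ("Country A", False),
    ("Country A", False), ("Country A", False),
    ("Country B", True), ("Country B", True),
    ("Country B", False), ("Country B", False),
]

def percent_by_group(rows):
    """Summarize a yes/no variable as a percentage within each group."""
    totals, yes_counts = {}, {}
    for group, answer in rows:
        totals[group] = totals.get(group, 0) + 1
        yes_counts[group] = yes_counts.get(group, 0) + (1 if answer else 0)
    return {group: 100 * yes_counts[group] / totals[group] for group in totals}

# For each country the variable is now quantitative (a percentage).
percent_marijuana = percent_by_group(records)
print(percent_marijuana)
```

The same aggregation applied to the harder-drugs question would give a second percentage per country, and the two percentages could then be studied as a pair of quantitative variables.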

[Table: Country | % Marijuana | % Harder Drugs]

[...] to analyze the data, they simply classified the babies as being below 6 pounds (considered below normal) or not, along with information about whether the mothers had been X-rayed while pregnant.

Questions: What kind of variables were involved in the first situation? In the second situation?

Responses: The first situation involves one categorical variable (mother had dental X-rays or not) and a quantitative variable (baby’s weight). The second situation involves two categorical variables because babies’ weights are now categorized into two groups.

Practice: Try Exercise 1.8 on page 12.

Original variable (birth weight) is quantitative: 5.5  5.8  5.8  6.0  6.2  6.3
Categorized as below normal (b) or normal (n):  b    b    b    n    n    n

Handling Data for Two Types of Variables

We refer to recorded values of categorical or quantitative variables as data. The science of statistics is all about handling data.

Definitions
Data are pieces of information about the values taken by variables for a set of individuals.
The science of statistics concerns itself with gathering data about a group of individuals, displaying and summarizing the data, and using the information provided by the data to draw conclusions about a larger group of individuals.

Before we go into detail about the process of gathering data, it helps to have an idea of how we will handle the data when the time comes. Categorical variables are summarized by reporting the count, proportion, or percentage in the category of interest. The most common way of summarizing quantitative variables is with their mean (same thing as average), although we will discuss other useful summaries a bit later in this book.

Definitions
The count in a category of interest is simply the number of individuals in that category.
The proportion in a category of interest is the number of individuals in that category, divided by the total number of individuals considered.
The percentage in a category of interest is the proportion (as a decimal) multiplied by 100%.
The mean of a set of values is their sum divided by the total number of values.
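The four summaries just defined can be written out directly. This is a minimal sketch on made-up data: the `received` and `ages` lists are hypothetical and serve only to show each calculation.

```python
# Hypothetical categorical data: did each student receive accommodations?
received = ["yes", "no", "no", "no", "yes", "no", "no", "no", "no", "no"]

count = received.count("yes")             # number in the category of interest
proportion = count / len(received)        # count divided by total individuals
percentage = 100 * count / len(received)  # proportion expressed as a percent

# Hypothetical quantitative data: ages in years.
ages = [18, 19, 19, 20, 24]
mean_age = sum(ages) / len(ages)          # sum divided by number of values

print(count, proportion, percentage, mean_age)
```

Note that the count, proportion, and percentage are all numbers, yet the variable they summarize (yes or no) is categorical. Only the mean summarizes a quantitative variable.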

Students may be misled to think that the variable of interest in a situation is quantitative because they see a number attached to it. In fact, that number may be a count or a proportion or a percentage summarizing values of a categorical variable. It may help to think about how data values are being recorded for each individual in a sample in order to decide whether the variable of interest is categorical or quantitative, as Carlos did in the four students’ discussion on page 3.

LOOKING AHEAD
Many real-life studies, including many discussed in this book, convert quantitative variables to categorical in order to simplify matters.
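Converting a quantitative variable to a categorical one, as in the birth-weight example earlier, can be sketched as follows. The six weights and the 6-pound cutoff come from that example. The helper name `categorize` is hypothetical.

```python
# Birth weights (in pounds) from the birth-weight example.
birth_weights = [5.5, 5.8, 5.8, 6.0, 6.2, 6.3]

def categorize(weight, cutoff=6.0):
    """Label a weight 'b' (below normal) if under the cutoff, else 'n' (normal)."""
    return "b" if weight < cutoff else "n"

labels = [categorize(w) for w in birth_weights]

# Once categorized, the variable is summarized with a proportion, not a mean.
proportion_below = labels.count("b") / len(labels)

print(labels)
print(proportion_below)
```

The conversion discards information: after grouping, a 5.5-pound baby and a 5.8-pound baby are indistinguishable.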


Most of the data that statisticians handle, and most of the data that we encounter in our everyday lives, come from some subgroup, called a sample, as opposed to the entire group of interest, called the population. Occasionally, we have access to information about the entire population, gathered via a census. This was the case in Example 1.4 about earnings of various demographic groups in the United States.

EXAMPLE 1.3 Summarizing Categorical Variables

Background: An article entitled “New Test-Taking Skill: Working the System” reports: “Indeed, although only a tiny fraction (1.9%) of students nationwide got special accommodations for the SAT, the percentage jumps fivefold for students at New England prep schools. At 20 prominent Northeastern private schools, nearly one in 10 students received special treatment.”1

Question: What type of variable is featured here, and how is it summarized?

Response: For each student in the entire nation or in the private schools examined, it is recorded whether or not the student was granted special accommodations in taking the SAT test. This is a categorical variable, summarized by telling the percentage or proportion in the category of interest (receiving special accommodations).

Practice: Try Exercise 1.10 on page 12.

EXAMPLE 1.4 Summarizing Quantitative Variables

Background: An article entitled “Racial Gaps in Education Cause Income Tiers” reports: “On average, a white man with a college diploma earned about $65,000 in 2001. Similarly educated white women made about 40% less, while black and Hispanic men earned 30% less ...”2

Question: How would earnings for each group (such as white women or Hispanic men) be summarized, with a mean or with a proportion?

Response: Earnings are a quantitative variable and could be summarized for each group with a mean, namely $65,000 for college-educated white men, and a mean that is less by 40% of $65,000 for college-educated white women, that is, $39,000.

Practice: Try Exercise 1.11 on page 12.
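The arithmetic in the response to Example 1.4 can be verified with a small sketch. The helper `reduce_by_percent` is a hypothetical name. It uses whole-number percents and integer division so that the dollar amounts stay exact.

```python
def reduce_by_percent(amount, percent):
    """Return the amount after lowering it by a whole-number percent."""
    return amount * (100 - percent) // 100

white_men_mean = 65_000  # reported mean earnings, in dollars

# "About 40% less" than $65,000, as worked out in the response.
white_women_mean = reduce_by_percent(white_men_mean, 40)

# The same calculation applied to the article's "30% less" figure.
thirty_percent_less = reduce_by_percent(white_men_mean, 30)

print(white_women_mean)
print(thirty_percent_less)
```

The first result reproduces the $39,000 stated in the response. The second figure is not given in the text and follows only from applying the same rule to the quoted 30%.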

Definitions
A sample is a subset taken from a larger group, and the larger group of interest is the population.
A census, according to Webster’s dictionary, is a “usually complete enumeration of the population,” and we think of a census in general as a survey intended to include all citizens in a given area. When we talk about “the Census,” we are referring to the U.S. Census, conducted regularly since 1790, and designed to gather more and more detailed information about America’s population.

A CLOSER LOOK
In cases like this, where values of a quantitative variable are being compared for two or more categorical groups, a summary occasionally quantifies the differences by reporting what percent higher or lower another mean is from the original mean.

Once census results are summarized, as in Example 1.4, there are no further

statistical procedures needed to draw conclusions about the “larger population.”

EXAMPLE 1.5 When Information Is Provided for an Entire Population

Background: “Are Feeding Tubes Over-Prescribed?” describes a Harvard Medical School study that “involved 1999 data from all 15,135 licensed U.S. nursing homes at the time.”3 The study found that “one-third of U.S. nursing home patients in the final stages of Alzheimer’s and other forms of dementia are given feeding tubes, despite evidence that the practice serves no benefit and may even cause harm.” The variable of interest here is whether or not nursing home patients in the final stages of Alzheimer’s or other forms of dementia are given feeding tubes, a categorical variable that is summarized with the proportion 1/3.

Question: Why would it not be appropriate to generalize the study’s results to a larger population?

Response: It is not possible to generalize this result to a larger group because it already refers to patients in all nursing homes at the time, rather than to a sample comprising a subset of those patients.

Practice: Try Exercise 1.14 on page 13.

Roles of Variables: Explanatory or Response

By far the most interesting and useful statistical studies involve relationships between variables. How we approach the data will depend on what roles the variables play in their relationship. There are occasionally situations where two variables have “equal footing” in the relationship, such as in a study of the relationship between football teams’ rankings in offense and in defense. For the most part, however, one variable is thought to cause changes in, or at least to explain, values of the other: It is called the explanatory variable. The other variable is impacted by, or responds to, the first: It is called the response variable. A more complicated relationship can involve more than one explanatory or response variable.

Definitions
Causation exists between two variables if changes in values of the first are actually responsible for changes in values of the second.
The explanatory variable in a relationship between two variables is the one that is presumed to impact the other variable, called the response variable.

In the following diagram of the five possible situations introduced on page 2, the last three involve a relationship. The direction of the arrow goes from explanatory to response variable. Because relatively few actual situations of interest involve a quantitative explanatory and categorical response variable, and because the analysis is fairly advanced compared to the others, we will not analyze such situations in this book.

[Diagram: the five variable situations: one categorical variable; one quantitative variable (Q); one categorical and one quantitative variable; two categorical variables (C→C); two quantitative variables (Q→Q)]

Example 1.6 illustrates the five situations in a variety of contexts.

EXAMPLE 1.6 Identifying Variable Types and Roles

Background: Consider these headlines:

■ “Men Are Twice as Likely as Women to Be Hit by Lightning”
■ “35% of Returning Troops Seek Mental Health Aid”
■ “Do Oscar Winners Live Longer Than Less Successful Peers?”
■ “Smaller, Hungrier Mice”
■ “County’s Average Weekly Wages at $811, Better Than U.S. Average”

Questions: What type of variables are involved in each of these situations? If the relationship between two variables is of interest, which plays the role of explanatory variable and which is the response?

Responses:

“Men Are Twice as Likely as Women to Be Hit by Lightning”: We consider two categorical variables: gender and whether or not a person is hit by lightning. Gender would be the explanatory variable and being hit by lightning or not is the response. The other way around wouldn’t make sense because being hit by lightning could not have an impact on a person’s gender.

“35% of Returning Troops Seek Mental Health Aid”: Whether or not a returning soldier seeks mental health aid is a single categorical variable.

“Do Oscar Winners Live Longer Than Less Successful Peers?”: This involves a categorical explanatory variable (being an Oscar winner or not) and a quantitative response variable (length of life).

“Smaller, Hungrier Mice”: This brief headline suggests a relationship between two quantitative variables: the size of a mouse and its appetite. Size apparently plays the role of explanatory variable, so that as size goes down, the amount of food desired goes up.

“County’s Average Weekly Wages at $811, Better Than U.S. Average”: This involves just one quantitative variable: weekly wages. If wages for one county had been compared to those of another county, then there would have been an additional categorical explanatory variable. Comparing this county’s wages to those of the United States in general is a different kind of comparison, where the county residents may be thought of as a single sample, coming from the larger population of U.S. residents.

Practice: Try Exercise 1.17 on page 13.
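The sorting done in Example 1.6 amounts to a small lookup on variable types and roles. The sketch below is one hypothetical way to encode it. The function name `variable_situation` and the "C"/"Q" codes are assumptions made for illustration, not notation the book defines as code.

```python
def variable_situation(explanatory=None, response=None):
    """Name the variable situation from the types involved.

    Each argument is "C" (categorical), "Q" (quantitative), or None.
    A single variable is passed as `response` with no explanatory variable.
    """
    if response not in ("C", "Q") or explanatory not in (None, "C", "Q"):
        raise ValueError("types must be 'C' or 'Q'")
    if explanatory is None:
        if response == "C":
            return "one categorical variable"
        return "one quantitative variable"
    if (explanatory, response) == ("Q", "C"):
        return "quantitative explanatory, categorical response (not analyzed in this book)"
    names = {
        ("C", "Q"): "categorical explanatory, quantitative response (C→Q)",
        ("C", "C"): "two categorical variables (C→C)",
        ("Q", "Q"): "two quantitative variables (Q→Q)",
    }
    return names[(explanatory, response)]

# Headlines from Example 1.6, typed as in the responses.
print(variable_situation("C", "C"))      # gender -> hit by lightning or not
print(variable_situation(response="C"))  # seeks mental health aid or not
print(variable_situation("C", "Q"))      # Oscar winner or not -> length of life
print(variable_situation("Q", "Q"))      # mouse size -> appetite
print(variable_situation(response="Q"))  # weekly wages
```

Deciding which type and role each variable has still requires reading the study. The lookup only names the situation once that judgment is made.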


Statistics as a Four-Stage Process

Before we begin to learn about the first stage in the process of statistical analysis, we should consider how all the stages fit together to accomplish our overall goal. On page 5, we stated that, as a science, statistics is used to produce information from a sample, summarize it, and then draw conclusions about the larger population from which the sample came. Those conclusions, known as statistical inference, can be reached only if we have some knowledge of the workings of random behavior, which comes under the realm of the science of probability.

Definitions
A random occurrence is one that happens by chance alone, and not according to a preference or an attempted influence.
Probability is the formal study of the likelihood or chance of something occurring in a random situation. In the context of statistics, probability explores the behavior of random samples taken from a larger population.
Statistical inference is the scientific process of drawing conclusions about a population based on information from a sample.

Thus, our goal can be reached in four stages, which will be addressed one at a time in the book’s four parts.

1. Data production: How to select a representative sample, and how to properly assess values of variables for that sample.

2. Displaying and summarizing data: Depicting and describing single quantitative or categorical variables of interest, or relationships between variables if there are two variables involved.

3. Probability: The scientific process wherein we assume we actually know what is true for the entire population, and conclude what is likely to be true for a sample drawn at random from that population.

4. Statistical inference: Using what we have discovered about the variables of interest in a random sample to draw conclusions about those variables for the larger population.

It is easy for a student to lose sight of these long-term goals, as he or she concentrates on learning particular concepts and techniques. Throughout the book, the following diagram will help remind you of how each new topic fits into the “big picture.” A reminder of variable types and roles is included because awareness of the variables involved is always an important part of the statistical picture.

[Diagram: the four processes of statistics]
1. Data Production: Take sample data from the population, with sampling and study designs that avoid bias.
2. Displaying and Summarizing: Use appropriate displays and summaries of the sample data, according to variable types and roles.
3. Probability: Assume we know what’s true for the population; how should random samples behave?
4. Statistical Inference: Assume we only know what’s true about sampled values of a single variable or relationship; what can we infer about the larger population?

EXAMPLE 1.7 Identifying the Four Processes

Background: Consider the following situations:

■ A retail manager is asked to present some graphs and a brief report on her group’s sales over the past several months, broken down into various types of merchandise.

■ Before a bookstore’s owners make plans for extensive renovations, they want to find out what customers already like about the store and what aspects are in need of change.

■ A pharmaceutical company has carried out a study and determined proportions of patients experiencing nausea for those who take a certain medication and those who take a “dummy pill.” The company wants to know what claims it can make about proportions of patients experiencing nausea in the general population for those who take the medication compared to those who don’t.

■ The proportion of all Americans who are of Hispanic origin is 0.13. We’d like to know how unlikely it would be to take a random sample of 1,000 Americans and find only 0.06 to be Hispanic.

Question: Which of the four processes is involved in each situation?

■ The final one is a probability problem because we seek the likelihood of obtaining a certain proportion in our sample who are Hispanic.

Practice: Try Exercise 1.23 on page 14.
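To give a sense of what the probability problem in the last situation involves, here is a hedged sketch. It assumes the normal approximation for a sample proportion, a technique belonging to the probability part of the book rather than something this example works out. The variable names are illustrative only.

```python
import math
from statistics import NormalDist

p = 0.13      # population proportion of Hispanic origin (from the example)
n = 1000      # sample size
p_hat = 0.06  # sample proportion in question

# Standard deviation of the sample proportion under random sampling.
sd_p_hat = math.sqrt(p * (1 - p) / n)

# Standardized distance of the observed proportion from the population value.
z = (p_hat - p) / sd_p_hat

# Normal approximation to the chance of a sample proportion this low or lower.
probability = NormalDist().cdf(z)

print(round(z, 2), probability)
```

Under these assumptions the sample proportion sits more than six standard deviations below the population value, so such a sample would be extremely unlikely if it were truly random.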


Chapter 1 Summary

Variables and Statistics

Characteristics that can differ from one individual to another are called variables. Variables can be either categorical or quantitative. In statistics, we study single variables or relationships between variables. At times we merely focus on variables’ values for a specific set of individuals, called a sample. More often, our goal is to generalize to a larger group, called the population.

■ Data are pieces of information about the values taken by variables for a set of individuals.

■ The five variable situations to be covered in this book are:
1. Single categorical variable
2. Single quantitative variable
3. Categorical explanatory and quantitative [...]

■ Categorical variables can be summarized with counts, proportions, or percentages.

■ Quantitative variables can be summarized with means.

■ If individuals studied are entire groups, the percentage in a particular category for each group can be treated as a quantitative variable.

■ A quantitative variable can be converted into a categorical variable by grouping into ranges of values.

■ The science of statistics is concerned with gathering data, summarizing it, and using that information to draw conclusions about a larger population. The latter process is known as statistical inference.

■ A census gathers information about an entire population rather than just a sample.

■ When the relationship between two variables is of interest, it should be determined which (if any) plays the role of explanatory variable and which is the response variable.

■ A random occurrence is one that happens by chance alone, and probability is the formal [...]

Chapter 1 Exercises

Note: Asterisked numbers indicate exercises whose answers are provided in the Solutions to Selected Exercises section, on page 689.

*1.1 Students were asked to rate their instructor’s preparation for class as being excellent, good, or needs improvement. Response to this question is what type of variable: quantitative or categorical?

*1.2 Suppose researchers want to investigate how weight can affect blood pressure. Tell what types of variables each of these situations involves.
a. Individuals’ weights and blood pressures are recorded.
b. Individuals are classified as being normal or overweight, and their blood pressures are recorded.
c. Individuals are classified as having high or low blood pressure, and their weights are recorded in kilograms.
d. Individuals are assessed as having high or low blood pressure, and as being normal or overweight.

1.3 Prospective subjects for a study had their blood pressures recorded.
a. Is the variable of interest quantitative or categorical?
b. Would results best be summarized with a mean or with a proportion?

1.4 Before the 2004 presidential election in the United States, there was a great deal of interest concerning public opinion of the war in Iraq. For each of the following situations, tell what individuals are being studied, what variable is of interest, and whether the variable is categorical or quantitative.
a. People around the world were surveyed as to whether they approved or disapproved of the Iraq war.
b. People in various countries were surveyed as to whether they approved or disapproved of the Iraq war. For each country, it was determined what percentage of its people disapproved of the war.
c. The Guardian, a British newspaper, reported that 8 of 10 countries surveyed by leading newspapers (such as the Guardian, Canada’s La Presse, and Japan’s Asahi Shimbun) disapproved of the Iraq war.

1.5 Based on a survey of a few thousand people, a newspaper reporter wants to draw conclusions about how a country’s citizens in general feel about the war in Iraq. At this point, is the reporter mainly concerned with data production, displaying and summarizing data, probability, or performing statistical inference?

*1.6 For parts (a) and (b), tell who or what

individuals are being studied, identify the

variable of interest, and tell whether it is

categorical or quantitative; then answer the

question in part (c)

a Adults were surveyed as to whether they

were married, single, or divorced

b The New York Times reported, state by

state, the divorce rate per 1,000 married

adults in 2003 The lowest rate was inMassachusetts, with 5.7 divorces per1,000 married people, and the highestwas in Nevada, with 14.6 per 1,000

c Assume we have Census data on maritalstatus of people in the United States Arethose people considered to be a sample or

a population?

1.7 A New York Times reporter decides to

convey information about American divorcerates by including a map of the UnitedStates Each state is shaded from light todark depending on how high its divorce rate

is At this point, is the reporter mainlyconcerned with data production, displayingand summarizing data, probability, orperforming statistical inference?

*1.8 “Can Mom’s Drinking Lower Kids’ IQ?” examined the relationship between mothers’ consumption of alcohol during pregnancy and their children’s IQs. The mothers were classified as being abstainers (0 alcoholic drinks per day), light drinkers (up to 0.5 per day), moderate drinkers (0.5 to 1 per day), or heavy drinkers (more than 1 per day). Is alcohol consumption being treated as a categorical or a quantitative variable?

1.9 An article reported costs of ski-lift tickets in various resorts in a region as being less than $20, $20 to $40, $40 to $50, or more than $50. Is ticket price being treated as a categorical or a quantitative variable?

*1.10 A British survey reported in 2006 states: “Nearly 40 percent of 106 students who answered questionnaires about their attitudes said they couldn’t cope without their cell phone.”4

a. What type of variable is being considered?

b. How is the variable summarized?

*1.11 “In a study of 87 French and Swiss college students, researchers gave half of them sunscreen with a protection factor of 10 and the other half with a factor of 30. The students, who weren’t told which lotion they received, went on summer vacations and recorded the amount of time they spent in the sun. Users of the stronger sunscreen spent 25% more time in the sun, mostly sunbathing, the study found students in the study often waited until their skin turned red before rushing to the shade.”5

a. Is time spent in the sun being treated as a quantitative or a categorical variable?

b. How would researchers summarize time spent in the sun for each group (those with the stronger and those with the weaker sunscreen)?

1.12 A newspaper article entitled “Teens Most

Likely to Have Sex at Home” notes that of

the sexually active teens surveyed in the

year 2000, “56% said they first had sex at

their family’s home or at the home of their

partner’s family.”6

a. What is the variable of interest?

b. Is the variable of interest quantitative or categorical?

c. How is the variable being summarized?

1.13 Based on results of a survey of sexually

active teenagers, sociologists would like to

be able to say whether or not a majority of

all sexually active teenagers first had sex at

their or their partner’s home. At this point,

are the sociologists mainly concerned with

data production, displaying and

summarizing data, probability, or

performing statistical inference?

*1.14 The New York Times reports: “Three out of

four workers drove to their jobs by

themselves in 2006, according to another

finding by the Census Bureau.”7 Should we

consider the workers studied to be a sample

or a population?

1.15 Mortality rates in the United States during

the 1980s and 1990s were studied by county,

race, gender, and income, with the following

results: “Asian-Americans, average per-capita

income of $21,566, have a life expectancy of

84.9 years Western American Indians,

$10,029, 72.7 years”8 Are these numbers

referring to samples or populations?

1.16 The American Association of Retired People (AARP) conducted a survey in which it was discovered that 63% of adult Americans don’t want to live to be at least 100. On average, those polled wanted to live to the age of 91.

a. Should we consider the Americans polled to be a sample or a population?

b. There is a categorical variable of interest in the survey; tell roughly how the survey question was phrased to obtain those responses.

c. There is a quantitative variable of interest in the survey; tell roughly how the survey question was phrased to obtain those responses.

*1.17 The New York Times reported on a study of gadgets and appliances in American homes. For each of the following results, tell which of the five variable situations is involved, choosing from the following:

■ C: single categorical variable

■ Q: single quantitative variable

■ C → Q: categorical explanatory variable and quantitative response variable

■ C → C: categorical explanatory variable and categorical response variable

■ Q → Q: quantitative explanatory variable and quantitative response variable

a. For each of the 17 appliances studied, the Times reported the percentage of American homes in 2001 that had the appliance. For example, microwave ovens were in 96% of the homes and answering machines were in 78% of the homes. (1) C (2) Q (3) C → Q (4) C → C (5) Q → Q

b. The study made a comparison of percentage owning each appliance in 2001 to the percentage owning the appliance in 1987. For example, microwave ovens were in 66% of the homes in 1987 as opposed to 96% in 2001. Answering machines were in 10% of the homes in 1987 as opposed to 78% in 2001. (1) C (2) Q (3) C → Q (4) C → C (5) Q → Q

c. The study reported 2.5 television sets owned per household in 2001. (1) C (2) Q (3) C → Q (4) C → C (5) Q → Q

1.18 The New York Times reported on a study of gadgets and appliances in American homes. For each of the 17 appliances studied, it told the percentage of American homes in 2001 that had the appliance. For example, microwave ovens were in 96% of the homes and answering machines were in 78% of the homes.

a. Who or what are the individuals being studied?

b. What is the variable of interest?

c. Is the variable of interest quantitative or categorical?


1.19 The study that looked at prevalence of various appliances in homes in 2001, as described in Exercises 1.17 and 1.18, made a comparison to the percentages for each appliance in 1987. For example, microwave ovens were in 66% of the homes in 1987 as opposed to 96% in 2001. Answering machines were in 10% of the homes in 1987 as opposed to 78% in 2001.

a. There are two variables involved; what is the explanatory variable?

b. Tell whether the explanatory variable is quantitative or categorical.

c. What is the response variable?

d. Tell whether the response variable is quantitative or categorical.

e. In which year would you expect percentages to be higher overall: 1987 or 2001, or both the same?

1.20 The New York Times study of appliances reported 2.5 television sets per household in

1.21 Suppose television advertisers want to know if age plays a role in people’s response to a rather unconventional ad that might be aired during the next Super Bowl. The ad is shown to a variety of viewers. Keeping in mind that the explanatory variable is not necessarily the first one mentioned, classify each of the following possible approaches as involving one of these relationships:

■ C → C: categorical explanatory variable and categorical response variable

■ C → Q: categorical explanatory variable and quantitative response variable

■ Q → C: quantitative explanatory variable and categorical response variable

■ Q → Q: quantitative explanatory variable and quantitative response variable

a. They ask whether or not a viewer likes the ad, and record his or her age.

b. They classify a viewer as being youth, young adult, middle-aged, or senior citizen, and whether or not he or she likes the ad.

c. Viewers’ ages are recorded, along with their rating of the ad on a scale of 1 (most unfavorable) to 10 (most favorable).

d. Viewers’ ratings of the ad on a scale of 1 to 10 are recorded, along with the viewers’ age group as being youth, young adult, middle-aged, or senior citizen.

1.22 Television advertisers are trying to decide which of the approaches outlined in Exercise 1.21 to use in an upcoming study of age and response to an advertisement. At this point, are they mainly concerned with data production, displaying and summarizing data, probability, or performing statistical inference?

*1.23 A department head wants to investigate the quality of teaching of a professor who is coming up for tenure. Tell which of the four processes (data production, displaying and summarizing, probability, or statistical inference) is involved in each of these stages:

a. The department head considers whether to simply ask students to rate various aspects of the professor’s performance on a 5-point scale, or whether to also ask them to write a paragraph describing their experience in that professor’s class.

b. A sample of students is surveyed, and scores on a 5-point scale are averaged for each aspect of the professor’s

d. Based on the responses of sampled students, the department head concludes that the mean preparedness rating for all of the professor’s students is higher than 4.0.

1.24 Men’s Health magazine used data on body mass index, back-surgery rates, usage of gyms, etc. to grade the quality of men’s “abs” (abdominal muscles) in 60 cities across the country. If each city was given a rating between 0 and 4, such as 2.75 for Pittsburgh, then how is the variable of interest being treated, as quantitative or categorical?

1.25 Suppose Men’s Health magazine wants to present the results of the survey described in Exercise 1.24 in a way that is both appealing and informative. Is the magazine mainly concerned with data production, displaying and summarizing data, probability, or performing statistical inference?

1.26 Anthropologists studied gender differences in public restroom graffiti, noting whether the graffiti occurred in a men’s or women’s room, and classifying writings as being competitive and derogatory or advisory and sympathetic.

a. There are two variables mentioned here; what is the explanatory variable?

b. Tell whether the explanatory variable is quantitative or categorical.

c. What is the response variable?

d. Tell whether the response variable is quantitative or categorical.

e. Would type of writings for each gender be summarized with means or proportions?

[Photo caption: Typical graffiti for women’s room?]

1.27 If researchers report that alcoholics are three times as likely to smoke compared to nonalcoholics, do they consider smoking to be the explanatory variable or the response?

1.28 If researchers report that smokers are 10 times as likely to be alcoholics compared to nonsmokers, do they consider smoking to be the explanatory variable or the response?

1.29 The Centers for Disease Control and Prevention noted that “the price of a pack of cigarettes went up 90% between 1997 and 2003.”9 Suppose students in an introductory statistics course have been asked to identify the two variables of interest here, then tell which is explanatory and which is response, and whether each is quantitative or categorical. Which student has the correct answer?

Adam: The explanatory variable is price of cigarettes, and it’s categorical because it was summarized with a percentage. The response is year, and it’s quantitative because it takes number values.

Brittany: The roles are reversed: Year is the quantitative explanatory variable and price is the categorical response.

Carlos: Year is the explanatory variable, and because just two values are possible, it’s categorical. Price is the response and it’s quantitative; 90% just tells how much the price has changed from the year 1997 to the year 2003.

Dominique: Both variables are quantitative because they both take number values; year is explanatory because it affects the price.

1.30 One-third of all nursing home patients with Alzheimer’s and other forms of dementia are given feeding tubes. Researchers want to know how unlikely it would be to find more than half in a random sample of 100 such patients to have been given feeding tubes. Are the researchers mainly concerned with data production, displaying and summarizing data, probability, or performing statistical inference?

Discovering Research: Variable Types and Roles

1.31 Hand in an article or report about a statistical study; tell what variable or variables are involved and whether they are quantitative or categorical. If there are two variables, tell which is explanatory and which is response. If summaries are mentioned, tell whether they are reporting means or proportions or something else.

Reporting on Research: Variable Types and Roles

1.32 Use the results of Exercise 1.6 and relevant findings from the Internet to make a report on divorce in the United States that relies on statistical information.


The process of data production consists of two steps: (1) obtain the sample, and (2) carry out a properly designed study to assess the variables or relationships of interest. In this chapter we will concentrate on the first step, stressing that the sample must be taken in such a way as to ensure that it represents the larger population of interest without bias.

Pick a number at random from 1 to 20. This may sound easy, but unless you get outside help from something like a computer or a table of random digits or a 20-sided die, the task is impossible. Our brains are designed to recognize and create patterns, not randomness.

Just as our brains are not equipped to guide us in selecting a number between 1 and 20 truly at random, we cannot pick a truly random sample of participants for a study “off the top of our head” without the aid of some random number generator. Random as a household word often is used to describe a selection that a statistician would call “haphazard.” Technically, a random sample must make planned use of chance so that the laws of probability apply.
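As one possible illustration of "planned use of chance," the following Python sketch draws a simple random sample with a software random number generator. It is only a sketch under stated assumptions: the function name `simple_random_sample`, the population of 80 numbered individuals, the sample size of six, and the seed value are all hypothetical choices made for this illustration and are not part of the text.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Return `sample_size` distinct ID numbers drawn from 1..population_size.

    Hypothetical helper for illustration: every individual is assumed to
    have been assigned a number from 1 up to `population_size` beforehand.
    """
    # A dedicated generator keeps the draw reproducible when a seed is given.
    rng = random.Random(seed)
    # random.sample selects without replacement, so no ID can appear twice.
    ids = rng.sample(range(1, population_size + 1), sample_size)
    return sorted(ids)


# Example: choose six individuals out of 80 (seed chosen arbitrarily).
chosen = simple_random_sample(80, 6, seed=2024)
print(chosen)
```

Passing a seed makes the selection repeatable, which can help when a sampling plan needs to be documented; omitting it lets the generator pick a different sample on each run.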

Sources of Bias in Sampling: When Selected Individuals Are Not Representative

Bias, the tendency for an estimate to deviate in one direction from the true value, can enter into the selection process in a variety of ways. After we define some of the most common sources of bias in sampling, we will examine how they can arise in the context of an example.

How good is the food?

Who should be asked?

How should we take a sample of individuals to

gain information about the larger group?

We defined a random

occurrence on page 9 to

be one that happens by

chance alone, and not


Definitions

Selection bias occurs in general when the sample is nonrepresentative of the larger population of interest.

The sampling frame is the collection of all the individuals who have the potential to be selected. It should, but does not necessarily, match the population of interest.

A self-selected sample (also known as a volunteer sample) includes only individuals who have taken the initiative to participate, as opposed to having been recruited by researchers.

A haphazard sample is selected without a scientific plan, according to the whim of whoever is drawing the sample.

The main criterion for selection in a convenience sample is that the sampled individuals are found at a time or in a place that is handy for researchers.

Nonresponse occurs when individuals selected by researchers decline to be part of the sample. A sample is described as suffering from nonresponse bias when too many individuals decline, to the extent that there is a substantial impact on the composition of the sample.

A CLOSER LOOK

Call-in or Internet polls are practically guaranteed to be biased, often quite heavily, because they result in volunteer samples.

EXAMPLE 2.1 How Various Types of Bias Occur in Sampling

Background: A professor wants to survey a sample of six from 80 class members to get their opinion about the course textbook.

Questions: Are these sampling methods unbiased? If not, what type of bias enters in?

1. Ask for students to raise their hands if they would like to give their opinion of the textbook.

2. Sample the next six students who come in to office hours.

3. Look at a class roster and, without the aid of a random number generator, attempt to take a “random” sample of six names.

4. Assign each student in the classroom a number from 1 on up, then use software or a table of random digits to select six at random.

5. Take a random sample from the roster of students enrolled and mail them a questionnaire.

Responses:

1. Asking students to raise their hands yields a volunteer sample, which would be likely to favor people with strong positive or negative feelings about the book.

2. Asking students who come in to office hours would yield a convenience sample, and would result in bias because students who need help may tend to find the book difficult to understand.

Continued

Date posted: 08/08/2018, 16:50
