
Statistics for Business: Decision Making and Analysis, Robert Stine and Dean Foster, Chapter 26


Analysis of Variance

Chapter 26

26.1 Comparing Several Groups

Did agricultural yield go up this year because of more fertilizer or more rain? Or is it the result of temperature or the type of seed used?

• Use regression analysis with dummy variables to compare the averages of several groups.

• This approach is also known as analysis of variance.

Which Wheat Variety Should a Farmer Plant?

• The varieties compared are Endurance, Hatcher, NuHills, RonL, and Ripper.

• Yield was measured in bushels per acre.

• … number of observations for each treatment.

Steps to Follow in the Analysis

• Plot the data to find patterns.

• Propose a regression model for the data.

• Check conditions associated with the model.

• Test hypotheses and draw a conclusion.

Comparing Groups in Plots

[Figure: boxplots of yield]

[Table: summary statistics]

Relating the t-Test to Regression

• Is there a significant difference between the average yield of Endurance and the others?

• Since the variances among groups appear similar, use the two-sample t-test and pool the variances.

The t-statistic and p-value show that Endurance has a significantly higher mean yield per acre than the combination of other varieties.

• The t-test can be formulated as a regression with a dummy variable D(Endurance) that is coded 1 if the plot is seeded with Endurance and 0 otherwise.

The slope b1 = 5.53 matches the estimate for the difference between means.

• Testing the slope is equivalent to a pooled two-sample t-test of the difference between means (the t-statistic and p-value are the same).
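The equivalence between the pooled t-test and the dummy-variable regression can be checked numerically. The sketch below uses only the Python standard library and made-up yields (they are not the wheat data from the slides); the function names `pooled_t` and `dummy_regression` are illustrative.

```python
from math import sqrt
from statistics import mean

# Hypothetical yields in bushels/acre; NOT the data behind the slides.
endurance = [20.1, 18.4, 21.0, 19.3]
others = [14.2, 12.9, 15.8, 13.5, 16.0, 12.2]


def pooled_t(a, b):
    """Pooled two-sample t-statistic for mean(a) - mean(b)."""
    na, nb = len(a), len(b)
    ss_a = sum((x - mean(a)) ** 2 for x in a)
    ss_b = sum((x - mean(b)) ** 2 for x in b)
    s2_pooled = (ss_a + ss_b) / (na + nb - 2)
    se = sqrt(s2_pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se


def dummy_regression(a, b):
    """Least-squares fit of y = b0 + b1*D, with D = 1 for group a, 0 for b.

    Returns (b0, b1, t-statistic of b1).
    """
    y = a + b
    d = [1.0] * len(a) + [0.0] * len(b)
    n = len(y)
    d_bar, y_bar = mean(d), mean(y)
    sxx = sum((di - d_bar) ** 2 for di in d)
    sxy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * d_bar
    sse = sum((yi - b0 - b1 * di) ** 2 for di, yi in zip(d, y))
    se_b1 = sqrt(sse / (n - 2) / sxx)
    return b0, b1, b1 / se_b1


b0, b1, t_reg = dummy_regression(endurance, others)
t_pooled = pooled_t(endurance, others)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
print(f"regression t = {t_reg:.4f}, pooled t = {t_pooled:.4f}")
```

In this sketch the intercept equals the mean of the group coded 0, the slope equals the difference between the two group means, and the two t-statistics agree up to rounding.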

Comparing Several Groups Using Regression

• Define the following dummy variables:

D(Endurance) = 1 if plot grows Endurance, 0 otherwise.

D(Hatcher) = 1 if plot grows Hatcher, 0 otherwise.

D(NuHills) = 1 if plot grows NuHills, 0 otherwise.

D(Ripper) = 1 if plot grows Ripper, 0 otherwise.

• J − 1 dummy variables are needed to represent J categories.
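As a small illustration of this coding, the sketch below builds the four dummy variables for a single plot. The helper name `dummy_code` is hypothetical; RonL is the one variety without a dummy, so it is coded as all zeros.

```python
VARIETIES = ["Endurance", "Hatcher", "NuHills", "RonL", "Ripper"]
BASELINE = "RonL"  # the variety that gets no dummy variable


def dummy_code(variety):
    """Return the J - 1 = 4 dummy values for one plot as a dict."""
    if variety not in VARIETIES:
        raise ValueError(f"unknown variety: {variety}")
    return {f"D({v})": int(variety == v) for v in VARIETIES if v != BASELINE}


for v in VARIETIES:
    print(v, dummy_code(v))
```

Every non-baseline variety switches on exactly one dummy, and the baseline switches on none, which is why J categories need only J − 1 columns.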

• The variety RonL is the baseline category (defined by all zeros for the dummy variables).

• Analysis of variance (ANOVA): the comparison of two or more averages using a regression model in which all explanatory variables are dummy variables.

Interpreting the Estimates

• The slope of each dummy variable compares the average response of its category to the average of the baseline category.

• If D(Endurance) = 1, we find ŷ = 19.58 bushels per acre. Since b0 = 11.68 is the mean yield for RonL, the slope for D(Endurance), b1 = 7.9, is the difference between the average yields.
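To see why the estimates have this interpretation, the sketch below fits a regression on dummy variables by solving the normal equations directly and compares the coefficients with the group means. The yields are made up for three varieties (not the slide data), and the small `solve` helper is an illustrative Gauss-Jordan routine rather than anything from the source.

```python
from statistics import mean

# Hypothetical yields (bushels/acre) for three varieties; not the slide data.
groups = {
    "RonL": [11.0, 12.5, 11.9],
    "Endurance": [19.0, 20.4, 19.3],
    "Hatcher": [17.2, 18.1, 16.9],
}
baseline = "RonL"
others = [g for g in groups if g != baseline]

# Design matrix rows: [1, D(Endurance), D(Hatcher)]
X, y = [], []
for name, values in groups.items():
    for v in values:
        X.append([1.0] + [1.0 if name == g else 0.0 for g in others])
        y.append(v)


def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


k = len(X[0])
XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
b = solve(XtX, Xty)  # [b0, slope for D(Endurance), slope for D(Hatcher)]

print("intercept:", round(b[0], 4), "baseline mean:", round(mean(groups[baseline]), 4))
for name, slope in zip(others, b[1:]):
    diff = mean(groups[name]) - mean(groups[baseline])
    print(name, "slope:", round(slope, 4), "mean difference:", round(diff, 4))
```

The intercept reproduces the baseline mean and each slope reproduces a difference from that mean, which is the same relationship as 11.68 + 7.9 = 19.58 in the wheat example.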

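ANOVA Regression Model

The equation of the MRM for the Wheat example can be written in terms of the population means:

$$y = \mu_{\text{RonL}} + (\mu_{\text{Endurance}} - \mu_{\text{RonL}})\,D(\text{Endurance}) + (\mu_{\text{Hatcher}} - \mu_{\text{RonL}})\,D(\text{Hatcher}) + (\mu_{\text{NuHills}} - \mu_{\text{RonL}})\,D(\text{NuHills}) + (\mu_{\text{Ripper}} - \mu_{\text{RonL}})\,D(\text{Ripper}) + \varepsilon$$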

One-Way Analysis of Variance: This regression model compares the averages of the groups defined by J levels of a categorical variable. The observations in each group are a sample from the associated population.

Equation: $y_{ij} = \mu_j + \varepsilon_{ij}$

Assumptions: Errors are independent, have equal variances, and are normally distributed.

26.2 Inference in ANOVA Regression Models

Checking Conditions

• Linear association: automatic for ANOVA.

• No obvious lurking variable: automatic if the data are from a randomized experiment (as in the wheat example).

• Check the remaining conditions (independence, similar variances, and normality) with appropriate residual plots.

If the IQRs are similar (within a factor of 3 to 1 with up to five groups), the similar variances condition is met.
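The IQR rule of thumb is easy to automate. The sketch below is one possible version using `statistics.quantiles` from the standard library; the data and the names `iqr` and `similar_variances` are assumptions for illustration, and other quartile conventions would give slightly different IQRs.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range using the default 'exclusive' quartile method."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


def similar_variances(groups, max_ratio=3.0):
    """True if the largest group IQR is at most max_ratio times the smallest.

    Assumes every group has a nonzero IQR.
    """
    iqrs = [iqr(v) for v in groups.values()]
    return max(iqrs) / min(iqrs) <= max_ratio


# Hypothetical yields by variety (bushels/acre); not the slide data.
yields = {
    "A": [10.2, 11.5, 12.1, 13.0, 14.4, 15.2, 16.8, 17.1],
    "B": [12.0, 13.1, 15.5, 16.2, 18.9, 20.3, 21.0, 23.4],
    "C": [9.5, 10.1, 10.9, 11.6, 12.2, 13.8, 14.0, 15.3],
}
print({name: round(iqr(v), 2) for name, v in yields.items()})
print("similar variances:", similar_variances(yields))
```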

Residuals appear nearly normal.

F-Test for the Difference among Means

H0: µ1 = µ2 = µ3 = µ4 = µ5

Understanding the F-Test

Consider the following hypothetical means:

[Table: hypothetical group means]

Are these averages significantly different? To answer this question, we need to know the variance within each group.

[Figure: two plots of groups with the same averages]

Both plots show groups with the same averages but different within-group variances. There are no significant differences among the averages in the right plot.
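The role of within-group variance can be made concrete by computing the F-statistic, the ratio of the between-group mean square to the within-group mean square, for two data sets that share the same group averages. The numbers below are invented for illustration and the function name `f_statistic` is an assumption.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    n_groups, n_obs = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (n_groups - 1)) / (ss_within / (n_obs - n_groups))


# Two hypothetical data sets with identical group averages (10, 12, 14).
tight = [[9.9, 10.0, 10.1], [11.9, 12.0, 12.1], [13.9, 14.0, 14.1]]
wide = [[5.0, 10.0, 15.0], [7.0, 12.0, 17.0], [9.0, 14.0, 19.0]]

print("F with small within-group variance:", round(f_statistic(tight), 2))
print("F with large within-group variance:", round(f_statistic(wide), 2))
```

The averages are the same in both cases, but the F-statistic is very large when observations cluster tightly around their group means and falls below 1 when the within-group spread is large.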

Confidence Intervals

Since the F-test shows that the mean yields among varieties of wheat are not the same, which variety is best?

26.3 Multiple Comparisons

Tukey Confidence Intervals

• These intervals hold the chance for a Type I error to 5% over the entire collection of pairwise comparisons.

• They replace the t-percentile in confidence intervals with a larger multiple of the standard error (obtained from a special table).

Tukey Confidence Intervals - Wheat Example

• The 95% Tukey confidence interval for the difference between the two best varieties of wheat (Endurance and Hatcher) is 2.04 ± 2.875 × 2.11 = 2.04 ± 6.07 bushels/acre.

• This difference is not statistically significant since the Tukey interval includes 0.

• Note that the width of the 95% Tukey confidence interval is the same for any pairwise comparison.

• The difference in yield between any two varieties compared must be more than 6.07 bushels/acre in order to be statistically significant.
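The arithmetic behind the Tukey interval for Endurance versus Hatcher can be reproduced from the three numbers quoted on the slides. The variable names in this sketch are illustrative.

```python
diff = 2.04         # Endurance minus Hatcher, bushels/acre (from the slides)
multiplier = 2.875  # Tukey multiplier quoted on the slides
se_diff = 2.11      # standard error of the difference (from the slides)

margin = multiplier * se_diff
lower, upper = diff - margin, diff + margin
significant = not (lower <= 0 <= upper)

print(f"margin of error: {margin:.2f} bushels/acre")
print(f"95% Tukey interval: {lower:.2f} to {upper:.2f}")
print("statistically significant:", significant)
```

Because the interval runs from below zero to above zero, the comparison is not significant, and the margin of error is the 6.07 bushels/acre threshold that any pairwise difference must exceed.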

Bonferroni Confidence Intervals

• These intervals adjust for multiple comparisons by changing the α level used in the standard interval to α/M for M intervals.

• With M = 10 pairwise comparisons among the five varieties, Bonferroni confidence intervals reduce α = 0.05 to α/10 = 0.005 and replace t = 2.08 with t = 3.00.
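The Bonferroni adjustment for the wheat example follows from counting the pairs. The sketch below computes M and the per-interval α with the standard library; it does not compute the t-percentile, since that depends on the residual degrees of freedom, which the slides do not state.

```python
from math import comb

n_varieties = 5            # Endurance, Hatcher, NuHills, RonL, Ripper
M = comb(n_varieties, 2)   # number of distinct pairwise comparisons
alpha = 0.05
alpha_each = alpha / M     # level used for each individual interval

print("pairwise comparisons:", M)
print("alpha per interval:", alpha_each)
```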

26.4 Groups of Different Size

• With groups of different sizes, unbalanced data produce confidence intervals of different widths.

• Compute the estimated standard error for a pairwise comparison using the following formula with the relevant sample sizes:

$$\mathrm{se}(\bar{y}_1 - \bar{y}_2) = s_e \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
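A short sketch of the standard error for a pairwise comparison with unequal group sizes follows. It assumes the formula is the residual standard deviation times the square root of 1/n1 + 1/n2; the function name `se_difference` and the numbers are made up for illustration.

```python
from math import sqrt


def se_difference(s_e, n1, n2):
    """Estimated standard error of the difference between two group means.

    s_e is the residual standard deviation from the ANOVA regression
    (an assumed reading of the symbol s in the formula).
    """
    return s_e * sqrt(1 / n1 + 1 / n2)


# Hypothetical residual SD of 4.0 with 16 observations split two ways.
print("balanced (8, 8):   ", round(se_difference(4.0, 8, 8), 3))
print("unbalanced (4, 12):", round(se_difference(4.0, 4, 12), 3))
```

With the same total number of observations, the unbalanced split gives a larger standard error and therefore a wider interval, which is why intervals differ in width when group sizes differ.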

4M Example 26.1: JUDGING THE CREDIBILITY OF ADVERTISEMENTS

Motivation

Advertising executives want to compare four commercials for a retail item that make claims of varying strengths. Specifically, they want to know how over-the-top an ad can be before customers turn away in disbelief.

Method

The data consist of reactions from a sample of 80 customers who viewed commercials with claims in one of four categories: Tame, Plausible, Stretch, and Outrageous. Each customer was randomly assigned to a commercial. The response variable is Credibility, obtained from customers’ responses to items on a questionnaire they completed after viewing the ad.

Use regression with three dummy variables to capture the four types of claims made in the commercials. Check the conditions for ANOVA. Linearity is not an issue, and there are no obvious lurking variables because randomization was used in designing the study.

Mechanics

[Results: tables and figures]

The F-test has a p-value of 0.0251; reject H0. The mean credibility is not the same for all four commercials. Performing pairwise comparisons using Tukey intervals, the difference between average credibility scores must be more than 3.25 to be statistically significant.

Message

Based on the Tukey intervals, there is only one statistically significant pairwise difference (between commercials making tame claims and those making outrageous claims). Customers place less credibility in ads that make outrageous claims than in ads that make tame claims.

Best Practices

• … when using ANOVA regression.

• Use Tukey or Bonferroni confidence intervals to identify groups that are significantly different.

• Recognize the cost of snooping in the data to choose hypotheses.

Pitfalls

• Don’t compare the means of several groups using lots of t-tests.

• Don’t forget confounding factors.

• Never pretend you have only two groups.

• Do not add or subtract standard errors.

• Do not use a one-way ANOVA to analyze data with repeated measurements.

Date posted: 10/01/2018, 16:01
