Part 2 of the book Business Statistics: A Decision-Making Approach covers estimating single population parameters, introduction to hypothesis testing, estimation and hypothesis testing for two population parameters, analysis of variance, and other topics.
Estimation and Hypothesis Testing for Two Population Parameters
From Chapter 10 of Business Statistics: A Decision-Making Approach, Ninth Edition. David F. Groebner, Patrick W. Shannon, and Phillip C. Fry. Copyright © 2014 by Pearson Education, Inc. All rights reserved.
Outcome 1 Discuss the logic behind and demonstrate the techniques for using independent samples to test hypotheses and develop interval estimates for the difference between two population means.
Outcome 2 Develop confidence interval estimates and conduct hypothesis tests for the difference between two population means for paired samples.
Outcome 3 Carry out hypothesis tests and establish interval estimates, using sample data, for the difference between two population proportions.
1 Estimation for Two Population Means Using Independent Samples
2 Hypothesis Tests for Two Population Means Using Independent Samples
3 Interval Estimation and Hypothesis Tests for Paired Samples
4 Estimation and Hypothesis Tests for Two Population Proportions
• … about single population means and single population proportions.
• … associated with sampling distributions for x̄ and p̄.
• Review the steps for developing confidence interval estimates for a single population mean and a single population proportion.
• Review material on calculating and interpreting sample means and standard deviations.
• Review the normal distribution.
Why you need to know
In many business decision-making situations, managers must decide between two or more alternatives. For example, fleet managers in large companies must decide which model and make of car to purchase next year. Airlines must decide whether to purchase replacement planes from Boeing or Airbus. When deciding on a new advertising campaign, a company may need to evaluate proposals from competing advertising agencies. Hiring decisions may require a personnel director to select one employee from a list of applicants. Production managers are often confronted with decisions concerning whether to change a production process or leave it alone. Each day, consumers purchase a product from among several competing brands.
Fortunately, there are statistical procedures that can help decision makers use sample information to compare different populations. In this text, we introduce these procedures and techniques by discussing methods that can be used to make statistical comparisons between two populations. Later, we will discuss some methods to extend this comparison to more than two populations. Whether we are discussing cases involving two populations or those with more than two populations, the techniques we present are all extensions of the statistical tools involving a single population parameter.
1 Estimation for Two Population Means Using Independent Samples
In this section, we examine situations in which we are interested in the difference between two population means, looking first at the case in which the samples from the two populations are independent.
We will introduce techniques for estimating the difference between the means of two populations in the following situations:
1 The population standard deviations are known and the samples are independent.
2 The population standard deviations are unknown and the samples are independent.
Estimating the Difference between Two Population Means When σ1 and σ2 Are Known, Using Independent Samples
Recall that standard normal z-values were used to establish the critical value and develop the interval estimate when the population standard deviation was assumed known and the population was assumed to be normally distributed.¹ The general format for a confidence interval estimate is shown in Equation 1.
Independent Samples: Samples selected from two or more populations in such a way that the occurrence of values in one sample has no influence on the probability of the occurrence of values in the other sample(s).
Chapter Outcome 1
¹ If the samples from the two populations are large (n ≥ 30), the normal distribution assumption is not required.
Confidence Interval, General Format
$$\text{Point estimate} \pm (\text{Critical value})(\text{Standard error}) \tag{1}$$
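In business situations, you will often need to estimate the difference between two population means. For instance, you may wish to estimate the difference in mean starting salaries between males and females, the difference in mean production output in union and nonunion factories, or the difference in mean service times at two different fast-food businesses. In these situations, the best point estimate for μ1 − μ2 is x̄1 − x̄2.
When the population standard deviations, σ1 and σ2, are known and the samples are independent, the standard error of x̄1 − x̄2 is given by Equation 2.
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \tag{2}$$
where:
σ1², σ2² = Variances of populations 1 and 2
n1, n2 = Sample sizes from populations 1 and 2
Further, the critical value for determining the confidence interval will be a z-value from the standard normal distribution. In these circumstances, the confidence interval estimate for μ1 − μ2 is found by using Equation 3.
$$(\bar{x}_1 - \bar{x}_2) \pm z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \tag{3}$$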
The z-values for several of the most commonly used confidence levels are

Confidence Level    Critical z-value
80%                 z = 1.28
90%                 z = 1.645
95%                 z = 1.96
99%                 z = 2.576
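Critical z-values like these can also be computed rather than read from a table. The following is a minimal sketch using Python's standard library; the helper name `critical_z` is an illustrative choice, not something from the text.

```python
from statistics import NormalDist


def critical_z(confidence_level: float) -> float:
    """Two-sided critical z-value for a confidence level such as 0.95."""
    alpha = 1.0 - confidence_level
    # The interval leaves alpha/2 in each tail, so the critical value
    # is the (1 - alpha/2) quantile of the standard normal distribution.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {critical_z(level):.3f}")
```

Printed to three decimals, the values agree with the usual table entries up to rounding.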
EXAMPLE 1 CONFIDENCE INTERVAL ESTIMATE FOR μ1 − μ2 WHEN σ1 AND σ2 ARE KNOWN, USING INDEPENDENT SAMPLES
Axiom Fitness
Axiom Fitness is a small chain of fitness centers located primarily in the South but with some clubs scattered in other parts of the U.S. and Canada. Recently, the club in Winston-Salem, North Carolina, worked with a business class from a local university on a project in which a team of students observed Axiom customers with respect to their club usage. As part of the study, the students measured the time that customers spent in the club during a visit. The objective is to estimate the difference in mean time spent per visit for male and female customers. Previous studies indicate that the standard deviation is 11 minutes for males and 16 minutes for females. To develop a 95% confidence interval estimate for the difference in mean times, the following steps are taken:
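Step 1 Define the population parameter of interest and select independent samples from the two populations.
In this case, the company is interested in estimating the difference in mean time spent in the club between males and females. The measure of interest is μ1 − μ2. The student team has selected simple random samples of 100 males and 100 females at different times in the Winston-Salem club.
Step 2 Specify the desired confidence level.
The plan is to develop a 95% confidence interval estimate.
Step 3 Compute the point estimate.
The resulting sample means are
Males: x̄1 = 34.5 minutes    Females: x̄2 = 42.4 minutes
The point estimate is
x̄1 − x̄2 = 34.5 − 42.4 = −7.9 minutes
Women in the sample spent an average of 7.9 minutes longer in the club.
Step 4 Determine the standard error of the sampling distribution.
The standard error is calculated as
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{11^2}{100} + \frac{16^2}{100}} = 1.9416$$
Step 5 Determine the critical value, z, from the standard normal table.
The interval estimate will be developed using a 95% confidence interval. Because the population standard deviations are known, the critical value is a z-value from the standard normal table. The critical value is
z = 1.96
Step 6 Develop the confidence interval estimate using Equation 3.
$$(\bar{x}_1 - \bar{x}_2) \pm z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = -7.9 \pm 1.96(1.9416) = -7.9 \pm 3.81$$
−11.71 ≤ (μ1 − μ2) ≤ −4.09
Thus, based on the sample data and the specified confidence level, women spend on average between 4.09 and 11.71 minutes longer at this Axiom Fitness Center.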
> END EXAMPLE
TRY PROBLEM 4
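The steps of Example 1 can be checked numerically. The sketch below assumes the known-σ interval formula (point estimate ± z times the standard error); the function name `z_interval` is illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean1, mean2, sigma1, sigma2, n1, n2, confidence=0.95):
    """Confidence interval for mu1 - mu2 when both population sigmas are known."""
    point_estimate = mean1 - mean2
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * standard_error
    return point_estimate - margin, point_estimate + margin


# Axiom Fitness inputs from Example 1: males are sample 1, females sample 2.
lower, upper = z_interval(34.5, 42.4, sigma1=11, sigma2=16, n1=100, n2=100)
print(f"{lower:.2f} <= mu1 - mu2 <= {upper:.2f}")
```

Both limits are negative because the difference is taken as males minus females; reversing the order of the samples flips the signs without changing the conclusion.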
Estimating the Difference between Two Means When σ1 and σ2 Are Unknown, Using Independent Samples
When estimating a single population mean when the population standard deviation is unknown, the critical value is a t-value from the t-distribution. This is also the case when you are interested in estimating the difference between two population means, if the following assumptions hold:
Assumptions
• The populations are normally distributed.
• The populations have equal variances.
• The samples are independent.
The following application illustrates how a confidence interval estimate is developed
using the t-distribution.
BUSINESS APPLICATION ESTIMATING THE DIFFERENCE BETWEEN TWO
POPULATION MEANS
RETIREMENT INVESTING
A major political issue for the past decade has focused on the long-term future of the U.S. Social Security system. Many people who have entered the workforce in the past 20 years believe the system will not be solvent when they retire, so they are actively investing in their own retirement accounts. One investment alternative is a tax-sheltered annuity (TSA) marketed by life insurance companies. Certain people, depending on occupation, qualify to invest part of their paychecks in a TSA and to pay no federal income tax on this money until it is withdrawn. While the money is invested, the insurance companies invest it in either stock or bond portfolios. A second alternative open to many people is a plan known as a 401(k), in which employees contribute a portion of their paychecks to purchase stocks, bonds, or mutual funds. In some cases, employers match all or part of the employee contributions. In many 401(k) systems, the employees can control how their funds are invested.
A recent study was conducted in North Dakota to estimate the difference in mean annual contributions for individuals covered by the two plans [TSA or 401(k)]. A simple random sample of 15 people from the population of adults who are eligible for a TSA investment was selected. A second sample of 15 people was selected from the population of adults in North Dakota who have 401(k) plans. The variable of interest is the dollar amount of money invested in the retirement plan during the previous year. Specifically, we are interested in estimating μ1 − μ2 using a 95% confidence interval estimate, where:
μ1 = Mean dollars invested by the TSA-eligible population during the past year
μ2 = Mean dollars invested by the 401(k)-eligible population during the past year
TSA–Eligible        401(k)–Eligible
x̄1 = $2,119.70      x̄2 = $1,777.70
s1 = $709.70        s2 = $593.90
Before applying the t-distribution, we need to determine whether the assumptions are likely to be satisfied. First, the samples are considered independent because the amount invested by one group should have no influence on the likelihood that any specific amount will be found for the second sample.
Next, Figure 1 shows the sample data and the box and whisker plots for the two samples. These plots exhibit characteristics that are reasonably consistent with those associated with normal distributions and approximately equal variances. Although using a box and whisker plot to check the t-distribution assumptions may seem to be imprecise, studies have shown the t-distribution to be applicable even when there are small violations of the assumptions. This is particularly the case when the sample sizes are approximately equal.
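Equation 4 can be used to develop the confidence interval estimate for the difference between two population means when you have small independent samples.
Confidence Interval Estimate for μ1 − μ2 When σ1 and σ2 Are Unknown, Independent Samples
$$(\bar{x}_1 - \bar{x}_2) \pm t\,s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \tag{4}$$
where:
s_p = Pooled standard deviation
t = Critical t-value from the t-distribution with n1 + n2 − 2 degrees of freedom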
FIGURE 1 Sample data and box and whisker plots for the TSA and 401(k) samples (plots not reproduced)

Five-number summary:
          Minimum   First Quartile   Median   Third Quartile   Maximum
TSA         951        1,572         2,318       2,641          3,253
401(k)      334        1,465         1,773       2,234          2,676

Sample data:
TSA: 3,122 3,253 2,021 2,479 2,318 1,407 2,641 1,648 2,439 1,059 2,799 1,714 951 2,372 1,572
401(k): 1,781 2,594 1,615 334 2,322 2,234 2,022 1,603 1,395 1,604 2,676 1,773 1,156 2,092 1,465
To use Equation 4, we must compute the pooled standard deviation, s_p. If the equal-variance assumption holds, then both s1² and s2² are estimators of the same population variance, σ². To use only one of these, say s1², to estimate σ² would be disregarding the information obtained from the other sample. To use the average of s1² and s2², if the sample sizes were different, would ignore the fact that more information about σ² is obtained from the sample having the larger sample size. We therefore use a weighted average of s1² and s2², denoted as s_p², to estimate σ², where the weights are the degrees of freedom associated with each sample. The square root of s_p² is known as the pooled standard deviation and is computed using
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
To compute s_p, we must first calculate s1² and s2². This requires that we estimate μ1 and μ2 using x̄1 and x̄2, respectively. The degrees of freedom are equal to the sample size minus the number of parameters estimated before the variance estimate is obtained. Therefore, our degrees of freedom must equal n1 + n2 − 2.
For the retirement investing example, the pooled standard deviation is
$$s_p = \sqrt{\frac{(15 - 1)(709.7)^2 + (15 - 1)(593.9)^2}{15 + 15 - 2}} = 654.37$$
With 15 + 15 − 2 = 28 degrees of freedom, the critical t-value for 95% confidence is 2.0484, and the interval estimate is
$$(2{,}119.70 - 1{,}777.70) \pm 2.0484(654.37)\sqrt{\frac{1}{15} + \frac{1}{15}}$$
$$342 \pm 489.45$$
Thus, the 95% confidence interval estimate for the difference in mean dollars for people who invest in a TSA versus those who invest in a 401(k) is
−$147.45 ≤ (μ1 − μ2) ≤ $831.45
This confidence interval estimate crosses zero and therefore indicates there may be no difference between the mean contributions to TSA accounts and to 401(k) accounts by adults in North Dakota. The implication of this result is that we cannot conclude that the average amount invested by those individuals who invest in pretax TSA programs differs from that invested by those participating in after-tax 401(k) programs. Based on this result, there may be an opportunity to encourage the TSA investors to increase deposits.
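The pooled-variance interval for the retirement example can be reproduced from the raw sample data listed with Figure 1. This is a sketch under the equal-variance assumption; the critical value 2.0484 (95% confidence, 28 degrees of freedom) is typed in from a t-table because the Python standard library has no t-distribution, and the names `pooled_t_interval`, `tsa`, and `plan_401k` are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Dollar amounts invested during the previous year (sample data with Figure 1).
tsa = [3122, 3253, 2021, 2479, 2318, 1407, 2641, 1648,
       2439, 1059, 2799, 1714, 951, 2372, 1572]
plan_401k = [1781, 2594, 1615, 334, 2322, 2234, 2022, 1603,
             1395, 1604, 2676, 1773, 1156, 2092, 1465]


def pooled_t_interval(sample1, sample2, t_critical):
    """Interval for mu1 - mu2 assuming equal population variances."""
    n1, n2 = len(sample1), len(sample2)
    s1, s2 = stdev(sample1), stdev(sample2)  # sample standard deviations
    # Pooled standard deviation: variances weighted by degrees of freedom.
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    margin = t_critical * pooled_sd * sqrt(1 / n1 + 1 / n2)
    diff = mean(sample1) - mean(sample2)
    return diff - margin, diff + margin


T_CRITICAL_28_DF = 2.0484  # table value: 95% confidence, 28 degrees of freedom
lower, upper = pooled_t_interval(tsa, plan_401k, T_CRITICAL_28_DF)
print(f"{lower:.2f} <= mu1 - mu2 <= {upper:.2f}")
```

Because the code works from unrounded means and standard deviations, its limits differ from the text's −$147.45 and $831.45 by a few cents.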
EXAMPLE 2 CONFIDENCE INTERVAL ESTIMATE FOR μ1 − μ2 WHEN σ1 AND σ2 ARE UNKNOWN, USING INDEPENDENT SAMPLES
Andreason Marketing, Inc.
Andreason Marketing, Inc. has been hired by a major newspaper in the U.S. to estimate the difference in mean time that newspaper subscribers spend reading the Saturday newspaper when subscribers age 50 and under are compared with those more than 50 years old. A simple random sample of six people age 50 or younger and eight people over 50 participated in the study. The estimate can be developed using the following steps:
Step 1 Define the population parameter of interest and select independent
samples from the two populations.
The objective here is to estimate the difference between the two age groups with respect to the mean time spent reading the Saturday edition of the newspaper. The parameter of interest is μ1 − μ2.
The marketing company has selected simple random samples of six "younger" and eight "older" people. Because the reading time by one person does not influence the reading time for any other person, the samples are independent.
Step 2 Specify the desired confidence level.
The marketing firm wishes to have a 95% confidence interval estimate.
Step 3 Compute the point estimate.
The resulting sample means and sample standard deviations for the two groups are
age ≤ 50: x̄1 = 13.6 minutes    age > 50: x̄2 = 11.2 minutes
          s1 = 3.1 minutes               s2 = 5.0 minutes
          n1 = 6                         n2 = 8
The point estimate is
x̄1 − x̄2 = 13.6 − 11.2 = 2.4 minutes
Step 4 Determine the standard error of the sampling distribution.
The pooled standard deviation is computed using
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(6 - 1)(3.1)^2 + (8 - 1)(5.0)^2}{6 + 8 - 2}} = 4.31$$
Then the standard error is
$$s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 4.31\sqrt{\frac{1}{6} + \frac{1}{8}} = 2.3277$$
Step 5 Determine the critical value, t, from the t-distribution.
Because the population standard deviations are unknown, the critical value will be a t-value from the t-distribution as long as the population variances are equal and the populations are assumed to be normally distributed. The critical t for 95% confidence and 6 + 8 − 2 = 12 degrees of freedom is t = 2.1788.
Step 6 Develop the confidence interval estimate using Equation 4.
$$(\bar{x}_1 - \bar{x}_2) \pm t\,s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 2.4 \pm 2.1788(2.3277) = 2.4 \pm 5.0715$$
−2.6715 ≤ (μ1 − μ2) ≤ 7.4715
Because the interval crosses zero, we cannot conclude that a difference exists between the age groups with respect to the mean reading time for the Saturday edition. Thus, with respect to this factor, it does not seem to matter whether the person is 50 or younger or over 50.
> END EXAMPLE
TRY PROBLEM 1
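When only summary statistics are available, as in Example 2, the same calculation can start from the sample standard deviations directly. The sketch below also illustrates why the pooled variance is described as a weighted average: it always lies between the two sample variances. The function name `pooled_variance` is illustrative, and the critical value 2.1788 is taken from a t-table (95% confidence, 12 degrees of freedom).

```python
from math import sqrt


def pooled_variance(s1, n1, s2, n2):
    """Degrees-of-freedom-weighted average of the two sample variances."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


# Summary statistics from Example 2 (reading time in minutes).
s1, n1 = 3.1, 6   # age 50 or younger
s2, n2 = 5.0, 8   # over 50
sp2 = pooled_variance(s1, n1, s2, n2)

# A weighted average must fall between the two values being averaged.
print(min(s1**2, s2**2) <= sp2 <= max(s1**2, s2**2))

t_critical = 2.1788  # table value: 95% confidence, 12 degrees of freedom
margin = t_critical * sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
diff = 13.6 - 11.2
lower, upper = diff - margin, diff + margin
print(f"{lower:.2f} <= mu1 - mu2 <= {upper:.2f}")
```

The limits differ from the worked example in the third decimal place because the example rounds the pooled standard deviation to 4.31 before continuing.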
What If the Population Variances Are Not Equal? If you have reason to believe that the population variances are substantially different, Equation 4 is not appropriate for computing the confidence interval. Instead of computing the pooled standard deviation as part of the confidence interval formula, we use Equations 5 and 6.
Confidence Interval for μ1 − μ2 When σ1 and σ2 Are Unknown and Not Equal, Independent Samples
$$(\bar{x}_1 - \bar{x}_2) \pm t\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \tag{5}$$
where:
t is from the t-distribution with degrees of freedom computed using
Degrees of Freedom for Estimating Difference between Population Means When σ1 and σ2 Are Not Equal
$$df = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} \tag{6}$$
EXAMPLE 3 ESTIMATING μ1 − μ2 WHEN THE POPULATION VARIANCES ARE NOT EQUAL
Citibank
The marketing managers at Citibank are planning to roll out a new marketing campaign addressed at increasing bank card use. As one part of the campaign, the company will be offering a low interest rate incentive to induce people to spend more money using its charge cards. However, the company is concerned whether this plan will have a different impact on married card holders than on unmarried card holders. So, prior to starting the marketing campaign nationwide, the company tests it on a random sample of 30 unmarried and 25 married customers. The managers wish to estimate the difference in mean credit card spending for unmarried versus married customers for a two-week period immediately after being exposed to the marketing campaign. Based on past data, the managers have reason to believe the spending distributions for unmarried and married customers will be approximately normally distributed, but they are unwilling to conclude the population variances for spending are equal for the two populations.
A 95% confidence interval estimate for the difference in population means can be developed using the following steps:
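Step 1 Define the population parameter of interest.
The parameter of interest is the difference between the mean dollars spent on credit cards by unmarried versus married customers in the two-week period after being exposed to Citi's new marketing program.
Step 2 Specify the desired confidence level.
The research manager wishes to have a 95% confidence interval estimate.
Step 3 Compute the point estimate.
Independent samples of 30 unmarried and 25 married customers were taken, and the credit card spending for each sampled customer during the two-week period was recorded. The following sample results were observed:
Unmarried           Married
n1 = 30             n2 = 25
x̄1 = $455.10        x̄2 = $268.90
s1 = $102.40        s2 = $77.25
The point estimate is
x̄1 − x̄2 = $455.10 − $268.90 = $186.20
Step 4 Determine the standard error of the sampling distribution.
The standard error is calculated as
$$\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{102.40^2}{30} + \frac{77.25^2}{25}} = 24.25$$
Step 5 Determine the critical value, t, from the t-distribution.
Because we are unable to assume the population variances are equal, we must first use Equation 6 to calculate the degrees of freedom for the t-distribution. This is done as follows:
$$df = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} = \frac{\left(102.40^2/30 + 77.25^2/25\right)^2}{\dfrac{\left(102.40^2/30\right)^2}{29} + \dfrac{\left(77.25^2/25\right)^2}{24}} \approx 52.53$$
Thus, the degrees of freedom (rounded down) will be 52. At the 95% confidence level, using the t-distribution table, the approximate t-value is 2.0086. Note that, since there is no entry for 52 degrees of freedom in the table, we have selected the t-value associated with 95% confidence and 50 degrees of freedom, which provides a slightly larger t-value than would have been the case for 52 degrees of freedom. Thus, the interval estimate will be generously wide.
Step 6 Develop the confidence interval estimate using Equation 5.
The confidence interval estimate is computed using
$$(\bar{x}_1 - \bar{x}_2) \pm t\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Then the interval estimate is
$$(455.10 - 268.90) \pm 2.0086\sqrt{\frac{102.40^2}{30} + \frac{77.25^2}{25}}$$
$186.20 ± $48.72, or approximately
$137.48 ≤ (μ1 − μ2) ≤ $234.92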
> END EXAMPLE
TRY PROBLEM 6
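The degrees-of-freedom formula in Equation 6 is tedious by hand, so a short script is a useful check on Example 3. This is a sketch, with illustrative function names; the critical value 2.0086 is the 50-degrees-of-freedom table entry the example uses, typed in rather than computed.

```python
from math import floor, sqrt


def unequal_variance_df(s1, n1, s2, n2):
    """Degrees of freedom from Equation 6 (variances not assumed equal)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


def unequal_variance_interval(mean1, s1, n1, mean2, s2, n2, t_critical):
    """Interval from Equation 5: no pooled standard deviation is used."""
    standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
    diff = mean1 - mean2
    margin = t_critical * standard_error
    return diff - margin, diff + margin


# Citibank sample results: unmarried customers are sample 1, married sample 2.
df = unequal_variance_df(102.40, 30, 77.25, 25)
print(f"df = {df:.2f}, rounded down to {floor(df)}")

lower, upper = unequal_variance_interval(
    455.10, 102.40, 30, 268.90, 77.25, 25, t_critical=2.0086
)
print(f"{lower:.2f} <= mu1 - mu2 <= {upper:.2f}")
```

Rounding the degrees of freedom down, as the example does, errs toward a slightly wider and therefore more conservative interval.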
10-1 The following information is based on independent random samples taken from two normally distributed populations having equal variances:
n1 = 15    n2 = 13
x̄1 = 50    x̄2 = 53
s1 = 5     s2 = 6
Based on the sample information, determine the 90% confidence interval estimate for the difference between the two population means.
10-2 The following information is based on independent random samples taken from two normally distributed populations having equal variances:
n1 = 24    n2 = 28
x̄1 = 130   x̄2 = 125
s1 = 19    s2 = 17.5
Based on the sample information, determine the 95% confidence interval estimate for the difference between the two population means.
10-3 Construct a 90% confidence interval estimate for the difference between two population means given the following sample data selected from two normally distributed populations with equal variances:
Sample 1 Sample 2
10-4 Construct a 95% confidence interval estimate for the difference between two population means based on the
10-5 Construct a 95% confidence interval for the difference between two population means using the following sample data that have been selected from normally distributed populations with different
10-6 Two random samples were selected independently from populations having normal distributions. The following statistics were extracted from the samples:
x̄1 = 42.3    x̄2 = 32.4
a. If σ1 = 3 and σ2 = 2 and the sample sizes are n1 = 50 and n2 = 50, construct a 95% confidence interval for the difference between the two population means.
b. If σ1 = σ2, s1 = 3, and s2 = 2, and the sample sizes are n1 = 10 and n2 = 10, construct a 95% confidence interval for the difference between the two population means.
c. If σ1 ≠ σ2, s1 = 3, and s2 = 2, and the sample sizes are n1 = 10 and n2 = 10, construct a 95% confidence interval for the difference between the two population means.
10-7 Amax Industries operates two manufacturing facilities that specialize in doing custom manufacturing work for the semiconductor industry. The facility in Denton, Texas, is highly automated, whereas the facility in Lincoln, Nebraska, has more manual functions. For the past few months, both facilities have been working on a large order for a specialized product. The vice president of operations is interested in estimating the difference in mean time it takes to complete a part on the two lines. To do this, he has requested that a random sample of 15 parts at each facility be tracked from start to finish and the time required be recorded. The following sample data were recorded:
Denton, Texas       Lincoln, Nebraska
x̄1 = 56.7 hours     x̄2 = 70.4 hours
s1 = 7.1 hours      s2 = 8.3 hours
Assuming that the populations are normally distributed with equal population variances, construct and interpret a 95% confidence interval estimate.
10-8 A credit card company operates two customer service centers: one in Boise and one in Richmond. Callers to the service centers dial a single number, and a computer program routes callers to the center having the fewest calls waiting. As part of a customer service review program, the credit card center would like to determine whether the average length of a call (not including hold time) is different for the two centers. The managers of the customer service centers are willing to assume that the populations of interest are normally distributed with equal variances. Suppose a random sample of phone calls to the two centers is selected and the following results are reported:
                            Boise     Richmond
Sample Mean (seconds)       195       216
Sample St. Dev. (seconds)   35.10     37.80
a. Using the sample results, develop a 90% confidence interval estimate for the difference between the two population means.
b. Based on the confidence interval constructed in part a, what can be said about the difference between the average call times at the two centers?
10-9 A pet food producer manufactures and then fills 25-pound bags of dog food on two different production lines located in separate cities. In an effort to determine whether differences exist between the average fill rates for the two lines, a random sample of 19 bags from line 1 and a random sample of 23 bags from line 2 were recently selected. Each bag's weight was measured and the following summary measures from the samples were reported:
                               Production Line 1   Production Line 2
Sample Mean, x̄                 24.96               25.01
Sample Standard Deviation, s   0.07                0.08
Management believes that the fill rates of the two lines are normally distributed with equal variances.
a. Calculate the point estimate for the difference between the population means of the two lines.
b. Develop a 95% confidence interval estimate of the true mean difference between the two lines.
c. Based on the 95% confidence interval estimate calculated in part b, what can the managers of the production lines conclude about the differences between the average fill rates for the two lines?
10-10 Two companies that manufacture batteries for electronics products have submitted their products to an independent testing agency. The agency tested 200 of each company's batteries and recorded the length of time the batteries lasted before failure. The following results were determined:
Company A          Company B
x̄ = 41.5 hours     x̄ = 39.0 hours
a. Based on these data, determine the 95% confidence interval to estimate the difference in average life of the batteries for the two companies. Do these data indicate that one company's batteries will outlast the other company's batteries on average? Explain.
b. Suppose the manufacturers of each of these batteries wished to warranty their batteries. One small company to which they both ship batteries receives shipments of 200 batteries weekly. If the average length of time to failure of the batteries is less than a specified number, the manufacturer will refund the company's purchase price of that set of batteries. What value should each manufacturer set if they wish to refund money on at most 5% of the shipments?
10-11 Wilson Construction and Concrete Company is known as a very progressive company that is willing to try new ideas to improve its products and service. One of the key factors of importance in concrete work is the time it takes for the concrete to "set up." The company is considering a new additive that can be put in the concrete mix to help reduce the setup time. Before going ahead with the additive, the company plans to test it against the current additive. To do this, 14 batches of concrete are mixed using each of the additives. The following results were observed:
Old Additive       New Additive
x̄ = 17.2 hours     x̄ = 15.9 hours
s = 2.5 hours      s = 1.8 hours
a. Use these sample data to construct a 90% confidence interval estimate for the difference in mean setup time for the two concrete additives. On the basis of the confidence interval produced, do you agree that the new additive helps reduce the setup time for cement? (Assume the populations are normally distributed.) Explain your answer.
b. Assuming that the new additive is slightly more expensive than the old additive, do the data support switching to the new additive if the managers of the company are primarily interested in reducing average setup time?
10-12 A working paper (Mark Aguiar and Erik Hurst, "Measuring Trends in Leisure: The Allocation of Time over Five Decades," 2006) for the Federal Reserve Bank of Boston concluded that average leisure time spent per week by women in 2003 was 33.80 hours and 37.56 hours for men. The sample standard deviations were 40 and 70, respectively. These results were obtained from samples of women and men of size 8,492 and 6,752, respectively. In this study, leisure refers to the time individuals spent socializing, in passive leisure, in active leisure, volunteering, in pet care, gardening, and recreational child care. Assume that the amounts of leisure time spent by men and women have normal distributions with equal population variances.
a. Determine the pooled estimate of the common populations' standard deviation.
b. Produce the margin of error to estimate the difference of the two population means with a confidence level of 95%.
c. Calculate a 95% confidence interval for the difference in the average leisure time between women and men.
d. Do your results in part c indicate that the average amount of men's leisure time was larger than that of women in 2003? Support your assertions.
e. Would your conclusion in part d change if you did not assume the population variances were equal?
10-13 The Graduate Management Admission Council reported a shift in the job-hunting strategies among second-year masters of business administration (MBA) candidates. Even though their prospective base salary has increased from $81,900 to $93,770 from 2002 to 2005, it appears that MBA candidates are submitting fewer job applications. Data obtained from online surveys of 1,442 MBA candidates at 30 business school programs indicate that in 2002 the average number of job applications per candidate was 38.9 and 2.0 in 2005. The sample variances were 64 and 0.32, respectively.
a. Examine the sample variances. Conjecture whether this sample evidence indicates that the two population variances are equal to each other. Support your assertion.
b. On the basis of your answer in part a, construct a 99% confidence interval for the difference in the average number of job applications submitted by MBA candidates between 2002 and 2005.
c. Using your result in part b, is it plausible that the difference in the average number of job applications submitted is 36.5? Is it plausible that the difference in the average number of job applications submitted is 37? Are your answers to these two questions contradictory? Explain.
10-14 Logston Enterprises operates a variety of businesses in and around the St. Paul, Minnesota, area. Recently, the company was notified by the law firm representing several female employees that a lawsuit was going to be filed claiming that males were given preferential treatment when it came to pay raises by the company. The Logston human resources manager has requested that an estimate be made of the difference between mean percentage raises granted to males versus females. Sample data are contained in the file Logston Enterprises. She wants you to develop and interpret a 95% confidence interval estimate. She further states that the distribution of percentage raises can be assumed approximately normal, and she expects the population variances to be about equal.
10-15 The owner of the A.J. Fitness Center is interested in estimating the difference in mean years that female members have been with the club compared with male members. He wishes to develop a 95% confidence interval estimate. Sample data are in the file called AJ Fitness. Assuming that the sample data are approximately normal and that the two populations have equal variances, develop and interpret the confidence interval estimate. Discuss the result.
10-16 Platinum Billiards, Inc., based in Jacksonville, Florida, is a retailer of billiard supplies. It stands out among billiard suppliers because of the research it does to assure its products are top notch. One experiment was conducted to measure the speed attained by a cue ball struck by various weighted pool cues. The conjecture is that a light cue generates faster speeds while breaking the balls at the beginning of a game of pool. Anecdotal experience has indicated that a billiard cue weighing less than 19 ounces generates faster speeds. Platinum used a robotic arm to investigate this claim. The research generated the data given in the file titled Breakcue.
a. Calculate the sample standard deviation and mean speed produced by cues in the two weight categories: (1) under 19 ounces and (2) at or above 19 ounces.
b. Calculate a 95% confidence interval for the difference in the average speed of a cue ball generated by each of the weight categories.
c. Is the anecdotal evidence correct? Support your assertion.
d. What assumptions are required so that your results in part b would be valid?
10-17 The Federal Reserve reported in its comprehensive Survey of Consumer Finances, released every three years, that the average income of families in the United States declined from 2001 to 2004. This was the first decline since 1989–1992. A sample of incomes was taken in 2001 and repeated in 2004. After adjusting for inflation, the data that arise from these samples are given in a file titled Federal Reserve.
a. Determine the percentage decline indicated by the two samples.
b. Using these samples, produce a 90% confidence interval for the difference in the average family income between 2001 and 2004.
c. Is it plausible that there has been no decline in the average income of U.S. families? Support your assertion.
d. How large an error could you have made by using the difference in the sample means to estimate the difference in the population means?
END EXERCISES 10-1
2 Hypothesis Tests for Two Population Means Using Independent Samples
You are going to encounter situations that will require you to test whether two populations have equal means or whether one population mean is larger (or smaller) than another. These hypothesis-testing applications are just an extension of the hypothesis-testing process for a single population mean. They also build directly on the estimation process introduced in Section 1.
In this section, we will introduce hypothesis-testing techniques for the difference between the means of two populations in the following situations:
1 The population standard deviations are known and the samples are independent.
2 The population standard deviations are unknown and the samples are independent.
The remainder of this section presents examples of hypothesis tests for these different situations.
Using Independent Samples
Samples are considered to be independent when the samples from the two populations are taken in such a way that the occurrence of values in one sample has no influence on the probability of occurrence of the values in the second sample. In special cases in which the population standard deviations are known and the samples are independent, the test statistic is a z-value computed using Equation 7.

$$ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \tag{7} $$

where:
(μ1 − μ2) = Hypothesized difference in population means
If the calculated z-value using Equation 7 is more extreme than the critical z-value from the standard normal distribution, the null hypothesis is rejected. Example 4 illustrates the use of this test statistic.
EXAMPLE 4 HYPOTHESIS TEST FOR μ1 − μ2 WHEN σ1 AND σ2 ARE KNOWN, INDEPENDENT SAMPLES
Brooklyn Brick, Inc. Brooklyn Brick, Inc. is a Pennsylvania-based company that makes bricks and concrete blocks for the building industry. One product is a brick facing material that looks like a real brick but is much thinner. The ideal thickness is 0.50 inches. The bricks that the company makes must be very uniform in their dimension so brickmasons can build straight walls. The company has two plants that produce brick facing products, and the technology used at the two plants is slightly different. At plant 1, the standard deviation in the thickness of brick facing products is known to be 0.025 inches, and the standard deviation at plant 2 is 0.034 inches. These are known values. However, the company is interested in determining whether there is a difference in the average thickness of brick facing products made at the two plants. Specifically, the company wishes to know whether plant 2 provides brick facing products that have a greater mean thickness than the products produced at plant 1. If the test determines that plant 2 does provide thicker materials than plant 1, the managers will have the maintenance department attempt to adjust the process to reduce the mean thickness.
To test this, you can use the following steps:
Step 1 Specify the population parameter of interest.
This is μ1 − μ2, the difference in the two population means.
Step 2 Formulate the appropriate null and alternative hypotheses.
We are interested in determining whether the mean thickness for plant 2 exceeds that for plant 1. The following null and alternative hypotheses are specified:
H0: μ1 − μ2 ≥ 0.0    H0: μ1 ≥ μ2
or
HA: μ1 − μ2 < 0.0    HA: μ1 < μ2
Step 3 Specify the significance level for the test.
The test will be conducted using α = 0.05.
Step 4 Determine the rejection region and develop the decision rule.
Because the population standard deviations are assumed to be known, the critical value is a z-value from the standard normal distribution. This test is a one-tailed lower-tail test, with α = 0.05. From the standard normal distribution, the critical z-value is
−z0.05 = −1.645
The decision rule compares the test statistic found in Step 5 to the critical z-value.
If z < −1.645, reject the null hypothesis;
Otherwise, do not reject the null hypothesis.
Alternatively, you can state the decision rule in terms of a p-value, as follows:
If p-value < α = 0.05, reject the null hypothesis;
Otherwise, do not reject the null hypothesis.
Step 5 Compute the test statistic.
Select simple random samples of brick facing pieces from the two populations and compute the sample means. A simple random sample of 100 brick facing pieces is selected from plant 1’s production, and another simple random sample of 100 brick facing pieces is selected from plant 2’s production. The samples are independent because the thicknesses of the brick pieces made by one plant can in no way influence the thicknesses of the bricks made by the other plant. The means computed from the samples are
x̄1 = 0.501 inches and x̄2 = 0.509 inches
The test statistic is obtained using Equation 7.

$$ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(0.501 - 0.509) - 0}{\sqrt{\dfrac{0.025^2}{100} + \dfrac{0.034^2}{100}}} = -1.90 $$

Step 6 Reach a decision.
The critical value is −z0.05 = −1.645, and the test statistic value was computed to be z = −1.90. Applying the decision rule,
Because z = −1.90 < −1.645, reject the null hypothesis.
Figure 2 illustrates this hypothesis test.
How to do it (Example 4)
The Hypothesis-Testing Process for Two Population Means
The hypothesis-testing process for tests involving two population means introduced in this section is essentially the same as for a single population mean. The process is composed of the following steps:
1. Specify the population parameter of interest.
2. Formulate the appropriate null and alternative hypotheses. The null hypothesis should contain the equality. Possible formats for hypothesis testing concerning two population means are
   H0: μ1 − μ2 = c    H0: μ1 − μ2 ≤ c    H0: μ1 − μ2 ≥ c
   HA: μ1 − μ2 ≠ c    HA: μ1 − μ2 > c    HA: μ1 − μ2 < c
   where c = any specified number.
3. Specify the significance level (α) for testing the hypothesis. Alpha is the maximum allowable probability of committing a Type I statistical error.
4. Determine the rejection region and develop the decision rule.
5. Compute the test statistic or the p-value. Of course, you must first select simple random samples from each population and compute the sample means.
6. Reach a decision. Apply the decision rule to determine whether to reject the null hypothesis.
7. Draw a conclusion.
FIGURE 2 | Example 4 Hypothesis Test
Test statistic: z = −1.90. Decision rule: Since z = −1.90 < −z0.05 = −1.645, reject H0. Conclude that the brick facings made by plant 2 have a larger mean thickness than those made by plant 1.
Step 7 Draw a conclusion.
There is statistical evidence to conclude that the brick facings made by plant 2 have a larger mean thickness than those made by plant 1. Thus, the managers of Brooklyn Brick, Inc. need to take action to reduce the mean thickness at plant 2.
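The arithmetic in Example 4 can be checked with a short script. The following is a minimal Python sketch, not part of the original text: the helper name `two_sample_z` is hypothetical, and the critical value comes from the standard library's `statistics.NormalDist` rather than a printed table, so it may differ from the table value in the last decimal place.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, hypothesized_diff=0.0):
    """z statistic from Equation 7: independent samples, known sigmas."""
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((xbar1 - xbar2) - hypothesized_diff) / standard_error


# Example 4 inputs: brick facing thickness (inches) at plant 1 and plant 2.
z = two_sample_z(0.501, 0.509, 0.025, 0.034, 100, 100)

# Lower-tail critical value for alpha = 0.05 (about -1.645).
alpha = 0.05
z_critical = NormalDist().inv_cdf(alpha)

# Lower-tail decision rule: reject H0 when z falls below the critical value.
reject_h0 = z < z_critical
print(f"z = {z:.2f}, critical z = {z_critical:.3f}, reject H0: {reject_h0}")
```

Passing a nonzero `hypothesized_diff` would adapt the same sketch to a null hypothesis of the form μ1 − μ2 ≥ c.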
> END EXAMPLE
TRY PROBLEM 21
Using p-Values The z-test statistic computed in Example 4 indicates that the difference in sample means is 1.90 standard errors below the hypothesized difference of zero. Because this falls below the z critical level of −1.645, the null hypothesis is rejected. You could have also tested this hypothesis using the p-value approach. The p-value for this one-tailed test is the probability of a z-value in a standard normal distribution being less than −1.90. From the standard normal table, the probability associated with z = −1.90 is 0.4713. Then the p-value is
p-value = 0.5000 − 0.4713 = 0.0287
The decision rule to use with p-values is
If p-value < α, reject the null hypothesis;
Otherwise, do not reject the null hypothesis.
Because
p-value = 0.0287 < α = 0.05
reject the null hypothesis and conclude that the mean brick facing thickness for plant 2 is greater than the mean thickness for products produced by plant 1.
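The table lookup above can be reproduced numerically. The sketch below is an illustration only, using Python's standard-library `statistics.NormalDist`; the variable names are assumptions. It computes the lower-tail probability directly and also mirrors the text's "0.5000 minus the table area" calculation.

```python
from statistics import NormalDist

z = -1.90      # rounded test statistic from Example 4
alpha = 0.05

# Lower-tail p-value: P(Z < z) under the standard normal distribution.
p_value = NormalDist().cdf(z)

# Table-style calculation used in the text:
# 0.5000 minus the area between 0 and |z|.
area_0_to_z = NormalDist().cdf(abs(z)) - 0.5
p_from_table_logic = 0.5 - area_0_to_z

reject_h0 = p_value < alpha
print(f"p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Both routes give the same probability, which is why the test statistic approach and the p-value approach lead to the same decision.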
Testing for μ1 − μ2 When σ1 and σ2 Are Unknown, Using Independent Samples
In Section 1 we showed that to develop a confidence interval estimate for the difference between two population means when the standard deviations are unknown, we used the t-distribution to obtain the critical value. As you might suspect, this same approach is taken for hypothesis-testing situations. Equation 8 shows the t-test statistic that will be used when σ1 and σ2 are unknown.
t-Test Statistic for μ1 − μ2 When σ1 and σ2 Are Unknown and Assumed Equal, Independent Samples

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \tag{8} $$

where:
x̄1 and x̄2 = Sample means from populations 1 and 2
μ1 − μ2 = Hypothesized difference between population means
n1 and n2 = Sample sizes from the two populations
sp = Pooled standard deviation (see Equation 4)
The test statistic in Equation 8 is based on three assumptions:
Assumptions
• Each population has a normal distribution.³
• The two population variances, σ1² and σ2², are equal.
• The samples are independent.
Notice that in Equation 8, we are using the pooled estimate for the common population standard deviation that we developed in Section 1.
BUSINESS APPLICATION HYPOTHESIS TEST FOR THE DIFFERENCE BETWEEN TWO POPULATION MEANS
RETIREMENT INVESTING (CONTINUED) Recall the earlier example discussing a study in North Dakota involving retirement investing. The leaders of the study are interested in determining whether there is a difference in mean annual contributions for individuals covered by TSAs and those with 401(k) retirement programs. A simple random sample of 15 people from the population of adults who are eligible for a TSA investment was selected. A second sample of 15 people was selected from the population of adults in North Dakota who have 401(k) plans. The variables of interest are the dollars invested in the two retirement plans during the previous year.
Specifically, we are interested in testing the following null and alternative hypotheses:
H0: μ1 − μ2 = 0.0    H0: μ1 = μ2
or
HA: μ1 − μ2 ≠ 0.0    HA: μ1 ≠ μ2
μ1 = Mean dollars invested by the TSA-eligible population during the past year
μ2 = Mean dollars invested by the 401(k)-eligible population during the past year
The leaders of the study select a significance level of α = 0.05. The sample results are
We are now in a position to complete the hypothesis test to determine whether the mean dollar amount invested by TSA employees is different from the mean amount invested by 401(k) employees. We first determine the critical values with degrees of freedom equal to
n1 + n2 − 2 = 15 + 15 − 2 = 28
and α = 0.05 for the two-tailed test.⁴ The appropriate t-values are
t0.025 = ±2.0484
The test statistic, computed using Equation 8, is t = 1.4313. Because this value falls between the two critical values, the null hypothesis is not rejected.
The difference in sample means is attributed to sampling error. Figure 3 summarizes this hypothesis test. Based on the sample data, there is no statistical justification to believe that the mean annual investment by individuals eligible for the TSA option is different from those individuals eligible for the 401(k) plan.
⁴ You can also use Excel’s T.INV.2T function (=T.INV.2T(0.05,28)).
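For readers without Excel, the two-tailed critical value can be approximated numerically. The following is a rough pure-Python sketch rather than a library routine: the function names are hypothetical, the t-distribution CDF is integrated with Simpson's rule, and the quantile is found by bisection. A statistics package would normally be used instead.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] plus symmetry about zero."""
    upper = abs(x)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical_two_tailed(alpha, df):
    """Positive critical value t with P(|T| > t) = alpha, found by bisection."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Retirement investing test: alpha = 0.05, df = 15 + 15 - 2 = 28.
t_crit = t_critical_two_tailed(0.05, 28)
print(f"critical values: -{t_crit:.4f} and +{t_crit:.4f}")
```

A one-tailed critical value at level α corresponds to calling the same function with 2α, for example `t_critical_two_tailed(0.10, 16)` for a one-tailed test at α = 0.05 with 16 degrees of freedom.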
FIGURE 3 | Hypothesis Test for the Equality of the Two Population Means for the North Dakota
Rejection regions: α/2 = 0.025 in each tail, with critical values −t0.025 = −2.0484 and t0.025 = 2.0484 for df = n1 + n2 − 2 = 15 + 15 − 2 = 28.
Sample result: x̄1 − x̄2 = (2,119.70 − 1,777.70) = 342.00; test statistic t = 1.4313.
BUSINESS APPLICATION USING EXCEL TO TEST FOR THE DIFFERENCE BETWEEN TWO POPULATION MEANS
SUV VEHICLE MILEAGE Excel has a procedure for performing the necessary calculations to test hypotheses involving two population means. Consider a national car rental company that is interested in testing to determine whether there is a difference in mean mileage for sport utility vehicles (SUVs) driven in town versus those driven on the highway. Based on its experience with regular automobiles, the company believes the mean highway mileage will exceed the mean city mileage.
To test this belief, the company has randomly selected 25 SUV rentals driven only on the highway and another random sample of 25 SUV rentals driven only in the city. The vehicles were filled with 14 gallons of gasoline. The company then asked each customer to drive the car until it ran out of gasoline. At that point, the elapsed miles were noted and the miles per gallon (mpg) were recorded. For their trouble, the customers received free use of the SUV and a coupon valid for one week’s free rental. The results of the experiment are contained in the file Mileage.
Excel can be used to perform the calculations required to determine whether the manager’s belief about SUV highway mileage is justified. We first formulate the null and alternative hypotheses to be tested:
H0: μ1 − μ2 ≤ 0.0    H0: μ1 ≤ μ2
or
HA: μ1 − μ2 > 0.0    HA: μ1 > μ2
Population 1 represents highway mileage, and population 2 represents city mileage. The test is conducted using a significance level of α = 0.05.
Figure 4 shows the descriptive statistics for the two independent samples.
FIGURE 4 |
Excel 2010 Output—SUV
Mileage Descriptive Statistics
Excel 2010 Instructions:
1 Open file: Mileage.xlsx.
2 Select Data > Data Analysis.
3 Select Descriptive Statistics.
4 Define the data range for all variables to be
Figure 5 displays the Excel box and whisker plots for the two samples. Based on these plots, the normal distribution and equal variance assumptions appear reasonable. We will proceed with the test of means assuming normal distributions and equal variances.
Figure 6 shows the Excel output for the hypothesis test. The mean highway mileage is 19.6468 mpg, whereas the mean for city driving is 16.146 mpg. At issue is whether this difference in sample means (19.6468 − 16.146 = 3.5008 mpg) is sufficient to conclude the mean highway mileage exceeds the mean city mileage. The one-tail t critical value for α = 0.05 is shown in Figure 6 to be
t0.05 = 1.6772
Figure 6 shows that the “t-Stat” value from Excel, which is the calculated test statistic (or t-value, based on Equation 8), is equal to
t = 2.52
The difference in sample means (3.5008 mpg) is 2.52 standard errors larger than the hypothesized difference of zero. Because the test statistic
t = 2.52 > t0.05 = 1.6772
we reject the null hypothesis. Thus, the sample data do provide sufficient evidence to conclude that mean SUV highway mileage exceeds mean SUV city mileage, and this study confirms the expectations of the rental company managers. This will factor into the company’s fuel pricing.
The output shown in Figure 6 also provides the p-value for the one-tailed test, which can also be used to test the null hypothesis. Recall, if the calculated p-value is less than alpha, the null hypothesis should be rejected. The decision rule is
If p-value < 0.05, reject H0;
Otherwise, do not reject H0.
The p-value for the one-tailed test is 0.0075. Because 0.0075 < 0.05, the null hypothesis is rejected. This is the same conclusion as the one we reached using the test statistic approach.
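The one-tailed p-value reported by Excel can be approximated from the rounded test statistic. The sketch below is illustrative only and assumes nothing beyond the values stated above (t = 2.52, n1 = n2 = 25); the helper names are hypothetical, and the upper-tail area is obtained by integrating the t density with Simpson's rule. Because the text's p-value is based on the unrounded statistic, small differences in the last digit are to be expected.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_p(t_stat, df, steps=2000):
    """P(T > t_stat) for t_stat >= 0, via Simpson's rule on [0, t_stat]."""
    h = t_stat / steps
    total = t_pdf(0.0, df) + t_pdf(t_stat, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


t_stat = 2.52      # rounded t-Stat reported in the text
df = 25 + 25 - 2   # n1 + n2 - 2 = 48
alpha = 0.05

p_value = upper_tail_p(t_stat, df)
reject_h0 = p_value < alpha
print(f"one-tailed p-value = {p_value:.4f}, reject H0: {reject_h0}")
```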
FIGURE 5 |
Excel 2010 Output (PHStat
Add-in) Box and Whisker
Plot—SUV Mileage Test
Minitab Instructions (for similar results):
1 Open file: Mileage.MTW.
2 Choose Graph > Boxplot.
3 Under Multiple Ys, select Simple.
EXAMPLE 5 HYPOTHESIS TEST FOR μ1 − μ2 WHEN σ1 AND σ2 ARE UNKNOWN, USING INDEPENDENT SAMPLES
Color Printer Ink Cartridges A recent Associated Press news story out of Brussels, Belgium, indicated the European Union was considering a probe of computer makers after consumers complained that they were being overcharged for ink cartridges. Companies such as Canon, Hewlett-Packard, and Epson are the printer market leaders and make most of their printer-related profits by selling replacement ink cartridges. Suppose an independent test agency wishes to conduct a test to determine whether name-brand ink cartridges generate more color pages on average than competing generic ink cartridges. The test can be conducted using the following steps:
Step 1 Specify the population parameter of interest.
We are interested in determining whether the mean number of pages printed by name-brand cartridges (population 1) exceeds the mean pages printed by generic cartridges (population 2).
Step 2 Formulate the appropriate null and alternative hypotheses.
The following null and alternative hypotheses are specified:
H0: μ1 − μ2 ≤ 0.0    H0: μ1 ≤ μ2
or
HA: μ1 − μ2 > 0.0    HA: μ1 > μ2
FIGURE 6 | Excel 2010 Output for the SUV Mileage t-Test for Two Population Means
Excel 2010 Instructions:
1 Open file: Mileage.xlsx.
2 Select Data > Data Analysis.
3 Select t-test: Two Sample Assuming Equal Variances.
4 Define data ranges for the two variables of
9 Click the Home tab and adjust decimal points in output.
Minitab Instructions (for similar results):
1 Open file: Mileage.MTW.
2 Choose Stat > Basic Statistics > 2-Sample t.
3 Choose Samples in different columns.
4 In First, enter the first data column.
5 In Second, enter the other data column.
6 Check Assume equal variances.
7 Click Options and enter 1 –
Step 3 Specify the significance level for the test.
The test will be conducted using α = 0.05.
When the populations have standard deviations that are unknown, the critical value is a t-value from the t-distribution if the populations are assumed to be normally distributed and the population variances are assumed to be equal.
A simple random sample of 10 users was selected, and the users were given a name-brand cartridge. A second sample of 8 users was given generic cartridges. Both groups used their printers until the ink ran out. The number of pages printed was recorded. The samples are independent because the pages printed by users in one group did not in any way influence the pages printed by users in the second group. The means computed from the samples are
x̄1 = 322.5 pages and x̄2 = 298.3 pages
Because we do not know the population standard deviations, these values are computed from the sample data and are
s1 = 48.3 pages and s2 = 53.3 pages
Suppose previous studies have shown that the number of pages printed by both types of cartridge tends to be approximately normal with equal variances.
Step 4 Determine the rejection region and develop the decision rule.
Based on a one-tailed test with α = 0.05, the critical value is a t-value from the t-distribution with 10 + 8 − 2 = 16 degrees of freedom. From the t-table, the critical t-value is
t0.05 = 1.7459 = Critical value
The calculated test statistic from Step 5 is compared to the critical t-value to form the decision rule. The decision rule is
If t > 1.7459, reject the null hypothesis;
Otherwise, do not reject the null hypothesis.
Step 5 Compute the test statistic.
The pooled standard deviation is

$$ s_p = \sqrt{\frac{(10 - 1)(48.3)^2 + (8 - 1)(53.3)^2}{10 + 8 - 2}} = 50.55 $$

Then the t-test statistic, computed using Equation 8, is

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{(322.5 - 298.3) - 0.0}{50.55\sqrt{\dfrac{1}{10} + \dfrac{1}{8}}} = 1.0093 $$

Step 6 Reach a decision.
Because
t = 1.0093 < t0.05 = 1.7459
do not reject the null hypothesis.
Figure 7 illustrates the hypothesis test.
Step 7 Draw a conclusion.
Based on these sample data, there is insufficient evidence to conclude that the mean number of pages produced by name-brand ink cartridges exceeds the mean for generic cartridges.
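The pooled-variance calculation in Example 5 can be reproduced in a few lines. This is a minimal Python sketch under the example's stated assumptions (normal populations, equal variances, independent samples); the function names are hypothetical, and the critical value is simply the t-table value quoted in the example rather than something the code derives.

```python
from math import sqrt


def pooled_std(s1, s2, n1, n2):
    """Pooled standard deviation for two samples assumed to share a variance."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def pooled_t(xbar1, xbar2, s1, s2, n1, n2, hypothesized_diff=0.0):
    """Return (t, df) for Equation 8; degrees of freedom are n1 + n2 - 2."""
    sp = pooled_std(s1, s2, n1, n2)
    t = ((xbar1 - xbar2) - hypothesized_diff) / (sp * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Example 5 inputs: name-brand (sample 1) versus generic (sample 2) cartridges.
sp = pooled_std(48.3, 53.3, 10, 8)
t_stat, df = pooled_t(322.5, 298.3, 48.3, 53.3, 10, 8)

t_critical = 1.7459  # upper-tail t-table value for alpha = 0.05, df = 16 (from the text)
reject_h0 = t_stat > t_critical
print(f"sp = {sp:.2f}, t = {t_stat:.4f}, df = {df}, reject H0: {reject_h0}")
```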
> > END EXAMPLE
TRY PROBLEM 20
What If the Population Variances Are Not Equal? In the previous examples, we assumed that the population variances were equal, and we carried out the hypothesis test for two population means using Equation 8. Even in cases in which the population variances are not equal, the t-test as specified in Equation 8 is generally considered to be appropriate as long as the sample sizes are equal.⁵ However, if the sample sizes are not equal and if the sample data lead us to suspect that the variances are not equal, the t-test statistic must be approximated using Equation 9.⁶ In cases in which the variances are not equal, the degrees of freedom are computed using Equation 10.
⁵ Studies show that when the sample sizes are equal or almost equal, the t distribution is appropriate even when one population variance is twice the size of the other.
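t-Test Statistic for μ1 − μ2 When Population Variances Are Unknown and Not Assumed Equal

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \tag{9} $$

Degrees of Freedom for t-Test Statistic When Population Variances Are Not Equal

$$ df = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}} \tag{10} $$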
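Equations 9 and 10 translate directly into code. The sketch below is an illustration, not the text's worked example: the function name is hypothetical, and the Example 5 summary statistics are reused purely to show the mechanics, even though that example assumed equal variances. The degrees of freedom from Equation 10 are generally not a whole number; one common convention is to round down before consulting a t-table.

```python
from math import sqrt


def unequal_variance_t(xbar1, xbar2, s1, s2, n1, n2, hypothesized_diff=0.0):
    """Return (t, df) from Equations 9 and 10 (variances not assumed equal)."""
    v1 = s1**2 / n1   # estimated variance of the first sample mean
    v2 = s2**2 / n2   # estimated variance of the second sample mean
    t = ((xbar1 - xbar2) - hypothesized_diff) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df


# Illustration only: summary statistics borrowed from Example 5.
t_stat, df = unequal_variance_t(322.5, 298.3, 48.3, 53.3, 10, 8)
print(f"t = {t_stat:.4f}, df = {df:.2f}")
```

With these inputs the unequal-variance degrees of freedom fall below the pooled value of 16, which makes the test slightly more conservative.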
10-18 A decision maker wishes to test the following null and alternative hypotheses using an alpha level equal to 0.05:
H0: μ1 − μ2 = 0
HA: μ1 − μ2 ≠ 0
The population standard deviations are assumed to be known. After collecting the sample data, the test statistic is computed to be
z = 1.78
a Using the test statistic approach, what conclusion should be reached about the null hypothesis?
b Using the p-value approach, what decision should be reached about the null hypothesis?
c Will the two approaches (test statistic and p-value) ever provide different conclusions based on the same sample data? Explain.
10-19 The following null and alternative hypotheses have been stated:
H0: μ1 − μ2 = 0
HA: μ1 − μ2 ≠ 0
To test the null hypothesis, random samples have been selected from the two normally distributed populations with equal variances. The following sample data were
a Assuming that the populations are normally distributed with equal variances, test at the 0.10 level of significance whether you would reject the null hypothesis based on the sample information. Use the test statistic approach.
b Assuming that the populations are normally distributed with equal variances, test at the 0.05 level of significance whether you would reject the null hypothesis based on the sample information. Use the test statistic approach.
10-21 Given the following null and alternative hypotheses, conduct a hypothesis test using an alpha equal to 0.05. (Note: The population standard deviations are assumed
10-22 The following statistics were obtained from independent samples from populations that have normal distributions:
b Determine the p-value for the test described in
10-24 Consider the following two independently chosen samples whose population variances are not equal to each other.
Sample 1 12.1 13.4 11.7 10.7 14.0
Sample 2 10.5 9.5 8.2 7.8 11.1
a Using a significance level of 0.025, test the null hypothesis that μ1 − μ2 ≤ 0.
b Calculate the p-value.
10-25 Descent, Inc., produces a variety of climbing and mountaineering equipment. One of its products is a traditional three-strand climbing rope. An important characteristic of any climbing rope is its tensile strength. Descent produces the three-strand rope on two separate production lines: one in Bozeman and the other in Challis. The Bozeman line has recently installed new production equipment. Descent regularly tests the tensile strength of its ropes by randomly selecting ropes from production and subjecting them to various tests. The most recent random sample of ropes, taken after the new equipment was installed at the Bozeman plant, revealed the following:
Bozeman Challis
x̄1 = 7,200 lb x̄2 = 7,087 lb
s1 = 425 s2 = 415
n1 = 25 n2 = 20
Descent’s production managers are willing to assume that the population of tensile strengths for each plant is approximately normally distributed with equal variances. Based on the sample results, can Descent’s managers conclude that there is a difference between the mean tensile strengths of ropes produced in Bozeman and Challis? Conduct the appropriate hypothesis test at the 0.05 level of significance.
10-26 The management of the Seaside Golf Club regularly monitors the golfers on its course for speed of play. Suppose a random sample of golfers was taken in 2005 and another random sample of golfers was selected in 2006. The results of the two samples are as
Based on the sample results, can the management of the Seaside Golf Club conclude that average speed of play was different in 2006 than in 2005? Conduct the appropriate hypothesis test at the 0.10 level of significance. Assume that the management of the club is willing to accept the assumption that the populations of playing times for each year are approximately normally distributed with equal variances.
10-27 The marketing manager for a major retail grocery chain is wondering about the location of the stores’ dairy products. She believes that the mean amount spent by customers on dairy products per visit is higher in stores in which the dairy section is in the central part of the store compared with stores that have the dairy section at the rear of the store. To consider relocating the dairy products, the manager feels that the increase in the mean amount spent by customers must be at least 25 cents. To determine whether relocation is justified, her staff selected a random sample of 25 customers at stores in which the dairy section is central in the store. A second sample of 25 customers was selected in stores with the dairy section at the rear of the store. The following sample results were observed:
Central Dairy Rear Dairy
x̄1 = $3.74 x̄2 = $3.26
s1 = $0.87 s2 = $0.79
a Conduct a hypothesis test with a significance level of 0.05 to determine if the manager should relocate the dairy products in those stores displaying their dairy products in the rear of the store.
b If a statistical error associated with hypothesis testing was made in this hypothesis test, what error could it have been? Explain.
10-28 Sherwin-Williams is a major paint manufacturer. Recently, the research and development (R&D) department came out with a new paint product designed to be used in areas that are particularly prone to periodic moisture and hot sun. They believe that this new paint will be superior to anything that Sherwin-Williams or its competitors currently offer. However, they are concerned about the coverage area that a gallon of the new paint will provide compared to their current products. The R&D department set up a test in which two random samples of paint were selected. The first sample consisted of 25 one-gallon containers of the company’s best-selling paint, and the second sample consisted of 15 one-gallon containers of the new paint under consideration. The following statistics were computed from each sample and refer to the number of square feet that each gallon will cover:
Best-Selling Paint New Paint Product
on a significance level equal to 0.01?
10-29 Albertsons was once one of the largest grocery chains in the United States, with more than 1,100 grocery stores, but in the early 2000s, the company began to feel the competitive pinch from companies like Wal-Mart and Costco. In January 2006, the company announced that it would be sold to SuperValu, Inc.,
headquartered in Minneapolis. Prior to the sale, Albertsons had attempted to lower prices to regain its competitive edge. In an effort to maintain its profit margins, Albertsons took several steps to lower costs. One was to replace some of the traditional checkout stands with automated self-checkout facilities. After making the change in some test stores, the company performed a study to determine whether the average purchase through a self-checkout facility was less than the average purchase at the traditional checkout stand. To conduct the test, a random sample of 125 customer transactions at the self-checkout was obtained, and a second random sample of 125 transactions from customers using the traditional checkout process was obtained. The following statistics were computed from each sample:
Self-Checkout Traditional Checkout
x̄1 = $45.68 x̄2 = $78.49
s1 = $58.20 s2 = $62.45
Based on these sample data, what should be concluded with respect to the average transaction amount for the two checkout processes? Test using an α = 0.05 level.
10-30 The Washington Post Weekly Edition quoted an Urban Institute study that stated that about 80% of the estimated $200 billion of federal housing subsidies consisted of tax breaks (mainly deductions for mortgage interest payments). Samples indicated that the federal housing benefits average was $8,268 for those with incomes between $200,000 and $500,000 and only $365 for those with incomes of $40,000 to $50,000. The respective standard deviations were $2,100 and $150. They were obtained from sample sizes of 150
a Examine the sample standard deviations. What do these suggest is the relationship between the two population standard deviations? Support your assertion.
b Conduct a hypothesis test to determine if the average federal housing benefits are at least $7,750 more for those in the $200,000 to $500,000 income range. Use a 0.05 significance level.
c Having reached your decision in part b, state the type of statistical error that you could have made.
d Is there any way to determine whether you were in error in the hypothesis selection you made in part b? Support your answer.
10-31 Although not all students have debt after graduating from college, more than half do. The College Board’s 2008 Trends in Student Aid addresses, among other topics, the difference in the average college debt accumulated by undergraduate bachelor of arts degree recipients by type of college for the 2006–2007 academic year. Samples might have been used to determine this difference in which the private, for-profit colleges’ average was $38,300 and the public college average was $11,800. Suppose the respective standard deviations were $2,050 and $2,084. The sample sizes were 75 and 205, respectively.
a Examine the sample standard deviations. What do these suggest is the relationship between the two population standard deviations? Support your assertion.
b Conduct a hypothesis test to determine if the average college debt for bachelor of arts degree recipients is at least $25,000 more for graduates from private colleges than from public colleges. Use a 0.01 significance level and a p-value approach for this hypothesis test.
10-32 Suppose a professional job-placement firm that monitors salaries in professional fields is interested in determining if the fluctuating price of oil and the outsourcing of computer-related jobs have had an effect on the starting salaries of chemical and electrical engineering graduates. Specifically, the job-placement firm would like to know if the 2007 average starting salary for chemical engineering majors is higher than the 2007 average starting salary for electrical engineering majors. To conduct its test, the job-placement firm has selected a random sample of 124 electrical engineering majors and 110 chemical engineering majors who graduated and received jobs in 2007. Each graduate was asked to report his or her starting salary. The results of the survey are contained in the file Starting Salaries.
a Conduct a hypothesis test to determine whether the mean starting salary for 2007 graduates in chemical engineering is higher than the mean starting salary for 2007 graduates in electrical engineering. Conduct the test at the 0.05 level of significance. Be sure to state a conclusion. (Assume that the firm believes the two populations from which the samples were taken are approximately normally distributed with equal variances.)
b Suppose the job-placement firm is unwilling to assume that the two populations are normally distributed with equal variances. Conduct the appropriate hypothesis test to determine whether a difference exists between the mean starting salaries for the two groups. Use a level of significance of 0.05. What conclusion should the job-placement firm reach based on the hypothesis test?
10-33 A USA Today editorial addressed the growth of compensation for corporate CEOs. Quoting a study made by BusinessWeek, USA Today indicated that the pay packages for CEOs have increased almost sevenfold on average from 1994 to 2004. The file titled CEODough contains the salaries of CEOs in 1994 and in 2004, adjusted for inflation.
a Determine the ratio of the average salary for 1994 and 2004. Does it appear that BusinessWeek was correct? Explain your answer.
b Examine the sample standard deviations. What do these suggest is the relationship between the two population standard deviations? Support your assertion.
c Based on your response to part b, conduct a test of hypothesis to determine if the difference in the average CEO salary between 1994 and 2004 is more than $9.8 million. Use a p-value approach with a significance level of 0.025.
10-34 The Marriott Corporation operates the largest chain of hotel and motel properties in the world. The Fairfield Inn and the Residence Inn are just two of the hotel brands that Marriott owns. At a recent managers’ meeting, a question was posed regarding whether the average length of stay was different at these two properties in the United States. A summer intern was assigned the task of testing to see if there is a difference. She started by selecting a simple random sample of 100 hotel reservations from Fairfield Inn. Next, she selected a simple random sample of 100 hotel reservations from Residence Inn. In both cases, she recorded the number of nights’ stay for each reservation. The resulting data are in the file called Marriott.
a State the appropriate null and alternative hypotheses.
b Based on these sample data and a 0.05 level of significance, what conclusion should be made about the average length of stay at these two hotel chains?
10-35 Airlines were severely affected by the oil price increases of 2008. Even Southwest lost money, the first time ever, during that time. Many airlines began charging for services that had previously been free, such as baggage and meals. One national airline had as an objective getting an additional $5 to $10 per trip from its customers. Surveys could be used to determine the success of the company’s actions. The file titled AirRevenue contains results of samples gathered before and after the company implemented its changes.
a Produce a 95% confidence interval for the difference in the average fares paid by passengers before and after the change in policy. Based on the confidence interval, is it possible that revenue per passenger increased by at least $10? Explain your response.
b Conduct a test of hypothesis to answer the question posed in part a. Use a significance level of 0.025.
c Did you reach the same conclusion in both parts a and b? Is this a coincidence or will it always be so? Explain your response.
END EXERCISES 10-2
3 Interval Estimation and Hypothesis
Tests for Paired Samples
Sections 1 and 2 introduced the methods by which decision makers can estimate and test the hypotheses for the difference between the means for two populations when the two samples are independent. In each example, the samples were independent because the sample values from one population did not have the potential to influence the probability that values would be selected from the second population. However, there are instances in business in which you would want to use paired samples to control for sources of variation that might otherwise distort the conclusions of a study.
Why Use Paired Samples?
There are many situations in business in which using paired samples should be considered. For instance, a paint manufacturer might be interested in comparing the area that a new paint mix will cover per gallon with that of an existing paint mixture. One approach would be to have one random sample of painters apply a gallon of the new paint mixture. A second sample of painters would be given the existing mix. In both cases, the number of square feet that were covered by the gallon of paint would be recorded. In this case, the samples would be independent because the area covered by painters using the new mixture would not be in any way affected by the area covered by painters using the existing mixture.
This would be a fine way to do the study unless the painters themselves could influence the area that the paint will cover. For instance, suppose some painters, because of their technique or experience, are able to cover more area from a gallon of paint than other painters regardless of the type of paint used. Then, if by chance most of these “good” painters happened to get assigned to the new mix, the results might show that the new mix covers more area, not because it is a better paint, but because the painters that used it during the test were better.
To combat this potential problem, the company might want to use paired samples. To do this, one group of painters would be selected and each painter would use one gallon of each paint mix. We would measure the area covered by each painter for both paint mixes. Doing this controls for the effect of the painters’ ability or technique. The following application, which involves testing gasoline supplemented with ethanol, is one in which paired samples would most likely be warranted.
Paired Samples
Samples that are selected in such a way that
values in one sample are matched with the
values in the second sample for the purpose of
controlling for extraneous factors. Another term
for paired samples is dependent samples.
Chapter Outcome 2
BUSINESS APPLICATION ESTIMATION USING PAIRED SAMPLES
TESTING ETHANOL MIXED GASOLINE A major oil company wanted to estimate the difference in average mileage for cars using a regular gasoline compared with cars using a gasoline-and-ethanol mixture. The company used a paired-sample approach to control for any variation in mileage arising because of different cars and drivers. A random sample of 10 motorists (and their cars) was selected. Each car was filled with regular gasoline. The car was driven 200 miles on a specified route. The car then was filled again with gasoline and the miles per gallon were computed. After the 10 cars completed this process, the same steps were performed using the gasoline mixed with ethanol. Because the same cars and drivers tested both types of fuel, the miles-per-gallon measurements for the ethanol mixture and regular gasoline will most likely be related. The two samples are not independent but are instead considered paired samples. Thus, we will compute the paired difference between the values from each sample, using Equation 11.
Paired Difference
d = x1 - x2 (11)
where:
d = Paired difference
x1 and x2 = Values from samples 1 and 2, respectively
Point Estimate for the Population Mean Paired Difference, μd
\[\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} \qquad (12)\]
where:
\(d_i\) = ith paired difference value
n = Number of paired differences
Figure 8 shows the Excel spreadsheet for this mileage study with the paired differences computed. The data are in the file called Ethanol-Gas.
The first step to develop the interval estimate is to compute the mean paired difference, \(\bar{d}\), using Equation 12. This value is the best point estimate for the population mean paired difference, \(\mu_d\).
FIGURE 8 |
Excel 2010 Worksheet for
Ethanol Mixed Gasoline Study
Excel 2010 Instructions:
1 Open file: Ethanol-Gas.xlsx.
Using Equation 12, we determine \(\bar{d}\) as follows:
\[\bar{d} = \frac{\sum d}{n} = \frac{22.7}{10} = 2.27\]
The next step is to compute the sample standard deviation for the paired differences using Equation 13.
Sample Standard Deviation for Paired Differences
\[s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}} \qquad (13)\]
where:
\(d_i\) = ith paired difference
\(\bar{d}\) = Mean paired difference
The sample standard deviation for the paired differences is \(s_d = 4.38\) mpg.
Assuming that the population of paired differences is normally distributed, the confidence interval estimate for the population mean paired difference is computed using Equation 14.
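Confidence Interval Estimate for Population Mean Paired Difference, μd
\[\bar{d} \pm t \frac{s_d}{\sqrt{n}} \qquad (14)\]
where:
t = Critical t value from the t-distribution with n - 1 degrees of freedom
\(\bar{d}\) = Sample mean paired difference
\(s_d\) = Sample standard deviation of paired differences
n = Number of paired differences (sample size)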
For a 95% confidence interval with 10 - 1 = 9 degrees of freedom, we use a critical t from the t-distribution of t = 2.2622. The interval estimate is
\[2.27 \pm 2.2622 \frac{4.38}{\sqrt{10}}\]
\[2.27 \pm 3.13\]
-0.86 mpg to 5.40 mpg
Because the interval estimate contains zero, there may be no difference in the average mileage when either regular gasoline or the ethanol mixture is used.
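The interval arithmetic above can be checked with a few lines of Python. This is a minimal sketch that works from the summary values quoted in the text (mean difference 2.27, standard deviation 4.38, n = 10, critical t = 2.2622); the function name `paired_ci` is an illustrative assumption, and the critical value is typed in from the t-table rather than computed.

```python
import math


def paired_ci(d_bar, s_d, n, t_crit):
    """Confidence interval for the mean paired difference: d_bar ± t * s_d / sqrt(n)."""
    margin = t_crit * s_d / math.sqrt(n)
    return d_bar - margin, d_bar + margin


# Summary values quoted in the ethanol study (mpg); t_crit is the table value
# for 95% confidence with 10 - 1 = 9 degrees of freedom.
lower, upper = paired_ci(d_bar=2.27, s_d=4.38, n=10, t_crit=2.2622)
print(f"{lower:.2f} mpg to {upper:.2f} mpg")
contains_zero = lower <= 0.0 <= upper
print("Interval contains zero:", contains_zero)
```

Because the lower limit is negative and the upper limit is positive, the sketch reaches the same reading as the text: a zero difference in mean mileage cannot be ruled out.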
EXAMPLE 6 CONFIDENCE INTERVAL ESTIMATE FOR THE DIFFERENCE
BETWEEN POPULATION MEANS, PAIRED SAMPLES
PGA of America Testing Center
Technology has done more to change golf than possibly any other group in recent years. Titanium woods, hybrid irons, and new golf ball designs have impacted professional and amateur golfers alike. PGA of America is the association that only professional golfers can belong to. The association provides many services for golf professionals, including operating an equipment training center in Florida. Recently, a maker of golf balls developed a new ball technology, and PGA of America is interested in estimating the mean difference in driving distance for this new ball versus the existing best-seller. To conduct the test, the PGA of America staff selected six professional golfers and had each golfer hit each ball one time. Here are the steps necessary to develop a confidence interval estimate for the difference in population means for paired samples:
Because the same golfers hit each golf ball, the company is controlling for the variation in the golfers’ ability to hit a golf ball. The samples are paired, and the population value of interest is μd, the mean paired difference in distance.
We assume that the population of paired differences is normally distributed.
Step 2 Specify the desired confidence level and determine the appropriate
critical value.
The research director wishes to have a 95% confidence interval estimate.
The sample data, paired differences, are shown as follows.
Golfer Existing Ball New Ball d
\[\bar{d} = \frac{\sum d}{n} = \frac{13}{6} = 2.17 \text{ yards}\]
The standard deviation for the paired differences is computed using Equation 13.
The critical t for 95% confidence and 6 - 1 = 5 degrees of freedom is t = 2.5706.
> END EXAMPLE
TRY PROBLEM 38
The key in deciding whether to use paired samples is to determine whether a factor exists that might adversely influence the results of the estimation. In the ethanol mixed gasoline test example, we controlled for potential outside influence by using the same cars and drivers to test both gasolines. In Example 6, we controlled for golfer ability by having the same golfers hit both golf balls. If you determine there is no need to control for an outside source of variation, then independent samples should be used, as discussed earlier.
Hypothesis Testing for Paired Samples
As we just illustrated, there will be instances in which paired samples can be used to control for an outside source of variation. For instance, in Example 5, involving the ink cartridges, the original test of whether name-brand cartridges yield a higher mean number of printed pages than generic cartridges involved different users for the two types of cartridges, so the samples were independent. However, different users may use more or less ink as a rule; therefore, we could control for that source of variation by having a sample of people use both types of cartridges in a paired test format.
Chapter Outcome 2
If a paired-sample experiment is used, the test statistic is computed using Equation 15.
t-Test Statistic for Paired-Sample Test
\[t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}, \qquad df = n - 1 \qquad (15)\]
where:
\(\bar{d}\) = Mean paired difference \(= \sum d / n\)
\(\mu_d\) = Hypothesized population mean paired difference
\(s_d\) = Sample standard deviation for paired differences \(= \sqrt{\sum (d - \bar{d})^2 / (n - 1)}\)
n = Number of paired values in the sample
EXAMPLE 7 HYPOTHESIS TEST FOR μd, PAIRED SAMPLES
Color Printer Ink Cartridges Referring to Example 5, suppose the experiment regarding ink cartridges is conducted differently. Instead of having different samples of people use name-brand and generic cartridges, the test is done using paired samples. This means that the same people will use both types of cartridges, and the pages printed in each case will be recorded. The test under this paired-sample scenario can be conducted using the following steps. Six randomly selected people have agreed to participate.
In this case, we will form paired differences by subtracting the generic pages from the name-brand pages. We are interested in determining whether name-brand cartridges produce more printed pages, on average, than generic cartridges, so we would expect the paired difference to be positive. We assume that the paired differences are normally distributed.
The null and alternative hypotheses are
H0: μd ≤ 0.0
HA: μd > 0.0
The test will be conducted using α = 0.01.
The critical value is a t-value from the t-distribution, with α = 0.01 and 6 - 1 = 5 degrees of freedom. The critical value is
t0.01 = 3.3649
The decision rule is
If t > 3.3649, reject the null hypothesis;
otherwise, do not reject the null hypothesis.
Select the random sample and compute the mean and standard deviation for the paired differences.
Skill Development
10-36 The following dependent samples were randomly
selected Use the sample data to construct a 95%
confidence interval estimate for the population mean
10-37 The following paired sample data have been obtained
from normally distributed populations. Construct a
90% confidence interval estimate for the mean paired difference between the two population means.
Sample # Population 1 Population 2
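\[\bar{d} = \frac{\sum d}{n} = \frac{73}{6} = 12.17\]
The standard deviation for the paired differences is \(s_d = 18.02\), and the test statistic is
\[t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} = \frac{12.17 - 0.0}{18.02 / \sqrt{6}} = 1.6543\]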
Because t = 1.6543 < t0.01 = 3.3649, do not reject the null hypothesis.
Based on these sample data, there is insufficient evidence to conclude that name-brand ink cartridges produce more pages on average than generic brands.
> END EXAMPLE
TRY PROBLEM 39
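The paired-sample test in Example 7 can be sketched in Python from its summary statistics. The values below are assumptions read from the example's worked computation (mean difference 12.17, standard deviation 18.02, n = 6), the critical value 3.3649 is the table value quoted in the example, and the function name `paired_t_statistic` is illustrative only.

```python
import math


def paired_t_statistic(d_bar, s_d, n, mu_d=0.0):
    """t = (d_bar - mu_d) / (s_d / sqrt(n)), with n - 1 degrees of freedom."""
    return (d_bar - mu_d) / (s_d / math.sqrt(n))


# Summary values as read from Example 7 (assumed, see the lead-in above).
t_stat = paired_t_statistic(d_bar=12.17, s_d=18.02, n=6)

# One-tailed (upper-tail) decision rule at alpha = 0.01 with 5 degrees of freedom.
t_crit = 3.3649
reject_h0 = t_stat > t_crit
print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```

The computed statistic falls well below the critical value, which matches the example's decision not to reject the null hypothesis.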
10-42 Consider the following set of samples obtained from
two normally distributed populations whose variances are equal:
Sample 1: 11.2 11.2 7.4 8.7 8.5 13.5 4.5 11.9
Sample 2: 11.7 9.5 15.6 16.5 11.3 17.6 17.0 8.5
a Suppose that the samples were independent.
Perform a test of hypothesis to determine if there
is a difference in the two population means. Use a significance level of 0.05.
b Now suppose that the samples were paired samples. Perform a test of hypothesis to determine if there is
a difference in the two population means.
c How do you account for the difference in the outcomes of part a and part b? Support your assertions with a statistical rationale.
10-43 One of the advances that helped to diminish carpal
tunnel syndrome is ergonomic keyboards. The ergonomic keyboards may also increase typing speed. Ten administrative assistants were chosen to type on both standard and ergonomic keyboards. The resulting word-per-minute typing speeds follow:
per minute attained while typing. Use a p-value
approach with a significance level of 0.01.
10-44 Production engineers at Sinotron believe that a modified
layout on its assembly lines might increase average worker productivity (measured in the number of units produced per hour). However, before the engineers are ready to install the revised layout officially across the entire firm’s production lines, they would like to study the modified line’s effects on output. The following data represent the average hourly production output of 12 randomly sampled employees before and after the line was modified:
10-45 The United Way raises money for community charity
activities. Recently, in one community, the fundraising committee was concerned about whether there is a difference in the proportion of employees who give to United Way depending on whether the employer is a private business or a government agency. A random
a Construct and interpret a 99% confidence interval
estimate for the paired difference in mean values
b Construct and interpret a 90% confidence interval
estimate for the paired difference in mean values
10-39 The following sample data have been collected from
a paired sample from two populations. The claim is
that the first population mean will be at least as large
as the mean of the second population. This claim will
be assumed to be true unless the data strongly suggest
b Based on the sample data, what should you
conclude about the null hypothesis? Test using
α = 0.10.
c Calculate a 90% confidence interval for the
difference in the population means. Are the results
from the confidence interval consistent with the
outcome of your hypothesis test? Explain why or
why not.
10-40 A paired sample study has been conducted to
determine whether two populations have equal
means. Twenty paired samples were obtained with the
following sample results:
\(\bar{d} = 12.45\)   \(s_d = 11.0\)
Based on these sample data and a significance level
of 0.05, what conclusion should be made about the
population means?
10-41 The following samples are observations taken from the
same elements at two different times:
Unit Sample 1 Sample 2
a Assume that the populations are normally
distributed and construct a 90% confidence interval
for the difference in the means of the distribution at
the times in which the samples were taken.
b Perform a test of hypothesis to determine if the
difference in the means of the distribution at the
first time period is 10 units larger than at the
second time period. Use a level of significance
equal to 0.10.
a Discuss the appropriateness of the way this study was designed and conducted. Why didn’t the consumer group select two samples with different drivers in each and have one group use the acetone and the other group not use it? Discuss.
b Using a significance level of 0.05, what conclusion should be reached based on these sample data? Discuss.
10-47 An article in The American Statistician (M L R Ernst, et al., “Scatterplots for Unordered Pairs,” 50 (1996), pp 260–265) reports on the difference in the measurements by two evaluators of the cardiac output of 23 patients using Doppler echocardiography. Both observers took measurements from the same patients. The measured outcomes were as follows:
Patient      1    2    3    4    5    6    7    8    9    10   11   12
Evaluator 1  4.8  5.6  6.0  6.4  6.5  6.6  6.8  7.0  7.0  7.2  7.4  7.6
Evaluator 2  5.8  6.1  7.7  7.8  7.6  8.1  8.0  8.21 6.6  8.1  9.5  9.6
Patient      13   14   15   16   17   18   19   20   21   22   23
Evaluator 1  7.7  7.7  8.2  8.2  8.3  8.5  9.3  10.2 10.4 10.6 11.4
Evaluator 2  8.5  9.5  9.1  10.0 9.1  10.8 11.5 11.5 11.2 11.5 12.0
a Conduct a hypothesis test to determine if the average cardiac outputs measured by the two evaluators differ. Use a significance level of 0.02.
b Calculate the standard error of the difference between the two average outputs assuming that the sampling was done independently. Compare this with the standard error obtained in part a.
10-48 A prime factor in the economic troubles that started in 2008 was the end of the “housing bubble.” The file titled House contains data for a sample showing the average and median housing prices for selected areas in the country in November 2007 and November 2008. Assume the data can be viewed as samples of the relevant populations.
a Discuss whether the two samples are independent or dependent.
b Based on your answer to part a, calculate a 90% confidence interval for the difference between the means of the average and median selling prices for houses during November 2007.
c Noting your answer to part b, would it be plausible to assert that the mean of the average selling prices for houses during November 2007 is more than the average of the median selling prices during this period? Support your assertions.
d Using a p-value approach and a significance level of 0.05, conduct a hypothesis test to determine if the mean of the average selling prices for houses during November 2007 is more than $30,000 larger than the mean of the median selling prices during this period.
10-49 A treadmill manufacturer has developed a new
machine with softer tread and better fans than its
sample of people who had been contacted about
contributing last year was selected. Of those contacted,
70 worked for a private business and 50 worked for a
government agency. For the 70 private-sector employees,
the mean contribution was $230.25 with a standard
deviation equal to $55.52. For the 50 government
employees in the sample, the mean and standard
deviation were $309.45 and $61.75, respectively.
a Based on these sample data and α = 0.05, what should
be concluded? Be sure to show the decision rule.
b Construct a 95% confidence interval for the
difference between the mean contributions of
private business and government agency employees
who contribute to United Way. Do the hypothesis
test and the confidence interval produce compatible
results? Explain and give reasons for your answer.
10-46 An article on the PureEnergySystems.com website
written by Louis LaPoint discusses a product called
acetone. The article stated that “Acetone (CH3COCH3)
is a product that can be purchased inexpensively in
most locations around the world, such as in common
hardware, auto parts, or drug stores. Added to the fuel
tank in tiny amounts, acetone aids in the vaporization
of the gasoline or diesel, increasing fuel efficiency,
engine longevity, and performance—as well as reducing
hydrocarbon emissions.” To test whether this product
actually does increase fuel efficiency in passenger cars,
a consumer group has randomly selected 10 people to
participate in the study. The following procedure is used:
1 People are to bring their cars into a specified
gasoline station and have the car filled with regular,
unleaded gasoline at a particular pump. Nothing
extra is added to the gasoline at this fill-up. The
car’s odometer is recorded at the time of fill-up.
2 When the tank is nearly empty, the person is to bring
the car to the same gasoline station and pump and
have it refilled with gasoline. The odometer is read
again and the miles per gallon are recorded. This time,
a prescribed quantity of acetone is added to the fuel.
3 When the tank is nearly empty, the person is to
bring the car back to the same station and pump to
have it filled. The miles per gallon will be recorded.
Each person is provided with free tanks of gasoline and
asked to drive his or her car normally.
The following miles per gallon (mpg) were recorded:
Driver No Additive MPG: Acetone Added MPG:
excluding food, gifts, and news items. A file titled
Revenues contains sample data selected from airport
retailers in 2005 and again in 2008.
a Conduct a hypothesis test to determine if the average amount of retail spending by air travelers has increased at least as much as approximately $0.10 a year from 2005 to 2008. Use a significance level of 0.025.
b Using the appropriate analysis (that of part a or other appropriate methodology), substantiate the statement that average retail purchases in airports increased over the time period between 2005 and 2008. Support your assertions.
c Parts a and b give what seems to be a mixed message. Is there a way to determine what values are plausible for the difference between the average revenue in 2005 and 2008? If so, conduct the appropriate procedure.
current model. The manufacturer believes these
new features will enable runners to run for longer
times than they can on its current machines. To
determine whether the desired result is achieved, the
manufacturer randomly sampled 35 runners. Each
runner was measured for one week on the current
machine and for one week on the new machine. The
weekly total number of minutes for each runner on
the two types of machines was collected. The results
are contained in the file Treadmill. At the 0.02
level of significance, can the treadmill manufacturer
conclude that the new machine has the desired
result?
10-50 As the number of air travelers with time on their
hands increases, it would seem that spending on
retail purchases in airports would increase as well.
A study by Airport Revenue News addressed the
per-person spending at selected airports for merchandise,
END EXERCISES 10-3
Estimation and Hypothesis Tests for Two Population Proportions
The previous sections illustrated the methods for estimating and testing hypotheses involving two population means. There are many business situations in which these methods can be applied. However, there are other instances involving two populations in which the measures of interest are not the population means. This section extends the methodology for testing hypotheses involving a single population proportion to tests involving hypotheses about the difference between two population proportions. First, we will look at a confidence interval estimation involving two population proportions.
Estimating the Difference between Two Population Proportions
BUSINESS APPLICATION ESTIMATING THE DIFFERENCE BETWEEN TWO
POPULATION PROPORTIONS
BICYCLE DESIGN Recently, an outdoor magazine conducted an interesting study involving a prototype bicycle that was made by a Swiss manufacturer. The prototype had no identification on it to indicate the name of the manufacturer. Of interest was the difference in the proportion of men versus women who would rate the bicycle as high quality.
Obviously, there was no way to gauge the attitudes of the entire population of men and women who could eventually judge the quality of the bicycle. Instead, the reporter for the magazine asked a random sample of 425 men and 370 women to rate the bicycle’s quality. In the results that follow, the variable x indicates the number in the sample who said the bicycle was
The point estimate for the difference in population proportions is \(\bar{p}_1 - \bar{p}_2 = 0.565 - 0.530 = 0.035\).
A rule of thumb for “sufficiently large” is that np and n(1 - p) are greater than or equal to 5 for each sample.
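Confidence Interval Estimate for p1 - p2
\[(\bar{p}_1 - \bar{p}_2) \pm z \sqrt{\frac{\bar{p}_1 (1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2 (1 - \bar{p}_2)}{n_2}} \qquad (16)\]
where:
\(\bar{p}_1\) = Sample proportion from population 1
\(\bar{p}_2\) = Sample proportion from population 2
z = Critical value from the standard normal table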
The analysts can substitute the sample results into Equation 16 to establish a 95% confidence interval estimate, as follows:
\[(0.565 - 0.530) \pm 1.96 \sqrt{\frac{0.565 (1 - 0.565)}{425} + \frac{0.530 (1 - 0.530)}{370}}\]
\[0.035 \pm 0.069\]
\[-0.034 \le (p_1 - p_2) \le 0.104\]
Thus, based on the sample data and using a 95% confidence interval, the analysts estimate that the true difference in proportion of males versus females who rate the prototype as high quality is between -0.034 and 0.104. At one extreme, 3.4% more females rate the bicycle as high in quality. At the other extreme, 10.4% more males rate the bicycle as high quality than females. Because zero is included in the interval, there may be no difference between the proportion of males and females who rate the prototype as high quality based on these data. Consequently, the reporter is not able to conclude that one group or the other would be more likely to rate the prototype bicycle high in quality.
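The bicycle-design interval can be reproduced with a short Python sketch of Equation 16. It uses only the sample proportions and sample sizes quoted in the text (0.565 of 425 men, 0.530 of 370 women) and the table value z = 1.96; the function name `two_proportion_ci` is an illustrative assumption.

```python
import math


def two_proportion_ci(p1_bar, n1, p2_bar, n2, z_crit):
    """Interval for p1 - p2 using the unpooled standard error of Equation 16."""
    std_error = math.sqrt(
        p1_bar * (1 - p1_bar) / n1 + p2_bar * (1 - p2_bar) / n2
    )
    diff = p1_bar - p2_bar
    return diff - z_crit * std_error, diff + z_crit * std_error


# Sample proportions and sizes from the bicycle study; z = 1.96 for 95% confidence.
lower, upper = two_proportion_ci(0.565, 425, 0.530, 370, z_crit=1.96)
print(f"{lower:.3f} <= (p1 - p2) <= {upper:.3f}")
```

Note that the interval uses each sample's own proportion in the standard error; the pooled estimate is reserved for hypothesis tests in which the null hypothesis states that the two proportions are equal.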
Hypothesis Tests for the Difference between Two Population Proportions
BUSINESS APPLICATION TESTING FOR THE DIFFERENCE BETWEEN TWO
POPULATION PROPORTIONS
POMONA FABRICATIONS Pomona Fabrications, Inc. produces handheld hair dryers that several major retailers sell as in-house brands. A critical component of a handheld hair dryer is the motor-heater unit, which accounts for most of the dryer’s cost and for most of the product’s reliability problems. Product reliability is important to Pomona because the company offers a one-year warranty. Of course, Pomona is also interested in reducing production costs.
Pomona’s R&D department has recently created a new motor-heater unit with fewer parts than the current unit, which would lead to a 15% cost savings per hair dryer. However, the company’s vice president of product development is unwilling to authorize the new component unless it is more reliable than the current motor-heater.
The R&D department has decided to test samples of both units to see which motor-heater is more reliable. Of each type, 250 will be tested under conditions that simulate one year’s
use, and the proportion of each type that fails within that time will be recorded. This leads to the formulation of the following null and alternative hypotheses:
H0: p1 - p2 ≥ 0.0   or   H0: p1 ≥ p2
HA: p1 - p2 < 0.0   or   HA: p1 < p2
where:
p1 = Population proportion of new dryer type that fails in simulated one-year period
p2 = Population proportion of existing dryer type that fails in simulated one-year period
The null hypothesis states that the new motor-heater is no better than the old, or current, motor-heater. The alternative states that the new unit has a smaller proportion of failures within one year than the current unit. In other words, the alternative states that the new unit is more reliable. The company wants clear evidence before changing units. If the null hypothesis is rejected, the company will conclude that the new motor-heater unit is more reliable than the old unit and should be used in producing the hair dryers. To test the null hypothesis, we can use the test statistic approach.
The test statistic is based on the sampling distribution of \(\bar{p}_1 - \bar{p}_2\). We showed that when np ≥ 5 and n(1 - p) ≥ 5, the sampling distribution of the sample proportion is approximately normally distributed, with a mean equal to p and a variance equal to p(1 - p)/n.
Likewise, in the two-sample case, the sampling distribution of \(\bar{p}_1 - \bar{p}_2\) will also be approximately normal if
Assumptions
n1p1 ≥ 5, n1(1 − p1) ≥ 5, and n2p2 ≥ 5, n2(1 − p2) ≥ 5
Because p1 and p2 are unknown, we substitute the sample proportions, \(\bar{p}_1\) and \(\bar{p}_2\), to determine whether the sample size requirements are satisfied.
The mean of the sampling distribution of \(\bar{p}_1 - \bar{p}_2\) is the difference of the population proportions, p1 - p2. The variance is, however, the sum of the variances, p1(1 - p1)/n1 + p2(1 - p2)/n2. Because the test is conducted using the assumption that the null hypothesis is true, we assume that p1 = p2 = p and estimate their common value, p, using a pooled estimate, as shown in Equation 17. The z-test statistic for the difference between two proportions is given by the z-test statistic that follows.
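\[\bar{p} = \frac{x_1 + x_2}{n_1 + n_2} \qquad (17)\]
where:
x1 and x2 = Number from samples 1 and 2 with the characteristic of interest
z-Test Statistic for Difference between Population Proportions
\[z = \frac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p} (1 - \bar{p}) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}\]
where:
(p1 - p2) = Hypothesized difference in population proportions
\(\bar{p}_1\) and \(\bar{p}_2\) = Sample proportions for samples selected from populations 1 and 2, respectively
\(\bar{p}\) = Pooled estimator for the overall proportion for both populations combined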
FIGURE 9 |
Hypothesis Test of Two
Population Proportions for
in the two samples, and the denominator is the total sample size. Again, the pooled estimator, \(\bar{p}\), is used when the null hypothesis is that there is no difference between the population proportions.
Assume that Pomona is willing to use a significance level of 0.05 and that 55 of the new motor-heaters and 75 of the originals failed the one-year test. Figure 9 illustrates the decision-rule development and the hypothesis test. As you can see, Pomona should reject the null hypothesis based on the sample data. Thus, the firm should conclude that the new motor-heater is more reliable than the old one. Because the new one is also less costly, the company should now use the new unit in the production of hair dryers.
The p-value approach to hypothesis testing could also have been used to test Pomona’s hypothesis. In this case, the calculated value of the test statistic, z = -2.04, results in a p-value of 0.0207 (0.5 - 0.4793) from the standard normal table. Because this p-value is smaller than the significance level of 0.05, we would reject the null hypothesis. Remember, whenever your p-value is smaller than the alpha value, your sample contains evidence to reject the null hypothesis.
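As a check on the Pomona numbers, the pooled z statistic and its lower-tail p-value can be sketched in Python. The counts (55 and 75 failures out of 250 each) and the 0.05 significance level come from the text; the function names are illustrative assumptions, and the normal CDF is computed from the standard library's error function rather than read from a table, so the p-value may differ from the table value in the fourth decimal place.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pooled_two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0 using the pooled estimator (x1 + x2)/(n1 + n2)."""
    p1_bar, p2_bar = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    std_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_bar - p2_bar) / std_error


# Sample 1 = new motor-heater, sample 2 = current motor-heater.
z = pooled_two_proportion_z(x1=55, n1=250, x2=75, n2=250)

# Lower-tail test (HA: p1 - p2 < 0), so the p-value is the area below z.
p_value = normal_cdf(z)
reject_h0 = p_value < 0.05
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Comparing the p-value with alpha and comparing z with the critical value are equivalent decision rules, so the sketch arrives at the same rejection as the test statistic approach.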
The PHStat add-in to Excel contains a procedure for performing hypothesis tests involving two population proportions. Figure 10 shows the PHStat output for the Pomona example. The output contains both the z-test statistic and the p-value. As we observed from our manual calculations, the difference in sample proportions is sufficient to reject the null hypothesis that there is no difference in population proportions.
EXAMPLE 8 HYPOTHESIS TEST FOR THE DIFFERENCE BETWEEN
TWO POPULATION PROPORTIONS
Transportation Security Administration
The Transportation Security Administration (TSA) is responsible for transportation security at all United States airports. The TSA is evaluating two suppliers of a scanning system it is considering purchasing. Both scanners are designed to detect forged IDs that might be used by passengers trying to board airlines. High-quality scanners and printers and home computers have made forging IDs an increasing security risk. The TSA is interested in determining whether there is a difference in the proportion of forged IDs detected by the two suppliers. To conduct this test, use the following steps:
In this case, the population parameter of interest is the population proportion of detected forged IDs. At issue is whether there is a difference between the two suppliers in terms of the proportion of forged IDs detected.
FIGURE 10 |
Excel 2010 (PHStat) Output of
the Two Proportions Test for
7 Enter Number of Items
of Interest and Sample
Size for both populations.
8 Indicate Lower-tail test.
9 Click OK.
Minitab Instructions (for similar results):
1 Choose Stat >
Basic Statistics >
2 Proportions.
2 Choose Summarized data.
3 In First enter Trials and
Events for sample 1 (e.g., 250 and 55)
4 In Second enter Trials
and Events for sample 2 (e.g., 250 and 75)
5 Select Options,
Insert 1 – α in
Confidence level.
6 In Alternative
select less than.
7 Check Use pooled estimate of p for test.
8 Click OK. OK.
Step 2 Formulate the appropriate null and alternative hypotheses.
The null and alternative hypotheses are
H0: p1 - p2 = 0.0
HA: p1 - p2 ≠ 0.0
The test will be conducted using α = 0.02.
For a two-tailed test, the critical values for each side of the distribution are
-z0.01 = -2.33 and z0.01 = 2.33
The decision rule based on the z-test statistic is
If z < -2.33 or z > 2.33, reject the null hypothesis;
Otherwise, do not reject the null hypothesis.
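Two hundred known forged IDs will be randomly selected from a large population of previously confiscated IDs and scanned by systems from each supplier. For supplier 1, 186 forgeries are detected, and for supplier 2, 168 are detected. The sample proportions are
\[\bar{p}_1 = \frac{x_1}{n_1} = \frac{186}{200} = 0.93 \qquad \bar{p}_2 = \frac{x_2}{n_2} = \frac{168}{200} = 0.84\]
The pooled estimate is \(\bar{p} = (186 + 168)/(200 + 200) = 0.885\), and the test statistic is
\[z = \frac{(0.93 - 0.84) - 0.0}{\sqrt{0.885 (1 - 0.885) \left( \frac{1}{200} + \frac{1}{200} \right)}} = 2.8211\]
Because z = 2.8211 > z0.01 = 2.33, reject the null hypothesis.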
The difference between the two sample proportions provides sufficient evidence to allow us to conclude a difference exists between the two suppliers. The TSA can infer that supplier 1 provides the better scanner for this purpose.
> END EXAMPLE
TRY PROBLEM 54
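The two-tailed test in Example 8 can be sketched the same way. The detection counts (186 and 168 out of 200 each) and the critical value 2.33 are taken from the example; the function name `two_tailed_proportion_test` and its return layout are illustrative assumptions, not part of the text.

```python
import math


def two_tailed_proportion_test(x1, n1, x2, n2, z_crit):
    """Return (z, reject) for H0: p1 - p2 = 0 against HA: p1 - p2 != 0."""
    p1_bar, p2_bar = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)  # pooled estimator under H0
    std_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_bar - p2_bar) / std_error
    # Two-tailed rule: reject when z falls in either tail beyond the critical value.
    return z, abs(z) > z_crit


# Supplier 1 detected 186 of 200 forgeries; supplier 2 detected 168 of 200.
z, reject_h0 = two_tailed_proportion_test(186, 200, 168, 200, z_crit=2.33)
print(f"z = {z:.4f}, reject H0: {reject_h0}")
```

Because the alternative hypothesis is two-sided, the sketch compares the absolute value of z with the critical value instead of checking a single tail.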