We now consider the case of hypothesis testing on the mean of a population with unknown variance $\sigma^2$. The situation is analogous to Section 8-2, where we considered a confidence interval on the mean for the same situation. As in that section, the validity of the test procedure we will describe rests on the assumption that the population distribution is at least approximately normal. The important result upon which the test procedure relies is that if $X_1, X_2, \ldots, X_n$ is a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$, the random variable

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has a t distribution with $n - 1$ degrees of freedom. Recall that we used this result in Section 8-2 to devise the t-confidence interval for $\mu$. Now consider testing the hypotheses

$$H_0: \mu = \mu_0 \qquad H_1: \mu \neq \mu_0$$

We will use the test statistic:

Test Statistic

$$T_0 = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \qquad (9\text{-}26)$$
JWCL232_c09_283-350.qxd 1/14/10 3:07 PM Page 310
For this two-sided alternative hypothesis, the P-value is the total probability in the two tails of the t distribution. Refer to Fig. 9-10(a). The P-value is the probability above $|t_0|$ plus the probability below $-|t_0|$. Because the t distribution is symmetric around zero, a simple way to write this is

$$P = 2P(T_{n-1} > |t_0|) \qquad (9\text{-}27)$$

A small P-value is evidence against $H_0$, so if P is sufficiently small (typically $< 0.05$), reject the null hypothesis.

For the one-sided alternative hypotheses

$$H_0: \mu = \mu_0 \qquad H_1: \mu > \mu_0 \qquad (9\text{-}28)$$

we calculate the test statistic $t_0$ from Equation 9-26 and calculate the P-value as

$$P = P(T_{n-1} > t_0) \qquad (9\text{-}29)$$

For the other one-sided alternative

$$H_0: \mu = \mu_0 \qquad H_1: \mu < \mu_0 \qquad (9\text{-}30)$$

we calculate the P-value as

$$P = P(T_{n-1} < t_0) \qquad (9\text{-}31)$$

Figure 9-10(b) and (c) show how these P-values are calculated.

Figure 9-10 Calculating the P-value for a t-test: (a) $H_1: \mu \neq \mu_0$ (two-tailed test; P-value = probability in both tails); (b) $H_1: \mu > \mu_0$ (one-tailed test); (c) $H_1: \mu < \mu_0$ (one-tailed test).

Statistics software packages calculate and display P-values. However, in working problems by hand, it is useful to be able to find the P-value for a t-test. Because the t-table in Appendix A Table II contains only 10 critical values for each t distribution, determining the exact P-value from this table is usually impossible. Fortunately, it is easy to find lower and upper bounds on the P-value by using this table.

To illustrate, suppose that we are conducting an upper-tailed t-test (so $H_1: \mu > \mu_0$) with 14 degrees of freedom. The relevant critical values from Appendix A Table II are as follows:

Critical Value:  0.258  0.692  1.345  1.761  2.145  2.624  2.977  3.326   3.787  4.140
Tail Area:       0.40   0.25   0.10   0.05   0.025  0.01   0.005  0.0025  0.001  0.0005

After calculating the test statistic, we find that $t_0 = 2.8$. Now, $t_0 = 2.8$ is between two tabulated values, 2.624 and 2.977. Therefore, the P-value must be between 0.005 and 0.01. Refer to Fig. 9-11. These are effectively the lower and upper bounds on the P-value.

This illustrates the procedure for an upper-tailed test. If the test is lower-tailed, just change the sign on the lower and upper bounds for $t_0$ and proceed as above. Remember that for a two-tailed test, the level of significance associated with a particular critical value is twice the corresponding tail area in the column heading. This consideration must be taken into account when we compute the bound on the P-value. For example, suppose that $t_0 = 2.8$ for a two-tailed alternative based on 14 degrees of freedom. The value of the test statistic satisfies $t_0 > 2.624$ (corresponding to $\alpha = 2 \times 0.01 = 0.02$) and $t_0 < 2.977$ (corresponding to $\alpha = 2 \times 0.005 = 0.01$), so the lower and upper bounds on the P-value would be $0.01 < P < 0.02$ for this case.
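The table lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `p_value_bounds` and the labels "greater", "less", and "two-sided" are hypothetical, and the tabulated values are the ones quoted above for 14 degrees of freedom.

```python
from bisect import bisect_right

# Upper-tail critical values for 14 degrees of freedom and their tail areas,
# as quoted in the text from Appendix A Table II.
CRITICAL_VALUES = [0.258, 0.692, 1.345, 1.761, 2.145, 2.624, 2.977, 3.326, 3.787, 4.140]
TAIL_AREAS = [0.40, 0.25, 0.10, 0.05, 0.025, 0.01, 0.005, 0.0025, 0.001, 0.0005]


def p_value_bounds(t0, alternative="greater"):
    """Return (lower, upper) bounds on the P-value using only the tabulated values."""
    # A lower-tailed test mirrors the upper-tailed one; a two-sided test uses |t0|.
    if alternative == "less":
        t = -t0
    elif alternative == "two-sided":
        t = abs(t0)
    else:
        t = t0
    if t < 0:
        # One-sided test with the statistic on the opposite side: P exceeds 0.5.
        return (0.5, 1.0)
    k = bisect_right(CRITICAL_VALUES, t)  # number of tabulated values <= t
    upper = TAIL_AREAS[k - 1] if k > 0 else 0.5
    lower = TAIL_AREAS[k] if k < len(TAIL_AREAS) else 0.0
    if alternative == "two-sided":
        # Two-tailed significance is twice the tabulated tail area.
        lower, upper = 2 * lower, 2 * upper
    return (lower, upper)


print(p_value_bounds(2.8))               # (0.005, 0.01)
print(p_value_bounds(2.8, "two-sided"))  # (0.01, 0.02)
```

The two printed results match the bounds worked out by hand above for $t_0 = 2.8$.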
Some statistics software packages can help you calculate P-values. For example, Minitab has the capability to find cumulative probabilities from many standard probability distributions, including the t distribution. Simply enter the value of the test statistic $t_0$ along with the appropriate number of degrees of freedom. Minitab will display the probability $P(T_\nu \le t_0)$, where $\nu$ is the degrees of freedom for the test statistic $t_0$. From the cumulative probability, the P-value can be determined.
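For readers without Minitab, the same cumulative probability can be approximated numerically. The sketch below is not Minitab's method; it integrates the t density with Simpson's rule using only the Python standard library, and the names `t_pdf`, `t_cdf`, and `p_value` are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of the central t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T_df <= x): Simpson's rule on [0, |x|] plus symmetry about zero."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def p_value(t0, df, alternative):
    """P-value from the cumulative probability, following Equations 9-27, 9-29, 9-31."""
    if alternative == "greater":
        return 1.0 - t_cdf(t0, df)
    if alternative == "less":
        return t_cdf(t0, df)
    return 2.0 * (1.0 - t_cdf(abs(t0), df))  # two-sided


# t0 = 2.8 with 14 degrees of freedom, the case bounded from the table above.
print(p_value(2.8, 14, "greater"))
print(p_value(2.8, 14, "two-sided"))
```

The upper-tailed result should fall between the table bounds 0.005 and 0.01, and the two-sided result is twice that value.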
The single-sample t-test we have just described can also be conducted using the fixed significance level approach. Consider the two-sided alternative hypothesis. The null hypothesis would be rejected if the value of the test statistic $t_0$ falls in the critical region defined by the lower and upper $\alpha/2$ percentage points of the t distribution with $n - 1$ degrees of freedom. That is, reject $H_0$ if

$$t_0 > t_{\alpha/2,n-1} \quad \text{or} \quad t_0 < -t_{\alpha/2,n-1}$$

For the one-tailed tests, the location of the critical region is determined by the direction that the inequality in the alternative hypothesis "points." So if the alternative is $H_1: \mu > \mu_0$, reject $H_0$ if

$$t_0 > t_{\alpha,n-1}$$

and if the alternative is $H_1: \mu < \mu_0$, reject $H_0$ if

$$t_0 < -t_{\alpha,n-1}$$

Figure 9-12 shows the locations of these critical regions.
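The three rejection rules can be collected into one small function. This is a minimal sketch: the name `reject_h0` and the alternative labels are hypothetical, and the critical value is assumed to be read from a t table by the caller (here, the 14-degrees-of-freedom values quoted earlier in this section).

```python
def reject_h0(t0, critical_value, alternative):
    """Fixed-significance-level decision for the one-sample t-test.

    critical_value is t_{alpha/2, n-1} for a two-sided test and
    t_{alpha, n-1} for a one-sided test (a positive number from a t table).
    """
    if alternative == "two-sided":
        return t0 > critical_value or t0 < -critical_value
    if alternative == "greater":
        return t0 > critical_value
    if alternative == "less":
        return t0 < -critical_value
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")


# Illustration with 14 degrees of freedom and alpha = 0.05:
# t_{0.025,14} = 2.145 and t_{0.05,14} = 1.761.
print(reject_h0(2.8, 2.145, "two-sided"))  # True
print(reject_h0(1.9, 2.145, "two-sided"))  # False
print(reject_h0(1.9, 1.761, "greater"))    # True
```

The second and third calls show why the choice of alternative matters: the same statistic can fall outside the two-sided critical region but inside the one-sided one.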
Figure 9-11 P-value for $t_0 = 2.8$; an upper-tailed test is shown to be between 0.005 and 0.01. The plot shows the t distribution with 14 degrees of freedom, with $P(T_{14} > 2.624) = 0.01$ and $P(T_{14} > 2.977) = 0.005$.

Figure 9-12 The distribution of $T_0$ when $H_0: \mu = \mu_0$ is true, with critical region for (a) $H_1: \mu \neq \mu_0$, (b) $H_1: \mu > \mu_0$, and (c) $H_1: \mu < \mu_0$.
9-3 TESTS ON THE MEAN OF A NORMAL DISTRIBUTION, VARIANCE UNKNOWN 313
Summary for the One-Sample t-Test

Testing Hypotheses on the Mean of a Normal Distribution, Variance Unknown

Null hypothesis: $H_0: \mu = \mu_0$

Test statistic: $T_0 = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$

Alternative Hypotheses     P-Value                                                    Rejection Criterion for Fixed-Level Tests
$H_1: \mu \neq \mu_0$      Probability above $|t_0|$ and probability below $-|t_0|$   $t_0 > t_{\alpha/2,n-1}$ or $t_0 < -t_{\alpha/2,n-1}$
$H_1: \mu > \mu_0$         Probability above $t_0$                                    $t_0 > t_{\alpha,n-1}$
$H_1: \mu < \mu_0$         Probability below $t_0$                                    $t_0 < -t_{\alpha,n-1}$

The calculations of the P-values and the locations of the critical regions for these situations are shown in Figs. 9-10 and 9-12, respectively.
EXAMPLE 9-6 Golf Club Design

The increased availability of light materials with high strength has revolutionized the design and manufacture of golf clubs, particularly drivers. Clubs with hollow heads and very thin faces can result in much longer tee shots, especially for players of modest skills. This is due partly to the "spring-like effect" that the thin face imparts to the ball. Firing a golf ball at the head of the club and measuring the ratio of the outgoing velocity of the ball to the incoming velocity can quantify this spring-like effect. The ratio of velocities is called the coefficient of restitution of the club. An experiment was performed in which 15 drivers produced by a particular club maker were selected at random and their coefficients of restitution measured. In the experiment the golf balls were fired from an air cannon so that the incoming velocity and spin rate of the ball could be precisely controlled. It is of interest to determine if there is evidence (with $\alpha = 0.05$) to support a claim that the mean coefficient of restitution exceeds 0.82. The observations follow:

0.8411  0.8191  0.8182  0.8125  0.8750
0.8580  0.8532  0.8483  0.8276  0.7983
0.8042  0.8730  0.8282  0.8359  0.8660

The sample mean and sample standard deviation are $\bar{x} = 0.83725$ and $s = 0.02456$. The normal probability plot of the data in Fig. 9-13 supports the assumption that the coefficient of restitution is normally distributed. Since the objective of the experimenter is to demonstrate that the mean coefficient of restitution exceeds 0.82, a one-sided alternative hypothesis is appropriate.

The solution using the seven-step procedure for hypothesis testing is as follows:

1. Parameter of interest: The parameter of interest is the mean coefficient of restitution, $\mu$.
2. Null hypothesis: $H_0: \mu = 0.82$
3. Alternative hypothesis: $H_1: \mu > 0.82$. We want to reject $H_0$ if the mean coefficient of restitution exceeds 0.82.
4. Test statistic: The test statistic is

$$t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

5. Reject $H_0$ if: Reject $H_0$ if the P-value is less than 0.05.
6. Computations: Since $\bar{x} = 0.83725$, $s = 0.02456$, $\mu_0 = 0.82$, and $n = 15$, we have

$$t_0 = \frac{0.83725 - 0.82}{0.02456/\sqrt{15}} = 2.72$$

7. Conclusions: From Appendix A Table II we find, for a t distribution with 14 degrees of freedom, that $t_0 = 2.72$ falls between two values: 2.624, for which $\alpha = 0.01$, and 2.977, for which $\alpha = 0.005$. Because this is a one-tailed test, we know that the P-value is between those two values, that is, $0.005 < P < 0.01$. Therefore, since $P < 0.05$, we reject $H_0$ and conclude that the mean coefficient of restitution exceeds 0.82.

To use Minitab to compute the P-value, use the Calc menu and select the probability distribution option. Then, for the t distribution, enter 14 degrees of freedom and the value of the test statistic $t_0 = 2.72$ as the input constant. Minitab returns the probability $P(T_{14} \le 2.72) = 0.991703$. The P-value is $P(T_{14} \ge 2.72)$, or $P = 1 - P(T_{14} \le 2.72) = 1 - 0.991703 = 0.008297$.

Practical Interpretation: There is strong evidence to conclude that the mean coefficient of restitution exceeds 0.82.

Figure 9-13 Normal probability plot of the coefficient of restitution data from Example 9-6. (Horizontal axis: coefficient of restitution; vertical axis: percentage.)

Minitab will conduct the one-sample t-test. The output from this software package is in the following display:

Minitab Computations

One-Sample T: COR
Test of mu = 0.82 vs mu > 0.82

Variable   N   Mean     StDev    SE Mean
COR        15  0.83725  0.02456  0.00634

Variable   95.0% Lower Bound   T     P
COR        0.82608             2.72  0.008

Notice that Minitab computes both the test statistic $T_0$ and a 95% lower confidence bound for the coefficient of restitution. The reported P-value is 0.008. Because the 95% lower confidence bound exceeds 0.82, we would reject the hypothesis that $H_0: \mu = 0.82$ and conclude that the alternative hypothesis $H_1: \mu > 0.82$ is true.

9-3.2 Type II Error and Choice of Sample Size

The type II error probability for the t-test depends on the distribution of the test statistic in Equation 9-26 when the null hypothesis $H_0: \mu = \mu_0$ is false. When the true value of the mean is $\mu = \mu_0 + \delta$, the distribution for $T_0$ is called the noncentral t distribution with $n - 1$ degrees of freedom and noncentrality parameter $\delta\sqrt{n}/\sigma$. Note that if $\delta = 0$, the noncentral t distribution reduces to the usual central t distribution. Therefore, the type II error of the two-sided alternative (for example) would be

$$\beta = P\{-t_{\alpha/2,n-1} \le T_0 \le t_{\alpha/2,n-1} \mid \delta \neq 0\} = P\{-t_{\alpha/2,n-1} \le T_0' \le t_{\alpha/2,n-1}\}$$

where $T_0'$ denotes the noncentral t random variable. Finding the type II error probability $\beta$ for the t-test involves finding the probability contained between two points of the noncentral t distribution. Because the noncentral t random variable has a messy density function, this integration must be done numerically.

Fortunately, this ugly task has already been done, and the results are summarized in a series of O.C. curves in Appendix Charts VIIe, VIIf, VIIg, and VIIh that plot $\beta$ for the t-test against a parameter d for various sample sizes n. Curves are provided for two-sided alternatives on Charts VIIe and VIIf. The abscissa scale factor d on these charts is defined as

$$d = \frac{|\mu - \mu_0|}{\sigma} = \frac{|\delta|}{\sigma} \qquad (9\text{-}32)$$

For the one-sided alternative $\mu > \mu_0$ or $\mu < \mu_0$, we use Charts VIIg and VIIh with

$$d = \frac{|\mu - \mu_0|}{\sigma} = \frac{|\delta|}{\sigma} \qquad (9\text{-}33)$$

We note that d depends on the unknown parameter $\sigma^2$. We can avoid this difficulty in several ways. In some cases, we may use the results of a previous experiment or prior information to make a rough initial estimate of $\sigma^2$. If we are interested in evaluating test performance after the data have been collected, we could use the sample variance $s^2$ to estimate $\sigma^2$. If there is no previous experience on which to draw in estimating $\sigma^2$, we then define the difference in the mean $\delta$ that we wish to detect relative to $\sigma$. For example, if we wish to detect a small difference in the mean, we might use a value of $d = |\delta|/\sigma \le 1$ (for example), whereas if we are interested in detecting only moderately large differences in the mean, we might select $d = |\delta|/\sigma = 2$ (for example). That is, it is the value of the ratio $|\delta|/\sigma$ that is important in determining sample size, and if it is possible to specify the relative size of the difference in means that we are interested in detecting, then a proper value of d can usually be selected.
EXAMPLE 9-7 Golf Club Design Sample Size

Consider the golf club testing problem from Example 9-6. If the mean coefficient of restitution exceeds 0.82 by as much as 0.02, is the sample size $n = 15$ adequate to ensure that $H_0: \mu = 0.82$ will be rejected with probability at least 0.8?

To solve this problem, we will use the sample standard deviation $s = 0.02456$ to estimate $\sigma$. Then

$$d = \frac{|\delta|}{\sigma} = \frac{0.02}{0.02456} = 0.81$$

By referring to the operating characteristic curves in Appendix Chart VIIg (for $\alpha = 0.05$) with $d = 0.81$ and $n = 15$, we find that $\beta = 0.10$, approximately. Thus, the probability of rejecting $H_0: \mu = 0.82$ if the true mean exceeds this by 0.02 is approximately $1 - \beta = 1 - 0.10 = 0.90$, and we conclude that a sample size of $n = 15$ is adequate to provide the desired sensitivity.
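The hand calculations in Examples 9-6 and 9-7 can be checked from the raw data. The sketch below uses only the Python standard library; the variable names are illustrative, and the critical value $t_{0.05,14} = 1.761$ is taken from the t-table excerpt quoted earlier in this section rather than computed.

```python
import math
import statistics

# Coefficient of restitution data from Example 9-6.
cor = [0.8411, 0.8191, 0.8182, 0.8125, 0.8750,
       0.8580, 0.8532, 0.8483, 0.8276, 0.7983,
       0.8042, 0.8730, 0.8282, 0.8359, 0.8660]

mu0 = 0.82                    # hypothesized mean under H0
n = len(cor)
xbar = statistics.mean(cor)   # sample mean
s = statistics.stdev(cor)     # sample standard deviation (divisor n - 1)
se = s / math.sqrt(n)         # standard error of the mean
t0 = (xbar - mu0) / se        # test statistic, Equation 9-26

t_crit = 1.761                # t_{0.05,14} from the table excerpt (assumed input)
lower_bound = xbar - t_crit * se   # 95% lower confidence bound on the mean

delta = 0.02                  # difference to detect in Example 9-7
d = abs(delta) / s            # abscissa value for the O.C. curves, Equation 9-33

print(f"n = {n}, xbar = {xbar:.5f}, s = {s:.5f}, SE = {se:.5f}")
print(f"t0 = {t0:.2f}, reject H0 at alpha = 0.05: {t0 > t_crit}")
print(f"95% lower bound = {lower_bound:.5f}, d = {d:.2f}")
```

The printed values should agree with the text and the Minitab display up to rounding in the last reported digit.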
Minitab will also perform power and sample size computations for the one-sample t-test.
Below are several calculations based on the golf club testing problem:
316 CHAPTER 9 TESTS OF HYPOTHESES FOR A SINGLE SAMPLE
Minitab Computations

Power and Sample Size
1-Sample t Test
Testing mean = null (versus > null)
Calculating power for mean = null + difference
Alpha = 0.05  Sigma = 0.02456

            Sample
Difference    Size   Power
      0.02      15  0.9117

Power and Sample Size
1-Sample t Test
Testing mean = null (versus > null)
Calculating power for mean = null + difference
Alpha = 0.05  Sigma = 0.02456

            Sample
Difference    Size   Power
      0.01      15  0.4425

Power and Sample Size
1-Sample t Test
Testing mean = null (versus > null)
Calculating power for mean = null + difference
Alpha = 0.05  Sigma = 0.02456

            Sample  Target  Actual
Difference    Size   Power   Power
      0.01      39  0.8000  0.8029
In the first portion of the computer output, Minitab reproduces the solution to Example 9-7, verifying that a sample size of $n = 15$ is adequate to give power of at least 0.8 if the mean coefficient of restitution exceeds 0.82 by at least 0.02. In the middle section of the output, we used Minitab to compute the power to detect a difference between $\mu$ and $\mu_0 = 0.82$ of 0.01. Notice that with $n = 15$, the power drops considerably to 0.4425. The final portion of the output is the sample size required for a power of at least 0.8 if the difference between $\mu$ and $\mu_0$ of interest is actually 0.01. A much larger n is required to detect this smaller difference.
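The power figures in the first two Minitab displays can be approximated directly from the noncentral t distribution described in this section. The sketch below is not Minitab's algorithm. It assumes the standard representation $T_0' = (Z + \delta\sqrt{n}/\sigma)/\sqrt{V/\nu}$, with Z standard normal and V an independent chi-square variable with $\nu = n - 1$ degrees of freedom, and integrates over V with Simpson's rule. The function names are hypothetical, the critical value $t_{0.05,14} = 1.761$ comes from the table excerpt earlier in this section, and the code assumes $n \ge 4$.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def chi2_pdf(v, nu):
    """Density of a chi-square variable with nu >= 3 degrees of freedom."""
    if v <= 0.0:
        return 0.0
    log_f = (nu / 2 - 1) * math.log(v) - v / 2 - (nu / 2) * math.log(2.0) - math.lgamma(nu / 2)
    return math.exp(log_f)


def power_upper_tailed(delta, sigma, n, t_crit, steps=4000):
    """P(T0 > t_crit) when the true mean is mu0 + delta (upper-tailed test).

    Conditional on V = v, T0' > t_crit is the event Z > t_crit*sqrt(v/nu) - ncp,
    so the power is the chi-square average of 1 - Phi(t_crit*sqrt(v/nu) - ncp).
    """
    nu = n - 1
    ncp = delta * math.sqrt(n) / sigma        # noncentrality parameter
    upper = nu + 12.0 * math.sqrt(2.0 * nu)   # chi-square mass beyond this is negligible
    h = upper / steps                         # steps must be even for Simpson's rule

    def integrand(v):
        return (1.0 - normal_cdf(t_crit * math.sqrt(v / nu) - ncp)) * chi2_pdf(v, nu)

    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


# Golf club problem: sigma estimated by s = 0.02456, n = 15, alpha = 0.05.
print(power_upper_tailed(0.02, 0.02456, 15, 1.761))
print(power_upper_tailed(0.01, 0.02456, 15, 1.761))
```

The two results should be close to the Minitab powers of 0.9117 and 0.4425; small differences can arise from the rounded critical value.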
EXERCISES FOR SECTION 9-3
9-48. A hypothesis will be used to test that a population mean equals 7 against the alternative that the population mean does not equal 7 with unknown variance $\sigma^2$. What are the critical values for the test statistic $T_0$ for the following significance levels and sample sizes?
(a) $\alpha = 0.01$ and $n = 20$
(b) $\alpha = 0.05$ and $n = 12$
(c) $\alpha = 0.10$ and $n = 15$
9-49. A hypothesis will be used to test that a population mean equals 10 against the alternative that the population mean is greater than 10 with known variance $\sigma^2$. What is the critical value for the test statistic $Z_0$ for the following significance levels?
(a) $\alpha = 0.01$ and $n = 20$
(b) $\alpha = 0.05$ and $n = 12$
(c) $\alpha = 0.10$ and $n = 15$
9-50. A hypothesis will be used to test that a population mean equals 5 against the alternative that the population mean is less than 5 with known variance $\sigma^2$. What is the critical value for the test statistic $Z_0$ for the following significance levels?
(a) $\alpha = 0.01$ and $n = 20$
(b) $\alpha = 0.05$ and $n = 12$
(c) $\alpha = 0.10$ and $n = 15$
9-51. For the hypothesis test $H_0: \mu = 7$ against $H_1: \mu \neq 7$ with variance unknown and $n = 20$, approximate the P-value for each of the following test statistics.
(a) $t_0 = 2.05$ (b) $t_0 = -1.84$ (c) $t_0 = 0.4$
9-52. For the hypothesis test $H_0: \mu = 10$ against $H_1: \mu > 10$ with variance unknown and $n = 15$, approximate the P-value for each of the following test statistics.
(a) $t_0 = 2.05$ (b) $t_0 = -1.84$ (c) $t_0 = 0.4$
9-53. For the hypothesis test $H_0: \mu = 5$ against $H_1: \mu < 5$ with variance unknown and $n = 12$, approximate the P-value for each of the following test statistics.
(a) $t_0 = 2.05$ (b) $t_0 = -1.84$ (c) $t_0 = 0.4$
9-54. Consider the computer output below.

One-Sample T:
Test of mu = 91 vs > 91

Variable   N   Mean    StDev  SE Mean  95% Lower Bound  T  P
x          20  92.379  0.717  ?        ?                ?  ?

(a) Fill in the missing values. You may calculate bounds on the P-value. What conclusions would you draw?
(b) Is this a one-sided or a two-sided test?
(c) If the hypothesis had been $H_0: \mu = 90$ versus $H_1: \mu > 90$, would your conclusions change?
9-55. Consider the computer output below.

One-Sample T:
Test of mu = 12 vs not = 12

Variable   N   Mean    StDev  SE Mean  T  P
x          10  12.564  ?      0.296    ?  ?

(a) How many degrees of freedom are there on the t-test statistic?
(b) Fill in the missing values. You may calculate bounds on the P-value. What conclusions would you draw?
(c) Is this a one-sided or a two-sided test?
(d) Construct a 95% two-sided CI on the mean.
(e) If the hypothesis had been $H_0: \mu = 12$ versus $H_1: \mu > 12$, would your conclusions change?
(f) If the hypothesis had been $H_0: \mu = 11.5$ versus $H_1: \mu \neq 11.5$, would your conclusions change? Answer this question by using the CI computed in part (d).
9-56. Consider the computer output below.

One-Sample T:
Test of mu = 34 vs not = 34

Variable   N   Mean    StDev  SE Mean  95% CI            T  P
x          16  35.274  1.783  ?        (34.324, 36.224)  ?  0.012

(a) How many degrees of freedom are there on the t-test statistic?
(b) Fill in the missing quantities.
(c) At what level of significance can the null hypothesis be rejected?
(d) If the hypothesis had been $H_0: \mu = 34$ versus $H_1: \mu > 34$, would the P-value have been larger or smaller?
(e) If the hypothesis had been $H_0: \mu = 34.5$ versus $H_1: \mu \neq 34.5$, would you have rejected the null hypothesis at the 0.05 level?
9-57. An article in Growth: A Journal Devoted to Problems of Normal and Abnormal Growth ["Comparison of Measured and Estimated Fat-Free Weight, Fat, Potassium and Nitrogen of Growing Guinea Pigs" (Vol. 46, No. 4, 1982, pp. 306–321)] reported the results of a study that measured the body weight (in grams) for guinea pigs at birth.
421.0 452.6 456.1 494.6 373.8
90.5 110.7 96.4 81.7 102.4
241.0 296.0 317.0 290.9 256.5
447.8 687.6 705.7 879.0 88.8
296.0 273.0 268.0 227.5 279.3
258.5 296.0
(a) Test the hypothesis that mean body weight is 300 grams. Use $\alpha = 0.05$.
(b) What is the smallest level of significance at which you would be willing to reject the null hypothesis?
(c) Explain how you could answer the question in part (a) with a two-sided confidence interval on mean body weight.
9-58. An article in the ASCE Journal of Energy Engineering (1999, Vol. 125, pp. 59–75) describes a study of the thermal inertia properties of autoclaved aerated concrete used as a building material. Five samples of the material were tested in a structure, and the average interior temperatures (°C) reported were as follows: 23.01, 22.22, 22.04, 22.62, and 22.59.
(a) Test the hypotheses $H_0: \mu = 22.5$ versus $H_1: \mu \neq 22.5$, using $\alpha = 0.05$. Find the P-value.
(b) Check the assumption that interior temperature is normally distributed.
(c) Compute the power of the test if the true mean interior temperature is as high as 22.75.
(d) What sample size would be required to detect a true mean interior temperature as high as 22.75 if we wanted the power of the test to be at least 0.9?
(e) Explain how the question in part (a) could be answered by constructing a two-sided confidence interval on the mean interior temperature.
9-59. A 1992 article in the Journal of the American Medical Association ("A Critical Appraisal of 98.6 Degrees F, the Upper Limit of the Normal Body Temperature, and Other Legacies of Carl Reinhold August Wunderlich") reported body temperature, gender, and heart rate for a number of subjects. The body temperatures for 25 female subjects follow: 97.8, 97.2, 97.4, 97.6, 97.8, 97.9, 98.0, 98.0, 98.0, 98.1, 98.2, 98.3, 98.3, 98.4, 98.4, 98.4, 98.5, 98.6, 98.6, 98.7, 98.8, 98.8, 98.9, 98.9, and 99.0.
(a) Test the hypothesis $H_0: \mu = 98.6$ versus $H_1: \mu \neq 98.6$, using $\alpha = 0.05$. Find the P-value.
(b) Check the assumption that female body temperature is normally distributed.
(c) Compute the power of the test if the true mean female body temperature is as low as 98.0.
(d) What sample size would be required to detect a true mean female body temperature as low as 98.2 if we wanted the power of the test to be at least 0.9?
(e) Explain how the question in part (a) could be answered by constructing a two-sided confidence interval on the mean female body temperature.
9-60. Cloud seeding has been studied for many decades as a weather modification procedure (for an interesting study of this subject, see the article in Technometrics, "A Bayesian Analysis of a Multiplicative Treatment Effect in Weather Modification," Vol. 17, pp. 161–166). The rainfall in acre-feet from 20 clouds that were selected at random and seeded with silver nitrate follows: 18.0, 30.7, 19.8, 27.1, 22.3, 18.8, 31.8, 23.4, 21.2, 27.9, 31.9, 27.1, 25.0, 24.7, 26.9, 21.8, 29.2, 34.8, 26.7, and 31.6.
(a) Can you support a claim that mean rainfall from seeded clouds exceeds 25 acre-feet? Use $\alpha = 0.01$. Find the P-value.
(b) Check that rainfall is normally distributed.
(c) Compute the power of the test if the true mean rainfall is 27 acre-feet.
(d) What sample size would be required to detect a true mean rainfall of 27.5 acre-feet if we wanted the power of the test to be at least 0.9?
(e) Explain how the question in part (a) could be answered by constructing a one-sided confidence bound on the mean rainfall.
9-61. The sodium content of twenty 300-gram boxes of organic cornflakes was determined. The data (in milligrams) are as follows: 131.15, 130.69, 130.91, 129.54, 129.64, 128.77, 130.72, 128.33, 128.24, 129.65, 130.14, 129.29, 128.71, 129.00, 129.39, 130.42, 129.53, 130.12, 129.78, 130.92.
(a) Can you support a claim that mean sodium content of this brand of cornflakes differs from 130 milligrams? Use $\alpha = 0.05$. Find the P-value.
(b) Check that sodium content is normally distributed.
(c) Compute the power of the test if the true mean sodium content is 130.5 milligrams.
(d) What sample size would be required to detect a true mean sodium content of 130.1 milligrams if we wanted the power of the test to be at least 0.75?
(e) Explain how the question in part (a) could be answered by constructing a two-sided confidence interval on the mean sodium content.
9-62. Consider the baseball coefficient of restitution data first presented in Exercise 8-92.
(a) Do the data support the claim that the mean coefficient of restitution of baseballs exceeds 0.635? Use $\alpha = 0.05$. Find the P-value.
(b) Check the normality assumption.
(c) Compute the power of the test if the true mean coefficient of restitution is as high as 0.64.
(d) What sample size would be required to detect a true mean coefficient of restitution as high as 0.64 if we wanted the power of the test to be at least 0.75?
(e) Explain how the question in part (a) could be answered with a confidence interval.
9-63. Consider the dissolved oxygen concentration at TVA dams first presented in Exercise 8-94.
(a) Test the hypothesis $H_0: \mu = 4$ versus $H_1: \mu \neq 4$. Use $\alpha = 0.01$. Find the P-value.
(b) Check the normality assumption.
(c) Compute the power of the test if the true mean dissolved oxygen concentration is as low as 3.
(d) What sample size would be required to detect a true mean dissolved oxygen concentration as low as 2.5 if we wanted the power of the test to be at least 0.9?
(e) Explain how the question in part (a) could be answered with a confidence interval.
9-64. Reconsider the data from Medicine and Science in Sports and Exercise described in Exercise 8-30. The sample size was seven and the sample mean and sample standard deviation were 315 watts and 16 watts, respectively.
(a) Is there evidence that leg strength exceeds 300 watts at significance level 0.05? Find the P-value.
(b) Compute the power of the test if the true strength is 305 watts.
(c) What sample size would be required to detect a true mean of 305 watts if the power of the test should be at least 0.90?
(d) Explain how the question in part (a) could be answered with a confidence interval.
9-65. Reconsider the tire testing experiment described in Exercise 8-27.
(a) The engineer would like to demonstrate that the mean life of this new tire is in excess of 60,000 kilometers. Formulate and test appropriate hypotheses, and draw conclusions using $\alpha = 0.05$.
(b) Suppose that if the mean life is as long as 61,000 kilometers, the engineer would like to detect this difference with probability at least 0.90. Was the sample size $n = 16$ used in part (a) adequate?