Tests of Goodness of Fit and Independence
Learning Objectives
1. Know how to conduct a goodness of fit test.
2. Know how to use sample data to test for independence of two variables.
3. Understand the role of the chi-square distribution in conducting tests of goodness of fit and independence.
4. Be able to conduct a goodness of fit test for cases where the population is hypothesized to have either a multinomial, a Poisson, or a normal distribution.
5. For a test of independence, be able to set up a contingency table, determine the observed and expected frequencies, and determine if the two variables are independent.
6. Be able to use p-values based on the chi-square distribution.
1. a. Expected frequencies: e1 = 200(.40) = 80, e2 = 200(.40) = 80, e3 = 200(.20) = 40
Actual frequencies: f1 = 60, f2 = 120, f3 = 20
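The expected-frequency arithmetic in part (a) can be sketched in Python. This is a minimal illustration, assuming the usual statistic χ² = Σ(fi - ei)²/ei with ei = n·pi and k - 1 degrees of freedom when no parameters are estimated; `chi_square_gof` is a hypothetical helper, not a library function. For the problem 1 frequencies it gives χ² = 5 + 20 + 10 = 35 with df = 2.

```python
def chi_square_gof(observed, proportions):
    """Return expected frequencies, the chi-square statistic, and df for a multinomial test."""
    n = sum(observed)
    expected = [n * p for p in proportions]          # e_i = n * p_i
    stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    df = len(observed) - 1                           # k - 1, no estimated parameters
    return expected, stat, df

# Problem 1: n = 200, hypothesized proportions .40, .40, .20
expected, stat, df = chi_square_gof([60, 120, 20], [0.40, 0.40, 0.20])
print(expected, stat, df)
```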
χ² = 15.33 shows the p-value is less than .005.
Using Excel or Minitab, the p-value corresponding to χ²
Using the χ² table with df = 3, χ² = 6.87 shows the p-value is between .05 and .10.
Using Excel or Minitab, the p-value corresponding to χ²
χ² = 29.51 shows the p-value is less than .005.
Using Excel or Minitab, the p-value corresponding to χ²
χ² = 8.73 shows the p-value is greater than .10.
Using Excel or Minitab, the p-value corresponding to χ² = 8.73 is .1203.
from the outlet preferences expressed in the U.S. Shopper Database.
6. a.
Hypothesized Frequency Frequency
Using the χ² table with df = 3, χ² = 12.21 shows the p-value is between .005 and .01.
Using Excel or Minitab, the p-value corresponding to χ² = 12.21 is .0067.
changed over the four-year period.
c. 21% + 30% = 51%. Over half of in-store purchases are made using plastic.
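Reading the χ² table means locating the statistic between two adjacent critical values in the row for its degrees of freedom. A small Python sketch for the df = 3 row; the critical values are the standard upper-tail table entries, and `bracket_p_value` is a hypothetical helper. For χ² = 12.21, the statistic falls between 11.345 and 12.838, which is why the p-value is between .005 and .01.

```python
import bisect

# Upper-tail areas and matching chi-square critical values for df = 3
AREAS = [0.10, 0.05, 0.025, 0.01, 0.005]
CRITICAL_DF3 = [6.251, 7.815, 9.348, 11.345, 12.838]

def bracket_p_value(stat, areas=AREAS, critical=CRITICAL_DF3):
    """Return (lower, upper) bounds on the p-value implied by one table row."""
    i = bisect.bisect_right(critical, stat)        # how many critical values are <= stat
    lower = areas[i] if i < len(areas) else 0.0    # beyond the last column: p < .005
    upper = areas[i - 1] if i > 0 else 1.0         # before the first column: p > .10
    return lower, upper

print(bracket_p_value(12.21))   # (0.005, 0.01)
print(bracket_p_value(6.87))    # (0.05, 0.1)
```

Each table row only brackets the p-value; the exact values quoted from Excel or Minitab require the chi-square distribution function itself.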
7. Expected frequencies: 20% each, n = 60
e1 = 12, e2 = 12, e3 = 12, e4 = 12, e5 = 12
Actual frequencies: f1 = 5, f2 = 8, f3 = 15, f4 = 20, f5 = 12
Using the χ² table with df = 4, χ² = 11.50 shows the p-value is between .01 and .025.
Using Excel or Minitab, the p-value corresponding to χ² = 11.50 is .0215.
p-value < .05; reject H0. Yes, the largest companies differ in performance from the 1000 companies.
In general, the largest companies did not do as well as the others: 15 of 60 companies (25%) are in the middle group and 20 of 60 companies (33%) are in the next lower group, both greater than the 20% expected. Relatively few large companies are in the top A and B categories.
Note that this result is for the year 2002. It should not be generalized to other years without additional data.
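For an even number of degrees of freedom, the chi-square upper-tail probability has the closed form P(χ² > x) = e^(-x/2) Σ (x/2)^i / i!, summed from i = 0 to df/2 - 1, so the Excel/Minitab value for problem 7 can be reproduced with only the Python standard library. A minimal sketch; `chi2_sf_even` is a hypothetical helper name, and it deliberately rejects odd df.

```python
import math

def chi2_sf_even(x, df):
    """P(chi-square with df degrees of freedom > x), valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0               # i = 0 term of the sum
    for i in range(1, df // 2):
        term *= half / i                 # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total

# Problem 7: five performance groups, 20% expected in each, n = 60
observed = [5, 8, 15, 20, 12]
expected = [60 * 0.20] * len(observed)
stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf_even(stat, df)
print(round(stat, 2), df, round(p_value, 4))   # 11.5 4 0.0215
```

The same function also reproduces the df = 2 values in this section, since for df = 2 it reduces to e^(-x/2).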
χ² = 16.31 shows the p-value is less than .005.
Using Excel or Minitab, the p-value corresponding to χ² = 16.31 is .0010.
frequencies show telephone service is slightly better, with more excellent and good ratings.
9. H0: The column variable is independent of the row variable.
Ha: The column variable is not independent of the row variable.
Using the χ² table with df = 2, χ² = 7.86 shows the p-value is between .01 and .025.
Using Excel or Minitab, the p-value corresponding to χ² = 7.86 is .0196.
variable
10. H0: The column variable is independent of the row variable.
Ha: The column variable is not independent of the row variable.
Using the χ² table with df = 4, χ² = 19.77 shows the p-value is less than .005.
Using Excel or Minitab, the p-value corresponding to χ² = 19.77 is .0006.
11. H0: Type of ticket purchased is independent of the type of flight.
Ha: Type of ticket purchased is not independent of the type of flight.
Business Domestic 95 150.73 20.61
Degrees of freedom = (3 - 1)(2 - 1) = 2
Using the χ² table with df = 2, χ² = 100.43 shows the p-value is less than .005.
Using Excel or Minitab, the p-value corresponding to χ² = 100.43 is .0000.
p-value ≤ .05, reject H0. Conclude that the type of ticket purchased is not independent of the type of flight.
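In a test of independence, each expected frequency comes from the table margins, eij = (row i total)(column j total)/n, and df = (r - 1)(c - 1), which for problem 11's 3 × 2 table is 2. A Python sketch under those standard formulas; the function names are hypothetical. Only one cell of the problem 11 table survives in this copy, so the usage line checks that cell's contribution (fij - eij)²/eij.

```python
def expected_frequencies(table):
    """e_ij = (row i total)(column j total) / n for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def independence_test(table):
    """Chi-square statistic and df = (r - 1)(c - 1) for a test of independence."""
    expected = expected_frequencies(table)
    stat = sum(
        (f - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for f, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# The surviving Business/Domestic cell of problem 11: f = 95, e = 150.73
print(round((95 - 150.73) ** 2 / 150.73, 2))   # 20.61
```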
Ha: Method of payment is not independent of age group.
Using the χ² table with df = 3, χ² = 7.95 shows the p-value is between .025 and .05.
Using Excel or Minitab, the p-value corresponding to χ² = 7.95 is .0471.
b. The estimated probability of using plastic by age group:
Age Group Probability of Using Plastic
c. Companies such as Visa, MasterCard, and Discover want their cards in the hands of consumers with a high probability of using plastic to make a purchase. Thus, while these companies will want to target all age groups, they should definitely consider specific strategies aimed at getting cards into the hands of the higher-use 18- to 24-year-old consumers.
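The Excel/Minitab p-values for odd degrees of freedom can also be checked with only the standard library: for df = 2k + 1 the upper-tail probability reduces to the complementary error function plus a finite sum. A sketch, assuming integer odd df; `chi2_sf_odd` is a hypothetical helper, checked here against several df = 3 results reported in this section.

```python
import math

def chi2_sf_odd(x, df):
    """P(chi-square with df degrees of freedom > x), valid only for odd df.

    With y = x / 2 and df = 2k + 1:
    P = erfc(sqrt(y)) + exp(-y) * sum_{i=0}^{k-1} y**(i + 1/2) / Gamma(i + 3/2)
    """
    if df < 1 or df % 2 == 0:
        raise ValueError("df must be a positive odd integer")
    y = x / 2.0
    total = math.erfc(math.sqrt(y))      # this term alone is the df = 1 answer
    for i in range(df // 2):
        total += math.exp(-y) * y ** (i + 0.5) / math.gamma(i + 1.5)
    return total

# df = 3 statistics and the Excel/Minitab p-values reported in this section
reported = [(12.21, 0.0067), (7.95, 0.0471), (8.47, 0.0372), (8.25, 0.0411), (8.73, 0.0331)]
for stat, p in reported:
    print(stat, round(chi2_sf_odd(stat, 3), 4), p)
```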
13. a. Observed Frequencies
Health Insurance    Size of
Using the χ² table with df = 2, χ² = 6.94 shows the p-value is between .025 and .05.
Using Excel or Minitab, the p-value corresponding to χ² = 6.94 is .0311.
p-value ≤ .05, reject H0. Health insurance coverage is not independent of the size of the
χ² = 8.47    Degrees of freedom = (2 - 1)(4 - 1) = 3
Using the χ² table with df = 3, χ² = 8.47 shows the p-value is between .025 and .05.
Using Excel or Minitab, the p-value corresponding to χ² = 8.47 is .0372.
Note: Consumer Reports does not report fractional or decimal overall satisfaction scores.
Ranked in order of overall satisfaction:
Honda Accord, Toyota Camry, Ford Taurus, and Chevrolet Impala
c. The United States models (Impala and Taurus) have satisfaction scores below the overall satisfaction score for the class, while the Japanese models have satisfaction scores above it. Because the satisfaction score measures the likelihood that the owner will purchase the model again, there is evidence of greater brand loyalty among the Japanese models. The market share of the United States models may well decline in the future because of the lower owner satisfaction.
Ha: Flying during the snowstorm is not independent of the airline.
Using the χ² table with df = 3, χ² = 8.25 shows the p-value is between .025 and .05.
Using Excel or Minitab, the p-value corresponding to χ² = 8.25 is .0411.
independent of the airline. During this particular storm, the sample data show the following percentages of scheduled flights flown: American (48%), Continental (62.7%), Delta (52.3%), and United (41.7%).
Which airline to choose under similar snowstorm conditions can have different answers for different people. If we accept that airlines operate within set safety parameters and fly only when it is safe, we prefer the airline that does the best job of keeping its flights operational during a snowstorm; in this case, Continental and then Delta would be preferred. A very conservative passenger might prefer otherwise, perhaps favoring an airline that flies less and keeps more of its planes on the ground during a snowstorm.
16. a. The sample size is very large: 6448.
b. Observed Frequency (fij)
The p-value is approximately 0.
Italy shows the most support for nuclear power plants, with 58% in favor. Spain shows the least support, with only 32% in favor. Only Italy and the United States show more than 50% of the respondents in favor of building new nuclear power plants.
Using the χ² table with df = 3, χ² = 4.01 shows the p-value is greater than .10.
Using Excel or Minitab, the p-value corresponding to χ²
Using the χ² table with df = 2, χ² = 3.01 shows the p-value is greater than .10.
Using Excel or Minitab, the p-value corresponding to χ² = 3.01 is .2220.
p-value > .05, do not reject H0. We cannot reject the assumption that whether both husband and wife work is independent of location. The overall percentage of married couples with both husband and wife working is 190/300 = 63.3%.
19. Expected Frequencies:
χ² = 45.36 shows the p-value is less than .005.
Using Excel or Minitab, the p-value corresponding to χ² = 45.36 is .0000.
p-value ≤ .01, reject H0. Conclude that the ratings are not independent.
20. First estimate μ from the sample data. Sample size = 120.
μ = [0(39) + 1(30) + 2(30) + 3(18) + 4(3)]/120 = 156/120 = 1.3
Therefore, we use Poisson probabilities with μ = 1.3 to compute expected frequencies.
x    Observed Frequency    Poisson Probability    Expected Frequency    Difference (fi - ei)
χ² = 9.04 shows the p-value is between .025 and .05.
Using Excel or Minitab, the p-value corresponding to χ²
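The Poisson fit in problem 20 can be reproduced in a few lines of Python. This sketch assumes the last category is "4 or more" so that the probabilities sum to 1, which is the choice that reproduces χ² = 9.04; the degrees of freedom follow the usual rule of k - 1 minus one for the estimated mean, giving df = 3, consistent with the .025 to .05 bracket above. `poisson_pmf` is a hypothetical helper.

```python
import math

# Problem 20: observed frequencies for x = 0, 1, 2, 3, 4 (n = 120)
observed = [39, 30, 30, 18, 3]
n = sum(observed)
mu = sum(x * f for x, f in enumerate(observed)) / n   # 156 / 120 = 1.3

def poisson_pmf(x, mu):
    """Poisson probability of exactly x occurrences with mean mu."""
    return mu ** x * math.exp(-mu) / math.factorial(x)

# Probabilities for x = 0..3, with the last category taken as "4 or more"
probs = [poisson_pmf(x, mu) for x in range(4)]
probs.append(1.0 - sum(probs))
expected = [n * p for p in probs]

stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
df = len(observed) - 1 - 1   # k - 1, minus one more for estimating mu
print(round(mu, 2), [round(e, 2) for e in expected], round(stat, 2), df)
```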
The z values that create 6 intervals, each with probability .1667, are -.98, -.43, 0, .43, .98.
-.98    22.8 - .98(6.27) = 16.66
-.43    22.8 - .43(6.27) = 20.10
.43     22.8 + .43(6.27) = 25.50
.98     22.8 + .98(6.27) = 28.94
Interval    Observed Frequency    Expected Frequency    Difference
Using the χ² table with df = 3, χ² = 3.20 shows the p-value is greater than .10.
Using Excel or Minitab, the p-value corresponding to χ²
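The equal-probability intervals can be computed directly from normal quantiles. A Python sketch using the standard library's `statistics.NormalDist`, with the mean 22.8 and standard deviation 6.27 from the hand calculation. Exact quantiles give z ≈ ±.97 and ±.43, so the cutoffs come out near 16.73, 20.10, 22.8, 25.50, and 28.87; small differences from the hand calculation reflect rounding z to two decimals from the normal table. With 6 intervals and two estimated parameters, df = 6 - 1 - 2 = 3.

```python
from statistics import NormalDist

mean, sd = 22.8, 6.27   # estimates used in the hand calculation
k = 6                   # number of equal-probability intervals

# Standard normal quantiles at cumulative probabilities 1/6, 2/6, ..., 5/6
z_values = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
cutoffs = [mean + z * sd for z in z_values]
df = k - 1 - 2          # two parameters (mean and sd) estimated from the sample

print([round(z, 2) for z in z_values])
print([round(c, 2) for c in cutoffs])
print(df)
```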
Use Poisson probabilities with μ = 1.
x    Observed    Poisson Probabilities    Expected
Using the χ² table with df = 2, χ² = 4.30 shows the p-value is greater than .10.
Using Excel or Minitab, the p-value corresponding to χ²
χ² = 4.95 shows the p-value is greater than .10.
Using Excel or Minitab, the p-value corresponding to χ²
Using the χ² table with df = 3, χ² = 2.80 shows the p-value is greater than .10.
Using Excel or Minitab, the p-value corresponding to χ²
Interval    Observed Frequency    Expected Frequency
χ² = 11.20 shows the p-value is less than .005.
Using Excel or Minitab, the p-value corresponding to χ²
Using the χ² table with df = 2, χ² = 41.69 shows the p-value is less than .005.
Using Excel or Minitab, the p-value corresponding to χ²
Using the χ² table with df = 2, χ² = 4.64 shows the p-value is between .05 and .10.
Using Excel or Minitab, the p-value corresponding to χ² = 4.64 is .0983.
40%. However, the sample does not justify the conclusion that the market shares have changed from their historical 37%, 34%, and 29% levels.
All three manufacturers will want to watch for additional sales reports before drawing a final conclusion.
χ² = 7.44 shows the p-value is greater than .10.
Using Excel or Minitab, the p-value corresponding to χ² = 7.44 is .1144.
p-value > .05, do not reject H0. The assumption that the number of riders is uniformly distributed cannot be rejected.
χ² = 42.53 shows the p-value is less than .005.
Using Excel or Minitab, the p-value corresponding to χ²
χ² = 8.10 shows the p-value is between .01 and .025.
Using Excel or Minitab, the p-value corresponding to χ²
Observed Frequency    Expected Frequency
Degrees of freedom = (4 - 1)(2 - 1) = 3
Using the χ² table with df = 3, χ² = 23.37 shows the p-value is less than .005.
Using Excel or Minitab, the p-value corresponding to χ² = 23.37 is .0000.
33.
Expected frequencies:
Loan Approval Decision
χ² = 2.21 shows the p-value is greater than .10.
Using Excel or Minitab, the p-value corresponding to χ² = 2.21 is .5300.
p-value > .05, do not reject H0. The loan decision does not appear to be dependent on the officer.
34. a. Column totals: Slower 213, No Preference 21, and Faster 66.
Percentage preferring a slower pace = (213/300)(100) = 71%
Percentage preferring a faster pace = (66/300)(100) = 22%
The combined samples of men and women show that a majority would rather live in a place with a slower pace of life.
b. Observed Frequency (fij)
Preferred Pace of Life
Respondent    Slower    No Pref    Faster    Total
Expected Frequency (eij)
Preferred Pace of Life
Respondent    Slower    No Pref    Faster    Total
Chi-Square (fij - eij)²/eij
Preferred Pace of Life
χ² = 2.99    Degrees of freedom = (2 - 1)(3 - 1) = 2
Using the χ² table with df = 2, χ² = 2.99 shows the p-value is greater than .10.
Using Excel or Minitab, the p-value corresponding to χ² = 2.99 is .2242.
p-value > .05, do not reject H0. We cannot reject the assumption that the preferred pace of life is independent of whether the respondent is a man or a woman. That is, there is no statistical evidence to conclude that men and women differ with respect to the preferred pace of life.
This is a good example of a case where it would be desirable to study the question further before drawing a conclusion. Including a larger number of men and women in the sample and repeating the analysis should be considered.
35. Observed Frequencies
Church Attendance
Total 260 340 600
Expected Frequencies
Church Attendance
Using the χ² table with df = 3, χ² = 8.73 shows the p-value is between .025 and .05.
Using Excel or Minitab, the p-value corresponding to χ² = 8.73 is .0331.
Attendance by age group:
36. Expected Frequencies:
Days of the Week
Using the χ² table with df = 6, χ² = 6.17 shows the p-value is greater than .10.
Using Excel or Minitab, the p-value corresponding to χ²
Expected Frequency
Using the χ² table with df = 5, χ² = 2.00 shows the p-value is greater than .10.
Using Excel or Minitab, the p-value corresponding to χ²
χ² = 7.75 shows the p-value is between .05 and .10.
Using Excel or Minitab, the p-value corresponding to χ² = 7.75 is .0515.
p-value > .05, do not reject H0. We cannot conclude that office vacancies are dependent on metropolitan area, but it is close: the p-value is slightly larger than .05.
The expected frequency of x = 4 is .81. Combine x = 3 and x = 4 into one category so that all expected frequencies are 5 or more.
x    Observed Frequencies    Expected Frequencies
χ² = 6.17 shows the p-value is greater than .10.
Using Excel or Minitab, the p-value corresponding to χ² = 6.17 is .1036.
p-value > .05, do not reject H0. Conclude that the assumption of a binomial distribution cannot be rejected.