
Quantitative Methods for Business chapter 16 doc


16 Test driving – sampling theory, estimation and hypothesis testing

Chapter objectives

This chapter will help you to:

■ understand the theory behind the use of sample results for prediction
■ make use of the t distribution and appreciate its importance
■ construct and interpret interval estimates of population means and population proportions
■ work out necessary sample sizes for interval estimation
■ carry out tests of hypotheses about population means, proportions and medians, and draw appropriate conclusions from them
■ use the technology; the t distribution, estimation and hypothesis testing in EXCEL, MINITAB and SPSS
■ become acquainted with the business origins of the t distribution

In the previous chapter we reviewed the methods that can be used to select samples from populations in order to gain some understanding of those populations. In this chapter we will consider how sample results can be used to provide estimates of key features, or parameters, of the populations from which they were selected. It is important to note that the techniques described in this chapter, and the theory on which they are based, should only be used with results of samples selected using probabilistic or random sampling methods. The techniques are based on knowing, or at least having a reliable estimate of, the sampling error and this is not possible with non-random sampling methods.

In Chapter 13 we looked at the normal distribution, an important statistical distribution that enables you to investigate the very many continuous variables that occur in business and many other fields, whose values are distributed in a normal pattern. What makes the normal distribution especially important is that it enables us to anticipate how sample results vary. This is because many sampling distributions have a normal pattern.

16.1 Sampling distributions

Sampling distributions are distributions that show how sample results vary. They depict the ‘populations’ of sample results. Such distributions play a crucial role in quantitative work because they enable us to use data from a sample to make statistically valid predictions or judgements about a population. There are considerable advantages in using sample results in this way, especially when the population is too large to be accessible, or if investigating the population is too expensive or time-consuming.

A sample is a subset of a population, that is, it consists of some observations taken from the population. A random sample is a sample that consists of values taken at random from the population.

You can take many different random samples from the same population, even samples that consist of the same number of observations. Unless the population is very small the number of samples that you could take from it is to all intents and purposes infinite. A ‘parent’ population can produce an effectively infinite number of ‘offspring’ samples.

These samples will have different means, standard deviations and so on. So if we want to use, say, a sample mean to predict the value of the population mean we will be using something that varies from sample to sample, the sample mean (x̄), to predict something that is fixed, the population mean.

To do this successfully we need to know how the sample means vary from one sample to another. We need to think of sample means as observations, x̄s, of a variable, X̄, and consider how they are distributed. What is more we need to relate the distribution of sample means to the parameters of the population the samples come from so that we can use sample statistics to predict population measures. The distribution of X̄, the sample means, is a sampling distribution.

We will begin by considering the simplest case, in which we assume that the parent population is normally distributed. If this is the case, what will the sampling distributions of means of samples taken from the population be like?

If you were to take all possible random samples consisting of n observations from a population that is normal, with a mean μ and a standard deviation σ, and analyse them you would find that the sample means of all these samples will themselves be normally distributed.

You would find that the mean of the distribution of all these different sample means is exactly the same as the population mean, μ. You would also find that the standard deviation of all these sample means is the population standard deviation divided by the square root of the size of the samples, σ/√n.

So the sample means of all the samples of size n that can be taken from a normal population with a mean μ and a standard deviation σ are distributed normally with a mean of μ and a standard deviation of σ/√n. In other words, the sample means are distributed around the same mean as the population itself but with a smaller spread.

We know that the sample means will be less spread out than the population because n will be more than one, so σ/√n will be less than σ. For instance, if there are four observations in each sample, σ/√n will be σ/2, that is the sampling distribution of means of samples which have four observations in them will have half the spread of the population distribution.

The larger the size of the samples, the less the spread in the values of their means. For instance, if each sample consists of 100 observations the standard deviation of the sampling distribution will be σ/10, a tenth of the standard deviation of the population distribution.
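The relationship between sample size and spread can be checked by simulation. The following is a minimal sketch, not part of the original text: the population values (a mean of 124 and a standard deviation of 4), the function name and the random seed are all assumptions chosen for illustration.

```python
import random
import statistics
from math import sqrt

# Hypothetical normal population; these figures are illustrative assumptions.
MU, SIGMA = 124.0, 4.0


def simulate_sample_means(n, num_samples, seed=1):
    """Draw num_samples random samples of size n and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(n)])
        for _ in range(num_samples)
    ]


means_n4 = simulate_sample_means(4, num_samples=20000)
means_n100 = simulate_sample_means(100, num_samples=5000)

for n, means in ((4, means_n4), (100, means_n100)):
    print(
        f"n = {n}: mean of sample means = {statistics.fmean(means):.3f}, "
        f"spread of sample means = {statistics.stdev(means):.3f}, "
        f"sigma / sqrt(n) = {SIGMA / sqrt(n):.3f}"
    )
```

With these assumed figures the simulated spread should come out close to σ/2 for samples of four and close to σ/10 for samples of 100, although any single run will differ slightly from the theoretical values.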

This is an important logical point. In taking samples we are ‘averaging out’ the differences between the individual values in the population. The larger the samples, the more this happens. For this reason it is better to use larger samples to make predictions about a population.

Next time you see an opinion poll look for the number of people that the pollsters have canvassed. It will probably be at least one thousand. The results of an opinion poll are a product that the polling organization wants to sell to media companies. In order to do this they have to persuade them that their poll results are likely to be reliable. They won’t be able to do this if they only ask a very few people for their opinions!

The standard deviation of a sampling distribution, σ/√n, is also known as the standard error of the sampling distribution because it helps us anticipate the error we will have to deal with if we use a sample mean to predict the value of the population mean.

If we know the mean and standard deviation of the parent population distribution we can find the probabilities of ranges of different sample means, as we can do for any other normal distribution, by using the Standard Normal Distribution.

Example 16.1

Reebar Frozen Foods produces packs of four fish portions. On the packaging they claim that the average weight of the portions is 120 g. If the mean weight of the fish portions they buy is 124 g with a standard deviation of 4 g, what is the probability that the mean weight of a pack of four portions will be less than 120 g?

We will assume that the selection of the four portions to put in a pack is random. Imagine we took every possible sample of four portions from the population of fish portions purchased by Reebar (which we will assume for practical purposes to be infinite) and calculated the mean weight of each sample. We would find that the sampling distribution of all these means has a mean of 124 g and a standard error of 4/√4, which is 2.

The probability that a sample of four portions has a mean of less than 120 g is the probability that a normal variable with a mean of 124 and a standard deviation of 2 is less than 120.

The z-equivalent of the value 120 in the sampling distribution is:

z = (x̄ − μ)/(σ/√n) = (120 − 124)/(4/√4) = −4/2 = −2.00

From Table 5 on pages 621–622 in Appendix 1 you will find that the probability that z is less than −2.00 is 0.0228 or 2.28%.

We can conclude that there is a less than one in forty chance that four portions in a pack chosen at random have a mean weight of less than 120 g. You might like to compare this with the probability of one fish portion selected at random weighing less than 120 g:

z = (x − μ)/σ = (120 − 124)/4 = −1.00

Using Table 5 you will find that the probability that Z is less than −1.00 is 0.1587 or 15.87%, approximately a one in six chance. This is rather greater than the chance of getting a pack of four whose mean weight is less than 120 g (2.28%); in general there is less variation among sample means than there is among single points of data.
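The two probabilities in Example 16.1 can also be obtained without printed tables. This is a short sketch using Python's standard `statistics.NormalDist` class; the variable names are our own and are not taken from the text.

```python
from math import sqrt
from statistics import NormalDist

mu = 124.0     # mean weight of the fish portions (g)
sigma = 4.0    # standard deviation of the portion weights (g)
n = 4          # portions per pack

standard_error = sigma / sqrt(n)

# Mean weight of a pack of four portions below 120 g.
z_pack = (120 - mu) / standard_error
p_pack = NormalDist(mu, standard_error).cdf(120)

# A single portion below 120 g.
z_single = (120 - mu) / sigma
p_single = NormalDist(mu, sigma).cdf(120)

print(f"pack of four:   z = {z_pack:.2f}, probability = {p_pack:.4f}")
print(f"single portion: z = {z_single:.2f}, probability = {p_single:.4f}")
```

The only difference between the two calculations is the spread used: the standard error for the pack mean and the population standard deviation for a single portion.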

The procedure we used in Example 16.1 can be applied whether we are dealing with small samples or with very much larger samples. As long as the population the samples come from is normal we can be sure that the sampling distribution will be distributed normally with a mean of μ and a standard deviation of σ/√n.

But what if the population is not normal? There are many distributions that are not normal, such as distributions of wealth of individuals or distributions of waiting times.

Fortunately, according to a mathematical finding known as the Central Limit Theorem, as long as n is large (which is usually interpreted to mean 30 or more) the sampling distribution of sample means will be approximately normal in shape and have a mean of μ and a standard deviation of σ/√n. This is true whatever the shape of the population distribution.
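A simulation gives a feel for the Central Limit Theorem. The sketch below is an illustration under stated assumptions rather than anything from the text: it uses an exponential population with a mean of 1 and a standard deviation of 1 as a stand-in for a skewed distribution such as waiting times, and the sample size of 30, the number of samples and the seed are arbitrary choices.

```python
import random
import statistics
from math import sqrt

POP_MEAN = 1.0   # mean of the assumed exponential population
POP_SD = 1.0     # its standard deviation (equal to the mean for this distribution)


def sample_means_from_skewed(n, num_samples=10000, seed=3):
    """Means of random samples of size n drawn from a skewed population."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0 / POP_MEAN) for _ in range(n)])
        for _ in range(num_samples)
    ]


means = sample_means_from_skewed(30)
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
expected_se = POP_SD / sqrt(30)

# Share of sample means within 1.96 standard errors of the population mean;
# for a normal sampling distribution this would be about 95%.
within = sum(abs(m - POP_MEAN) <= 1.96 * expected_se for m in means) / len(means)

print(f"mean of sample means:   {mean_of_means:.3f}")
print(f"spread of sample means: {sd_of_means:.3f} (sigma / sqrt(n) = {expected_se:.3f})")
print(f"share within 1.96 standard errors: {within:.3f}")
```

Even though the individual values are heavily skewed, the sample means should cluster around the population mean with a spread close to σ/√n.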

Example 16.2

The times that passengers at a busy railway station have to wait to buy tickets during the rush hour follow a skewed distribution with a mean of 2 minutes 46 seconds and a standard deviation of 1 minute 20 seconds. What is the probability that a random sample of 100 passengers will, on average, have to wait more than 3 minutes?

The sample size, 100, is larger than 30 so the sampling distribution of the sample means will have a normal shape. It will have a mean of 2 minutes 46 seconds, or 166 seconds, and a standard error of 80/√100 = 8 seconds. The z-equivalent of 3 minutes, or 180 seconds, is:

z = (180 − 166)/(80/√100) = 14/8 = 1.75

From Table 5 the probability that Z is more than 1.75 is 0.0401. So the probability that a random sample of 100 passengers will have to wait on average more than 3 minutes is 4.01%, or a little more than a one in twenty-five chance.
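The calculation in Example 16.2 can be reproduced in a few lines. This is a sketch with variable names of our own choosing; it relies on the Central Limit Theorem in exactly the way the example does.

```python
from math import sqrt
from statistics import NormalDist

mu = 166.0       # mean waiting time in seconds (2 minutes 46 seconds)
sigma = 80.0     # standard deviation in seconds (1 minute 20 seconds)
n = 100          # passengers in the sample
threshold = 180  # 3 minutes, in seconds

standard_error = sigma / sqrt(n)
z = (threshold - mu) / standard_error

# Upper tail: probability that the sample mean exceeds 3 minutes.
p_more_than_3_minutes = 1 - NormalDist(mu, standard_error).cdf(threshold)

print(f"standard error = {standard_error:.1f} s, z = {z:.2f}")
print(f"P(sample mean > 3 minutes) = {p_more_than_3_minutes:.4f}")
```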

At this point you may find it useful to try Review Questions 16.1 to 16.6 at the end of the chapter.

16.1.1 Estimating the standard error

The main reason for being interested in sampling distributions is to help us use samples to assess populations because studying the whole population is not possible or practicable. Typically we will be using a sample, which we do know about, to investigate a population, which we don’t know about. We will have a sample mean and we will want to use it to assess the likely value of the population mean.

So far we have measured sampling distributions using the mean and the standard deviation of the population, μ and σ. But if we need to find out about the population using a sample, how can we possibly know the values of μ and σ?

The answer is that in practice we don’t. In the case of the population mean, μ, this doesn’t matter because typically it is something we are trying to assess. But without the population standard deviation, σ, we do need an alternative approach to measuring the spread of a sampling distribution.

Because we will have a sample, the obvious answer is to use the sample standard deviation, s, in place of the population standard deviation, σ. So instead of using the real standard error, σ/√n, we estimate the standard error of the sampling distribution with s/√n.

Using the estimated standard error, s/√n, is fine as long as the sample concerned is large (in practice, that n, the sample size, is at least 30). If we are dealing with a large sample we can use s/√n as an approximation of σ/√n. The means of samples consisting of n observations will be normally distributed with a mean of μ and an estimated standard error of s/√n. The Central Limit Theorem allows us to do this even if the population the sample comes from is not itself normal in shape.
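In code, estimating the standard error is a one-line calculation. The sketch below is illustrative only: the function name is ours, and the sample of 36 waiting times is generated artificially (an assumed skewed population with a mean of 166 seconds) simply to have some data to work with.

```python
import random
import statistics
from math import sqrt


def estimated_standard_error(sample):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return s / sqrt(len(sample))


# Hypothetical data: 36 waiting times in seconds, generated for illustration.
rng = random.Random(16)
waiting_times = [rng.expovariate(1 / 166) for _ in range(36)]

print(f"sample mean = {statistics.fmean(waiting_times):.1f} s")
print(f"estimated standard error = {estimated_standard_error(waiting_times):.1f} s")
```

Note that `statistics.stdev` uses the n − 1 divisor, which is the usual definition of the sample standard deviation, s.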

The population mean, μ, in this case is 0.538 and the sample standard deviation, s, is 0.042. We want the probability that x̄ is less than 0.568, P(X̄ < 0.568).

It is important to remember that s/√n is not the real standard error, it is the estimated standard error, but because the standard deviation of a large sample will be reasonably close to the population standard deviation the estimated standard error will be close to the actual standard error.

At this point you may find it useful to try Review Question 16.7 at the end of the chapter.

16.1.2 The t distribution

In section 16.1.1 we looked at how you can analyse sampling distributions using the sample standard deviation, s, when you do not know the population standard deviation, σ. As long as the sample size, n, is 30 or more the estimated standard error will be a sufficiently consistent measure of the spread of the sampling distribution, whatever the shape of the parent population.

If, however, the sample size, n, is less than 30 the estimated standard error, s/√n, is generally not so close to the actual standard error, σ/√n, and the smaller the sample size, the greater will be the difference between the two. In this situation it is possible to model the sampling distribution using the estimated standard error, as long as the population the sample comes from is normal, but we have to use a modified normal distribution in order to do it.

This modified normal distribution is known as the t distribution. The development of the distribution was a real breakthrough because it made it possible to investigate populations using small sample results. Small samples are generally much cheaper and quicker to gather than large samples so the t distribution broadened the scope for analysis based on sample data.

The z-equivalent of 0.568 is:

z = (x̄ − μ)/(s/√n)

If you look at Table 5 you will find that the probability that Z is less than 2.73 is 0.9968, so the probability that the sample mean is less than a pint is 0.9968 or 99.68%.

The t distribution is a more spread out version of the normal distribution. The difference between the two is illustrated in Figure 16.1. The greater spread is to compensate for the greater variation in sample standard deviations between small samples than between large samples. The smaller the sample size, the more compensation is needed, so there are a number of versions of the t distribution. The one that should be used in a particular context depends on the number of degrees of freedom, represented by the symbol ν (nu, the Greek letter n), which is the sample size minus one, n − 1.

To work out the probability that the mean of a small sample taken from a normal population is more, or less, than a certain amount we first need to find its t-equivalent, or t value. The procedure is very similar to the way we work out a z-equivalent.

Example 16.4

The population mean, μ, is 0.538 and the sample standard deviation, s, is 0.048. We want the probability that X̄ is less than 0.568, P(X̄ < 0.568). The t value equivalent to 0.568 is:

t = (x̄ − μ)/(s/√n) = (0.568 − 0.538)/(0.048/√9) = 0.030/0.016 = 1.875

You will find some details of the t distribution in Table 6 on page 623 in Appendix 1. Look down the column headed ν on the left hand side until you see the figure 8, the number of degrees of freedom in this case (the sample size is 9). Look across the row to the right and you will see five figures that relate to the t distribution with eight degrees of freedom. The nearest of these figures to 1.875 is the 1.86 that is in the column headed 0.05. This means that 5% of the t distribution with eight degrees of freedom is above 1.86. In other words, the probability that t is more than 1.86 is 0.05. This means that the probability that the mean volume of nine pints will be less than 0.568 litres will be approximately 0.95, or 95%.

The t value that we used in Example 16.4 could be written as t_{0.05,8} because it is the value of t that cuts off a tail area of 5% in the t distribution that has 8 degrees of freedom. In the same way, t_{0.01,15} represents the t value that cuts off a tail area of 1% in the t distribution that has 15 degrees of freedom.

You will find that the way the t distribution is used in further work depends on tail areas. For this reason, and also because the t distribution varies depending on the number of degrees of freedom, printed tables do not provide full details of the t distribution in the same way that Standard Normal Distribution tables give full details of the Standard Normal Distribution. Table 6 on page 623 gives selected values of t from the t distribution with different degrees of freedom for the most commonly used tail areas. If you need t distribution values that are not in Table 6 you can obtain them using computer software, as shown in section 16.4 later in this chapter.
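For readers without a statistics package to hand, tail areas and t values can be approximated numerically. The sketch below is our own illustration, not the method used in the book's software section: it integrates the t density with Simpson's rule and finds critical values by bisection, using only Python's standard library. The function names, step count and search range are assumptions.

```python
from math import exp, lgamma, log, pi


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    if t == 0:
        return 0.5
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


def t_critical(tail_area, df):
    """Approximate the t value that cuts off the given upper tail area."""
    low, high = 0.0, 200.0
    for _ in range(50):  # bisection: the tail area shrinks as t grows
        mid = (low + high) / 2
        if t_upper_tail(mid, df) > tail_area:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# Example 16.4: t value for a sample of nine with s = 0.048.
t_value = (0.568 - 0.538) / (0.048 / 9 ** 0.5)
print(f"t = {t_value:.3f}, P(T > t) with 8 df = {t_upper_tail(t_value, 8):.4f}")
print(f"t value cutting off a 5% tail with 8 df: {t_critical(0.05, 8):.3f}")
```

Because 1.875 lies just above the tabulated 1.86, the tail area it cuts off is slightly below 5%, which is why the table-based answer of about 95% is described as approximate.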

Use Table 6 to find:

(a) t with 4 degrees of freedom that cuts off a tail area of 0.10, t_{0.10,4}
(b) t with 10 degrees of freedom that cuts off a tail area of 0.01, t_{0.01,10}
(c) t with 17 degrees of freedom that cuts off a tail area of 0.025, t_{0.025,17}
(d) t with 100 degrees of freedom that cuts off a tail area of 0.005, t_{0.005,100}

From Table 6:

(a) t_{0.10,4} is in the row for 4 degrees of freedom and the column headed 0.10, 1.533. This means that the probability that t, with 4 degrees of freedom, is greater than 1.533 is 0.10 or 10%.
(b) t_{0.01,10} is the figure in the row for 10 degrees of freedom and the column headed 0.01, 2.764.
(c) t_{0.025,17} is in the row for 17 degrees of freedom and the 0.025 column, 2.110.
(d) t_{0.005,100} is in the row for 100 degrees of freedom and the 0.005 column, 2.626.

At this point you may find it useful to try Review Questions 16.8 and 16.9 at the end of the chapter.

16.1.3 Choosing the right model for a sampling distribution

The normal distribution and the t distribution are both models that you can use to model sampling distributions, but how can you be sure that you use the correct one? This section is intended to provide a brief guide to making the choice.

The first question to ask is, are the samples whose results make up the sampling distribution drawn from a population that is distributed normally? In other words, is the parent population normal? If the answer is yes then it is always possible to model the sampling distribution. If the answer is no then it is only possible to model the sampling distribution if the sample size, n, is 30 or more.

The second question is whether the population standard deviation, σ, is known. If the answer to this is yes then as long as the parent population is normal the sampling distribution can be modelled using the normal distribution whatever the sample size. If the answer is no the sampling distribution can be modelled using the normal distribution only if the sample size is 30 or more. In the absence of the population standard deviation, you have to use the sample standard deviation to approximate the standard error.

Finally, what if the parent population is normal, the population standard deviation is not known and the sample size is less than 30? In these circumstances you should use the t distribution and approximate the standard error using the sample standard deviation. Note that if the parent population is not normal and the sample size is less than 30 neither the normal distribution nor the t distribution can be used to model the sampling distribution, and this is true whether or not the population standard deviation is known.
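The choice described in this section can be summarized as a small decision function. This is a sketch of our reading of the guidance above; the function name, argument names and return labels are hypothetical.

```python
def choose_sampling_model(population_normal, sigma_known, n):
    """Return (model, spread) following the guidance in section 16.1.3.

    model is "normal", "t" or None (no suitable model);
    spread is "sigma" (population standard deviation), "s" (sample
    standard deviation) or None.
    """
    spread = "sigma" if sigma_known else "s"
    if n >= 30:
        # Large samples: the normal model applies whatever the parent shape.
        return "normal", spread
    if not population_normal:
        # Small sample from a non-normal population: neither model applies.
        return None, None
    if sigma_known:
        # Normal parent with known sigma: normal model for any sample size.
        return "normal", "sigma"
    # Normal parent, sigma unknown, small sample: use the t distribution.
    return "t", "s"


for case in [(True, True, 4), (True, False, 9), (False, False, 100), (False, True, 12)]:
    print(case, "->", choose_sampling_model(*case))
```

The threshold of 30 is the rule of thumb used throughout this chapter rather than a hard mathematical boundary.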

16.2 Statistical inference: estimation

Businesses use statistical analysis to help them study and solve problems. In many cases the data they use in their analysis will be sample data. Usually it is too expensive, or too time-consuming or simply impossible to obtain population data.

So if there is a problem of customer dissatisfaction they will study data from a sample of customers, not all customers. If there is a problem with product quality they will study a sample of the products, not all of them. If a large organization has a problem with staff training they will study the experiences of a sample of their staff rather than all their staff.

Of course, they will want to analyse the sample data in order to draw general conclusions about the population. As long as the samples they use are random samples, in other words they consist of observed values chosen at random from the population, it is quite possible to do this.

The use of sample data in drawing conclusions, or making deductions, about populations is known as statistical inference, from the word infer which means to deduce or conclude. Statistical inference that involves testing claims about population parameters is known as statistical decision-making because it can be used to help organizations and individuals take decisions.

In the last section we looked at sampling distributions. These distributions are the theoretical foundations on which statistical inference is built because they connect the behaviour of sample results to the distribution of the population the samples came from. Now we will consider the procedures involved in statistical inference.

There are two types of statistical inference technique that you will encounter in this chapter. The one we shall look at in this section is estimation, the using of sample data to predict population measures like means and proportions. The other is hypothesis testing, using sample data to verify or refute claims made about population measures, the subject of Section 16.3.

Collecting sample data can be time-consuming and expensive so in practice organizations don’t like to gather more data than they need, but on the other hand they don’t want to end up with too little in case they haven’t enough for the sort of conclusive results they want. You will find a discussion of this aspect of planning statistical research in this section.

16.2.1 Statistical estimation

Statistical estimation is the use of sample measures such as means or proportions to estimate the values of their population counterparts. The easiest way of doing this is to simply take the sample measure and use it as it stands as a prediction of the population equivalent. So, we could take the mean of a sample and use it as our estimate of the population mean. This type of prediction is called point estimation. It is used to get a ‘feel’ for the population value and is a perfectly valid use of the sample result.

The main shortcoming of point estimation is given away by its name; it is a single point, a single shot at estimating one number using another. It is a crude way of estimating a population measure because not only is it uncertain whether it will be a good estimate, in other words close to the measure we want it to estimate, but we have no idea of the probability that it is a good estimate.

The best way of using sample information to predict population measures is to use what is known as interval estimation, which involves constructing a range or interval as the estimate. The aim is to be able to say how likely it is that the interval we construct is accurate, in other words, how confident we are that the interval does include within it the population measure. Because the probability that the interval includes the population measure, or the confidence we should have in the interval estimate, is an important issue, interval estimates are often called confidence intervals.

Before we look at how interval estimates are constructed and why they work, it will be helpful if we reiterate some key points about sampling distributions. For convenience we will concentrate on sample means for the time being.

■ A sampling distribution of sample means shows how all the means of the different samples of a particular size, n, are distributed.
■ Sampling distributions that describe the behaviour of means of samples of 30 or more are always approximately normal.
■ The mean of the sampling distribution of sample means is the population mean, μ.
■ The standard deviation of the sampling distribution of sample means, called the standard error, is the population standard deviation divided by the square root of the sample size, σ/√n.

The sampling distributions that are normal in shape, the ones that show how sample means of big samples vary, have the features of the normal distribution. One of these features is that if we take a point two standard deviations to the left of the mean and another point two standard deviations to the right of the mean, the area between the two points is roughly 95% of the distribution.

To be more precise, if these points were 1.96 standard deviations below and above the mean of the distribution the area would be exactly 95% of the distribution. In other words, 95% of the observations in the distribution are within 1.96 standard deviations from the mean.

This is also true for normal sampling distributions; 95% of the sample means in a sampling distribution that is normal will be between 1.96 standard errors below and 1.96 standard errors above the mean. You can see this illustrated in Figure 16.2.

The limits of this range or interval are:

μ − 1.96 σ/√n on the left-hand side
and μ + 1.96 σ/√n on the right-hand side.

The greatest difference between any of the middle 95% of sample means and the population mean, μ, is 1.96 standard errors, 1.96 σ/√n. The probability that any one sample mean is within 1.96 standard errors of the mean is:

P(μ − 1.96 σ/√n < X̄ < μ + 1.96 σ/√n) = 0.95
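The figures of two and 1.96 standard deviations can be checked directly against the Standard Normal Distribution. A minimal sketch using Python's standard library, with variable names of our own:

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area within 2 and within 1.96 standard deviations of the mean.
within_2 = standard_normal.cdf(2) - standard_normal.cdf(-2)
within_1_96 = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

# Working backwards: the z value that leaves 2.5% in each tail.
z_for_95 = standard_normal.inv_cdf(0.975)

print(f"within 2 standard deviations:    {within_2:.4f}")
print(f"within 1.96 standard deviations: {within_1_96:.4f}")
print(f"z value leaving 2.5% in each tail: {z_for_95:.4f}")
```

The same areas apply to any normal sampling distribution once distances are measured in standard errors.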

The sampling distribution allows us to predict values of sample means using the population mean. But in practice we wouldn’t be interested in doing this because we don’t know the population mean. Indeed, typically the population mean is the thing we want to find out about using a sample mean rather than the other way round. What makes sampling distributions so important is that we can use them to do this.

As we have seen, adding and subtracting 1.96 standard errors to and from the population mean creates an interval that contains 95% of the sample means in the distribution. But what if, instead of adding this amount to and subtracting it from the population mean, we add it to and subtract it from every sample mean in the distribution?

We would create an interval around every sample mean. In 95% of cases, the intervals based on the 95% of sample means closest to the population mean in the middle of the distribution, the interval would contain the population mean itself. In the other 5% of cases, those means furthest away from the population mean, the interval would not contain the population mean.

So, suppose we take the mean of a large sample and create a range around it by adding 1.96 standard errors to get an upper figure, and subtracting 1.96 standard errors to get a lower figure. There is a 95% chance that the range between the upper and lower figures will encompass the mean of the population. Such a range is called a 95% interval estimate or a 95% confidence interval because it is an interval that we are 95% confident, or certain, contains the population mean.

Example 16.6

The total bill sizes of shoppers at a supermarket have a mean of £50 and a standard deviation of £12.75. A group of researchers, who do not know that the population mean bill size is £50, finds the bill size of a random sample of 100 shoppers.

The sampling distribution that the mean of their sample belongs to is shown in Figure 16.3. The standard error of this distribution is 12.75/√100 = 1.275.

Ninety-five per cent of the sample means in this distribution will be between 1.96 standard errors below the mean, which is:

50 − (1.96 * 1.275) = 47.50

and 1.96 standard errors above the mean, which is:

50 + (1.96 * 1.275) = 52.50

This is shown in Figure 16.4.

[Figures 16.3 and 16.4: the sampling distribution of the sample mean in Example 16.6, with the horizontal axis running from 46.175 to 53.825]

Suppose the researchers calculate the mean of their sample and it is £49.25, a figure inside the interval 47.50 to 52.50 that contains the 95% of sample means within 1.96 standard errors of the population mean. If they add and subtract the same 1.96 standard errors to and from their sample mean:

49.25 ± (1.96 * 1.275) = 49.25 ± 2.499 = £46.751 to £51.749

The interval they create does contain the population mean, £50.

Notice the symbol ‘±’ in the expression we have used. It represents the carrying out of two operations: both adding and subtracting the amount after it. The addition produces the higher figure, in this case 51.749, and the subtraction produces the lower figure, 46.751.

Imagine they take another random sample of 100 shoppers and find that the mean expenditure of this second sample is a little higher, but still within the central 95% of the sampling distribution, say £51.87. If they add and subtract 1.96 standard errors to and from this second mean:

51.87 ± (1.96 * 1.275) = 51.87 ± 2.499 = £49.371 to £54.369

This interval also includes the population mean.

If the researchers in Example 16.6 took many samples and created an interval based on each one by adding and subtracting 1.96 standard errors they would find that only occasionally would the interval not include the population mean.

Example 16.7

The researchers in Example 16.6 take a random sample that yields a mean of £47.13. Calculate a 95% confidence interval using this sample mean.

47.13 ± (1.96 * 1.275) = 47.13 ± 2.499 = £44.631 to £49.629

This interval does not include the population mean of £50.

How often will the researchers in Example 16.6 produce an interval that does not include the population mean? The answer is every time they have a sample mean that is among the lowest 2½% or the highest 2½% of sample means. If the sample mean is among the lowest 2½% the interval they produce will be too low, as in Example 16.7. If the sample mean is among the highest 2½% the interval will be too high.

As long as the sample mean is among the 95% of the distribution between the lowest 2½% and the highest 2½%, the interval they produce will include the population mean, in other words it will be an accurate estimate of the population mean.

Of course, usually when we carry out this sort of research we don’t actually know what the population mean is, so we don’t know whether the sample mean we have is among the 95% that will give us accurate interval estimates or whether it is among the 5% that will give us inaccurate interval estimates. The important point is that if we have a sample mean and we create an interval in this way there is a 95% chance that the interval will be accurate. To put it another way, on average 19 out of every 20 samples will produce an accurate estimate, and 1 out of 20 will not. That is why the interval is called a 95% interval estimate or a 95% confidence interval.
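The '19 out of every 20' claim can be illustrated by simulation. The sketch below uses the supermarket figures from Example 16.6 but makes a simplifying assumption of our own: instead of generating 100 individual bills for each sample, it draws each sample mean directly from a normal sampling distribution with a mean of 50 and a standard error of 1.275. The function name, number of repetitions and seed are arbitrary.

```python
import random
from math import sqrt

MU = 50.0        # population mean bill size (£), unknown to the researchers
SIGMA = 12.75    # population standard deviation (£)
N = 100          # shoppers per sample
STANDARD_ERROR = SIGMA / sqrt(N)


def interval_coverage(num_samples=5000, z=1.96, seed=7):
    """Share of simulated interval estimates that contain the population mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        # Assumption: sample means follow the normal sampling distribution.
        sample_mean = rng.gauss(MU, STANDARD_ERROR)
        lower = sample_mean - z * STANDARD_ERROR
        upper = sample_mean + z * STANDARD_ERROR
        if lower <= MU <= upper:
            hits += 1
    return hits / num_samples


coverage = interval_coverage()
print(f"share of intervals containing the population mean: {coverage:.3f}")
```

The simulated share should settle close to 0.95, with the shortfall made up of intervals built around sample means in the lowest or highest 2½% of the distribution.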

We can express the procedure for finding an interval estimate for a population measure as taking a sample result and adding and subtracting an error. This error reflects the uncertainties involved in using sample information to predict population measures.

Population measure estimate = sample result ± error

The error is made up of two parts, the standard error and the number of standard errors. The number of standard errors depends on how confident we want to be in our estimation.

Suppose you want to estimate the population mean If you know thepopulation standard deviation, ␴, and you want to be (100  ␣)% con-

fident that your interval is accurate, then you can obtain your estimate

of␮ using:

The letter ‘z’ appears because we are dealing with sampling distributions that are normal, so we can use the Standard Normal Distribution, the z distribution, to model them. You have to choose which z value to use on the basis of how sure you want or need to be that your estimate is accurate.

If you want to be 95% confident in your estimate, that is (100 − α)% = 95%, then α is 5% and α/2 is 2½% or 0.025. To produce your estimate you would use z0.025, 1.96, the z value that cuts off a 2½% tail in the Standard Normal Distribution. This means that a point 1.96 standard errors away from the mean of a normal sampling distribution, the population mean, will cut off a tail area of 2½% of the distribution. So:

95% interval estimate of μ = x̄ ± (1.96 * σ/√n)

This is the procedure we used in Example 16.6.
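The same procedure can be written as a short function. This is a minimal sketch rather than anything from the original text; the function name `z_interval` is an assumption, and the inputs are the figures quoted for the researchers' samples (σ = 12.75, n = 100, so the standard error is 1.275).

```python
from math import sqrt


def z_interval(sample_mean, sigma, n, z=1.96):
    """Return (lower, upper) for sample_mean -/+ z * sigma / sqrt(n)."""
    error = z * sigma / sqrt(n)  # number of standard errors times the standard error
    return sample_mean - error, sample_mean + error


# The second sample (51.87) and the sample in Example 16.7 (47.13).
for mean in (51.87, 47.13):
    low, high = z_interval(mean, sigma=12.75, n=100)
    print(f"mean {mean}: {low:.3f} to {high:.3f}")
```

Only the sample mean changes between calls; the error term stays at 2.499 because σ, n and the confidence level are the same.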

The most commonly used level of confidence interval is probably 95%, but what if you wanted to construct an interval based on a higher level of confidence, say 99%? A 99% level of confidence means we want 99% of the sample means in the sampling distribution to provide accurate interval estimates.

To obtain a 99% confidence interval the only adjustment we make is the z value that we use. If (100 − α)% = 99%, then α is 1% and α/2 is ½% or 0.005. To produce your estimate use z0.005, 2.576, the z value that cuts off a ½% tail in the Standard Normal Distribution:

99% interval estimate of μ = x̄ ± (2.576 * σ/√n)

The most commonly used confidence levels and the z values you need to construct them are given in Table 16.1.

Notice that the confidence interval in Example 16.8 includes the population mean, £50, unlike the 95% interval estimate produced in Example 16.7 using the same sample mean, £47.13. This is because this sample mean, £47.13, is not amongst the 95% closest to the population mean, but it is amongst the 99% closest to the population mean. Changing the level of confidence to 99% has meant the interval is accurate, but it is also wider. The 95% interval estimate was £44.631 to £49.629, a width of £4.998. The 99% interval estimate is £43.846 to £50.414, a width of £6.568.

You can obtain the z values necessary for other levels of confidence by

looking for the appropriate values of ␣/2 in the body of Table 5 on pages

621–622 in Appendix 1 and finding the z values associated with them.

From Table 16.1 the z value that cuts off a 0.005 tail area is 2.576, so the 99% confidence interval is:

47.13 ± (2.576 * 1.275) = 47.13 ± 3.284 = £43.846 to £50.414

Example 16.9

Use the sample result in Example 16.7, £47.13, to produce a 98% confidence interval for the population mean.
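For a confidence level such as 98%, the z value can be computed instead of looked up. The sketch below is an illustration, not the book's method: it uses `statistics.NormalDist` from the Python standard library, and the helper name `z_for_confidence` is an assumption. Computed z values carry more decimal places than the printed table, so results may differ from table-based working in the last digit.

```python
from statistics import NormalDist


def z_for_confidence(level):
    """z value cutting off a tail of (1 - level) / 2 in the Standard Normal Distribution."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)


standard_error = 1.275  # 12.75 / sqrt(100), as in Example 16.6
sample_mean = 47.13     # the sample result from Example 16.7

for level in (0.95, 0.98, 0.99):
    z = z_for_confidence(level)
    error = z * standard_error
    print(f"{level:.0%}: z = {z:.3f}, "
          f"interval £{sample_mean - error:.3f} to £{sample_mean + error:.3f}")
```

The loop makes the trade-off visible: each step up in confidence uses a larger z value and so produces a wider interval around the same sample mean.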


At this point you may find it useful to try Review Questions 16.10 and 16.11 at the end of the chapter.

16.2.2 Determining the sample size for estimating a population mean

All other things being equal, if we want to be more confident that our interval is accurate we have to accept that the interval will be wider, in other words less precise. If we want to be more confident and retain the same degree of precision, the only thing we can do is to take a larger sample.

In the examples we have looked at so far the size of the sample was already decided. But what if, before starting a sample investigation, you wanted to ensure that you had a big enough sample to enable you to produce a precise enough estimate with a certain level of confidence?

To see how, we need to start with the expression we have used for the error of a confidence interval:

error = zα/2 * σ/√n

Until now we have assumed that we know these three elements so we can work out the error. But what if we wanted to set the error and find the necessary sample size, n? We can change the expression for the error around so that it provides a definition of the sample size:

n = (zα/2 * σ/error)²

This means that as long as you know the degree of precision you need (the error), the level of confidence (to find zα/2), and the population standard deviation (σ), you can find out what sample size you need.

At this point you may find it useful to try Review Questions 16.12 and 16.13 at the end of the chapter.

Practical interval estimation is based on sample results alone, but it is very similar to the procedure we explored in Example 16.6. The main difference is that we have to use a sample standard deviation, s, to produce an estimate for the standard error of the sampling distribution the sample belongs to. Apart from this, as long as the sample we have is quite large, which we can define as containing 30 or more observations, we can follow exactly the same procedure as before.

That is, instead of

estimate of μ = x̄ ± (zα/2 * σ/√n)

we use

estimate of μ = x̄ ± (zα/2 * s/√n).

Example 16.10

If the researchers in Example 16.6 want to construct 99% confidence intervals that are £5 wide, what sample size should they use?

If the estimates are to be £5 wide that means they will have to be produced by adding and subtracting £2.50 to and from the sample mean. In other words the error will be 2.50. If the level of confidence is to be 99% then the error will be 2.576 standard errors.

We know that the population standard deviation, σ, is 12.75, so:

n = (2.576 * 12.75/2.50)²

n = (13.1376)² = 172.6 to one decimal place

Since the size of a sample must be a whole number we should round this up to 173. When you are working out the necessary sample size you must round the calculated sample size up to the next whole number to achieve the specified confidence level. Here we would round up to 173 even if the result of the calculation was 172.01.

We should conclude that if the researchers want 99% interval estimates that are £5 wide they would have to take a sample of 173 shoppers.
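The calculation in Example 16.10, including the rule of always rounding up, can be sketched in a few lines. The function name `sample_size_for_mean` is an illustrative assumption; the inputs are the example's own figures.

```python
from math import ceil


def sample_size_for_mean(z, sigma, error):
    """Return (raw, rounded_up) for n = (z * sigma / error) squared."""
    raw = (z * sigma / error) ** 2
    # Always round up: rounding down would fall short of the confidence level.
    return raw, ceil(raw)


raw, n = sample_size_for_mean(z=2.576, sigma=12.75, error=2.50)
print(f"calculated n = {raw:.1f}, sample size to use = {n}")
```

Using `ceil` rather than ordinary rounding is the design point here: a calculated value of 172.01 would still give 173.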


In Example 16.11 we are not told whether the population that the sample comes from is normal or not. This doesn’t matter because the sample size is over 30. In fact, given that airlines tend to restrict cabin baggage to 5 kg per passenger the population distribution in this case would probably be skewed.

16.2.4 Estimating with small samples

If we want to produce an interval estimate based on a smaller sample, one with fewer than 30 observations in it, we have to be much more careful. First, for the procedures we will consider in this section to be valid, the population that the sample comes from must be normal. Second, because the sample standard deviation of a small sample is not a reliable enough estimate of the population standard deviation to enable us to use the z distribution, we must use the t distribution to find how many standard errors are to be added and subtracted to produce an interval estimate with a given level of confidence.

Instead of

estimate of μ = x̄ ± (zα/2 * σ/√n)

we use

estimate of μ = x̄ ± (tα/2,ν * s/√n).

The form of the t distribution you use depends on ν, the number of degrees of freedom, which is the sample size less one (n − 1). You can find the values you need to produce interval estimates in Table 6 on page 623 of Appendix 1.

Example 16.11

The mean weight of the cabin baggage checked in by a random sample of 40 passengers at an international airport departure terminal was 3.47 kg. The sample standard deviation was 0.82 kg. Construct a 90% confidence interval for the mean weight of cabin baggage checked in by passengers at the terminal.

In this case α is 10%, so α/2 is 5% or 0.05 and according to Table 16.1 z0.05 is 1.645.

90% interval estimate of μ = x̄ ± (1.645 * s/√n)

= 3.47 ± (1.645 * 0.82/√40) = 3.47 ± 0.213 = 3.257 kg to 3.683 kg
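The working in Example 16.11 can be reproduced with a small function that uses the sample standard deviation, s, in place of σ. This is a sketch under the chapter's stated rule that the z-based procedure needs 30 or more observations; the function name and the error raised for smaller samples are illustrative assumptions.

```python
from math import sqrt


def large_sample_interval(sample_mean, s, n, z):
    """Interval estimate of the mean using s as a stand-in for sigma (n >= 30)."""
    if n < 30:
        raise ValueError("fewer than 30 observations: use the t distribution instead")
    error = z * s / sqrt(n)  # z times the estimated standard error
    return sample_mean - error, sample_mean + error, error


low, high, error = large_sample_interval(sample_mean=3.47, s=0.82, n=40, z=1.645)
print(f"error = {error:.3f} kg, interval {low:.3f} kg to {high:.3f} kg")
```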


You may recall from section 16.1.2 that the t distribution is a modified form of the z distribution. If you compare the figures in the bottom row of the 0.05, 0.025 and 0.005 columns of Table 6 with the z values in Table 16.1, that is 1.645, 1.960 and 2.576, you can see that they are the same. If, however, you compare these z values with the equivalent t values in the top row of Table 6, the ones for the t distribution with just one degree of freedom, which we would have to use for samples of only 2, you can see that the differences are substantial.

At this point you may find it useful to try Review Questions 16.14 and 16.15 at the end of the chapter.

16.2.5 Estimating population proportions

Although so far we have concentrated on how to estimate population means, these are not the only population parameters that can be estimated. You will also come across estimates of population proportions, indeed almost certainly you already have.

If you have seen an opinion poll of voting intentions, you have seen an estimate of a population proportion. To produce the opinion poll result that you read in a newspaper pollsters have interviewed a sample of people and used the sample results to predict the voting intentions of the entire population.

Here α is 5% so α/2 is 2.5% or 0.025 and the number of degrees of freedom, ν, is n − 1, 14.

95% estimate of μ = x̄ ± (t0.025,14 * s/√n)

From Table 6, t0.025,14 is 2.145, so:

95% estimate of μ = 56.3 ± (2.145 * 7.1/√15)

= 56.3 ± 3.932 = 52.368% to 60.232%
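To see what the t distribution changes, the small-sample interval above can be set beside the interval the z value would have produced. This is a minimal sketch: the critical values are entered by hand from the tables the chapter cites (2.145 from Table 6, 1.960 from Table 16.1) rather than computed, and the function name `interval` is an assumption.

```python
from math import sqrt

T_CRIT = 2.145  # t value for a 0.025 tail with 14 degrees of freedom (Table 6)
Z_CRIT = 1.960  # z value for a 0.025 tail (Table 16.1)


def interval(sample_mean, s, n, critical_value):
    """Return (lower, upper, error) for sample_mean -/+ critical_value * s / sqrt(n)."""
    error = critical_value * s / sqrt(n)
    return sample_mean - error, sample_mean + error, error


# Sample figures from the worked example: mean 56.3, s = 7.1, n = 15.
t_low, t_high, t_error = interval(56.3, 7.1, 15, T_CRIT)
z_low, z_high, z_error = interval(56.3, 7.1, 15, Z_CRIT)
print(f"t-based: {t_low:.3f} to {t_high:.3f} (error {t_error:.3f})")
print(f"z-based: {z_low:.3f} to {z_high:.3f} (error {z_error:.3f})")
```

The t-based interval is the wider of the two, reflecting the extra uncertainty from estimating the standard deviation with only 15 observations.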


In many ways estimating a population proportion is very similar to the estimation we have already considered. To start with you need a sample: you calculate a sample result around which your estimate will be constructed; you add and subtract an error based on the standard error of the relevant sampling distribution, and how confident you want to be that your estimate is accurate.

We have to adjust our approach because of the different nature of the data. When we estimate proportions we are usually dealing with qualitative variables. The values of these variables are characteristics, for instance people voting for party A or party B. If there are only two possible characteristics, or we decide to use only two categories in our analysis, the variable will have a binomial distribution.

As you will see, this is convenient as it means we only have to deal with one sample result, the sample proportion, but it also means that we cannot produce reliable estimates from small samples, those consisting of fewer than 30 observations. This is because the distribution of the population that the sample comes from must be normal if we are to use the t distribution, the device we have previously used to overcome the extra uncertainty involved in small sample estimation.

The sampling distribution of sample proportions is approximately normal in shape if the samples involved are large, that is, more than 30, as long as the sample proportion is not too small or too large. In practice, because we do not know the sample proportion before taking the sample it is best to use a sample size of over 100. If the samples are small, the sampling distribution of sample proportions is not normal.

Provided that you have a large sample, you can construct an interval estimate for the population proportion, π (pi, the Greek letter p), by taking the sample proportion, p, and adding and subtracting an error. The sample proportion is the number of items in the sample that possess the characteristic of interest, x, divided by the total number of items in the sample, n.

Sample proportion, p = x/n

The error that you add to and subtract from the sample proportion is the z value appropriate for the level of confidence you want to use multiplied by the estimated standard error of the sampling distribution of the sample proportion. This estimated standard error is based on the sample proportion:

estimated standard error = √(p(1 − p)/n)

These results suggest that we can be 95% confident that the proportion of supermarkets with suitable trolleys for shoppers with limited mobility will be between 19.6% and 36.4%.

The precision of the test depends on the estimated standard error of the sample proportions, √(p(1 − p)/n). The value of this depends on p, the sample proportion. Clearly you won’t know this until the sample data have been collected, but you can’t collect the sample data until you have decided what sample size to use. You therefore need to make a prior assumption about the value of the sample proportion.

To be on the safe side we will assume the worst-case scenario, which is that the value of p will be the one that produces the highest value of p(1 − p). The higher the value of p(1 − p) the wider the interval will be, for a given sample size. We need to avoid the situation where p(1 − p) turns out to be larger than we have assumed it is.

What is the largest value of p(1 − p)? If you work out p(1 − p) when p is 0.1, you will get 0.09. If p is 0.2, p(1 − p) rises to 0.16. As you increase the value of p you will find that it keeps going up until p is 0.5, when p(1 − p) is 0.25, then it goes down again.
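The rise and fall of p(1 − p) is easy to tabulate. The short sketch below (the helper name `spread` is an illustrative assumption) evaluates it for p from 0.1 to 0.9 and reports where the largest value occurs.

```python
def spread(p):
    """The quantity p(1 - p) that drives the standard error of a proportion."""
    return p * (1 - p)


# Evaluate p(1 - p) at p = 0.1, 0.2, ..., 0.9.
values = {tenth / 10: spread(tenth / 10) for tenth in range(1, 10)}
for p, value in values.items():
    print(f"p = {p:.1f}   p(1 - p) = {value:.2f}")

worst_case = max(values, key=values.get)
print(f"largest value occurs at p = {worst_case:.1f}")
```

Because the values are symmetrical around 0.5, assuming p = 0.5 when planning a sample can never understate the error.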

The error in an interval estimate of a population proportion is:

error = zα/2 * √(p(1 − p)/n)

If p is 0.5, in other words we assume the largest value of p(1 − p):

error = zα/2 * √(0.5 * 0.5/n) = zα/2/(2√n)

This last expression can be re-arranged to obtain an expression for n:

n = (zα/2/(2 * error))²

For the error to be 5%:

n = (1.96/(2 * 0.05))² = (19.6)² = 384.16

This has to be rounded up to 385 to meet the confidence requirement, so a random sample of 385 supermarkets would have to be used.
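The sample size of 385 follows from the worst-case assumption p = 0.5, under which the error is z/(2√n) and so n = (z/(2 × error))². The sketch below assumes a 95% level of confidence (z = 1.96), which is the level consistent with the figure of 385; the function name is an illustrative assumption.

```python
from math import ceil


def sample_size_for_proportion(z, error):
    """Worst-case (p = 0.5) sample size: n = (z / (2 * error)) squared, rounded up."""
    raw = (z / (2 * error)) ** 2
    return raw, ceil(raw)


raw, n = sample_size_for_proportion(z=1.96, error=0.05)
print(f"calculated n = {raw:.2f}, sample size to use = {n}")
```

Halving the error to 2.5% would roughly quadruple the sample size, since the error appears squared in the denominator.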


At this point you may find it useful to try Review Questions 16.18 and 16.19 at the end of the chapter.

16.3 Statistical inference: hypothesis testing

Usually when we construct interval estimates of population parameters we have no idea of the actual value of the parameter we are trying to estimate. Indeed the purpose of estimation using sample results is to tell us what the actual value is likely to be.

Sometimes we use sample results to deal with a different situation. This is where the population parameter is claimed to take a particular value and we want to see whether the claim is correct. Such a claim is known as a hypothesis, and the use of sample results to investigate whether it is true is called hypothesis testing. To begin with we will concentrate on testing hypotheses about population means using a single sample. Later in this section you will find hypothesis testing of population proportions and a way of testing hypotheses about population medians.

Hypothesis testing begins with a formal statement of the claim being made for the population parameter. This is known as the null hypothesis because it is the starting point in the investigation, and is represented by the symbol H0, ‘aitch-nought’.

We could find that a null hypothesis appears to be wrong, in which case we should reject it in favour of an alternative hypothesis, represented by the symbol H1, ‘aitch-one’. The alternative hypothesis is the collection of explanations that contradict the claim in the null hypothesis.

A null hypothesis may specify a single value for the population measure, in which case we would expect the alternative hypothesis to consist of other values both below and above it. Because of this ‘dual’ nature of the alternative hypothesis, the procedure to investigate such a null hypothesis is known as a two-sided test.

In other cases the null hypothesis might specify a minimum or a maximum value, in which case the alternative hypothesis consists of values below, or values above respectively. The procedure we use in these cases is called a one-sided test. Table 16.2 lists the three types of hypotheses.

After establishing the form of the hypotheses we can test them using the sample evidence we have. We need to decide if the sample evidence is compatible with the null hypothesis, in which case we cannot reject it. If the sample evidence contradicts the null hypothesis we reject it in favour of the alternative hypothesis.

To help us make this decision we need a decision rule to apply to our sample evidence. The decision rule is based on the assumption that the null hypothesis is true.

If the population mean really does take the value specified in the null hypothesis, then as long as we know the value of the population standard deviation, σ, and the size of our sample, n, we can identify the sampling distribution that our sample belongs to.

Example 16.15

A bus company promotes a ‘one-hour’ tour of a city. Suggest suitable null and alternative hypotheses for an investigation by:

(a) a passenger who wants to know how long the journey will take

(b) a journalist from a consumer magazine who wants to see whether passengers are being cheated.

In the first case we might assume that the passenger is as concerned about the tour taking too much time as too little time, so appropriate hypotheses would be that the population mean of the times of the tours is either equal to one hour or it is not.

H0: μ = 60 minutes    H1: μ ≠ 60 minutes

In the second case we can assume that the investigation is more focused. The journalist is more concerned that the trips might not take the full hour rather than taking longer than an hour, so appropriate hypotheses would be that the population mean tour time is either one hour or more, or it is less than an hour.

H0: μ ≥ 60 minutes    H1: μ < 60 minutes

Table 16.2
Types of hypotheses

Null hypothesis    Alternative hypothesis         Type of test
H0: μ = μ0         H1: μ ≠ μ0 (not equal)         Two-sided
H0: μ ≤ μ0         H1: μ > μ0 (greater than)      One-sided
H0: μ ≥ μ0         H1: μ < μ0 (less than)         One-sided


The next stage is to compare our sample mean to the sampling distribution it is supposed to come from if H0 is true. If it seems to belong to the sampling distribution, in other words it is not too far away from the centre of the distribution, we consider the null hypothesis plausible. If, on the other hand, the sample mean is located on one of the extremes of the sampling distribution we consider the null hypothesis suspect.

We can make this comparison by working out the z-equivalent of the sample mean and using it to find out the probability that a sample mean of the order of the one we have comes from the sampling distribution that is based on the null hypothesis being true. Because we are using a z-equivalent, this type of hypothesis test is called a z test.

Example 16.16

The standard deviation of the bus tours in Example 16.15 is known to be 6 minutes. If the duration of a random sample of 50 tours is to be recorded in order to investigate the operation, what can we deduce about the sampling distribution the mean of the sample belongs to?

The null hypotheses in both sections of Example 16.15 specified a population mean, μ, of 60 minutes. If this is true the mean of the sampling distribution, the distribution of means of all samples consisting of 50 observations, is 60. The population standard deviation, σ, is 6 so the standard error of the sampling distribution, σ/√n, is 6/√50, 0.849 minutes.

We can conclude that the sample mean of our sample will belong to a sampling distribution which is normal with a mean of 60 and a standard error of 0.849, if H0 is true.

Example 16.17

The mean of the random sample in Example 16.16 is 61.87 minutes. What is the z-equivalent of this sample mean, assuming it belongs to a sampling distribution with a mean of 60 and a standard error of 0.849? Use the z-equivalent to find the probability that a sample mean of this magnitude comes from such a sampling distribution.

Using Table 5 on pages 621–622 of Appendix 1:
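The table lookup can be cross-checked numerically. The sketch below is an illustration, not the book's own working: it computes the z-equivalent as (sample mean − hypothesised mean) divided by the standard error, and then uses `statistics.NormalDist` to find the probability of a sample mean at least this far above 60. Treating "of this magnitude" as the upper-tail probability is an assumption made here, and the function name `z_equivalent` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_equivalent(sample_mean, mu0, sigma, n):
    """Number of standard errors between the sample mean and the hypothesised mean."""
    standard_error = sigma / sqrt(n)
    return (sample_mean - mu0) / standard_error


# Figures from Examples 16.16 and 16.17: mean 61.87, H0 mean 60, sigma 6, n 50.
z = z_equivalent(sample_mean=61.87, mu0=60, sigma=6, n=50)
upper_tail = 1 - NormalDist().cdf(z)  # P(Z > z) under the null hypothesis
print(f"z-equivalent = {z:.2f}, upper-tail probability = {upper_tail:.4f}")
```

A small tail probability means the sample mean sits near an extreme of the sampling distribution implied by H0, which is the situation the chapter describes as making the null hypothesis suspect.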


Date posted: 06/07/2014, 00:20
