Lecture Undergraduate econometrics - Chapter 5: Inference in the simple regression model: Interval estimation, hypothesis testing, and prediction


Chapter 5 - Inference in the simple regression model: Interval estimation, hypothesis testing, and prediction. In this chapter, students will be able to understand: Interval estimation, hypothesis testing, the least squares predictor.

If all the above-mentioned assumptions are correct, then the least squares estimators b1 and b2 have normal distributions with means and variances as follows:

b1 ~ N(β1, σ²Σx_t² / (TΣ(x_t − x̄)²)),  b2 ~ N(β2, σ² / Σ(x_t − x̄)²)

The unbiased estimator of the unknown error variance σ² is

σ̂² = Σê_t² / (T − 2)

By replacing the unknown parameter σ² with this estimator, we can estimate the variances of the least squares estimators and their covariance.
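As a concrete sketch of this variance estimation, the block below computes b1, b2, σ̂² = Σê_t²/(T − 2), and the estimated standard error of b2. The tiny data set is invented purely for illustration (it is not the chapter's food expenditure data):

```python
# Sketch: least squares estimates and the estimated variance of b2,
# using a made-up six-observation data set.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
T = len(xs)

xbar = sum(xs) / T
ybar = sum(ys) / T

# Least squares estimates b1 and b2
sxx = sum((x - xbar) ** 2 for x in xs)
b2 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
b1 = ybar - b2 * xbar

# Unbiased estimator of sigma^2: sum of squared residuals over T - 2
residuals = [y - b1 - b2 * x for x, y in zip(xs, ys)]
sigma2_hat = sum(e ** 2 for e in residuals) / (T - 2)

# Estimated variance and standard error of the slope estimator b2
var_b2 = sigma2_hat / sxx
se_b2 = var_b2 ** 0.5
print(b2, se_b2)
```

The standard error se(b2) computed this way is exactly the quantity used throughout the rest of the chapter for interval estimation and hypothesis testing.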

In Chapter 4 you learned how to calculate point estimates of the regression parameters β1 and β2 for the population from which the sample data was drawn.

In this chapter we introduce the additional tools of statistical inference: interval estimation, prediction, interval prediction, and hypothesis testing. A prediction is a forecast of a future value of the dependent variable y. Interval estimation and interval prediction are procedures for creating ranges of values, sometimes called confidence intervals, in which the unknown parameters, or the value of y, are likely to be located. Hypothesis testing procedures are a means of comparing conjectures that we as economists might have about the regression parameters to the information about the parameters contained in a sample of data. Hypothesis tests allow us to say that the data are compatible, or are not compatible, with a particular conjecture, or hypothesis.

The procedures for interval estimation, prediction, and hypothesis testing depend heavily on assumption SR6 of the simple linear regression model, and the resulting normality of the least squares estimators. If assumption SR6 is not made, then the sample size must be sufficiently large so that the distributions of the least squares estimators are approximately normal, in which case the procedures we develop in this chapter are also approximate. In developing the procedures in this chapter we will be using the normal distribution, and distributions related to the normal, namely "Student's" t-distribution and the chi-square distribution.

The normally distributed random variable b2 is standardized by subtracting its mean and dividing by its standard deviation:

Z = (b2 − β2) / √var(b2) ~ N(0,1)  (5.1.1)

5.1.1a The Chi-Square Distribution

• Chi-square random variables arise when standard normal, N(0,1), random variables are squared and summed:

V = Z1² + Z2² + … + Zm²

• The notation V ~ χ²(m) says that the random variable V has a chi-square distribution with m degrees of freedom. The degrees of freedom parameter m indicates the number of independent N(0,1) random variables that are squared and summed to form V.

• The value of m determines the entire shape of the chi-square distribution, and its mean and variance:

E[V] = m,  var[V] = 2m


• Since V is formed by squaring and summing m standardized normal [N(0,1)] random variables, the value of V must be nonnegative, v ≥ 0.

• The distribution has a long tail, or is skewed, to the right.

• As the degrees of freedom m gets larger, the distribution becomes more symmetric and "bell-shaped."

• As m gets large, the chi-square distribution converges to, and essentially becomes, a normal distribution.
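These chi-square facts are easy to verify by simulation. The sketch below (degrees of freedom and sample size are arbitrary choices) squares and sums m standard normal draws many times and checks that the sample mean and variance are close to m and 2m:

```python
import random

# Monte Carlo sketch: summing m squared N(0,1) draws gives a chi-square(m)
# variable with mean m and variance 2m.
random.seed(42)
m = 4            # degrees of freedom (illustrative choice)
n_draws = 200_000

vs = []
for _ in range(n_draws):
    v = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(m))
    vs.append(v)

mean_v = sum(vs) / n_draws
var_v = sum((v - mean_v) ** 2 for v in vs) / n_draws

print(mean_v, var_v)   # close to m = 4 and 2m = 8
assert min(vs) >= 0.0  # every draw is nonnegative, as the text notes
```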

5.1.1b The Probability Distribution of σ̂²

• If SR6 holds, then the random error term e_t has a normal distribution, e_t ~ N(0, σ²). Standardizing, e_t/σ ~ N(0,1).

• The square of a standard normal random variable is a chi-square random variable with one degree of freedom, so (e_t/σ)² ~ χ²(1).

• If all the random errors are independent, then

Σ_t (e_t/σ)² ~ χ²(T)

• All T residuals ê_t = y_t − b1 − b2x_t depend on the least squares estimators b1 and b2. It can be shown that only T − 2 of the least squares residuals are independent in the simple linear regression model, so that

V = (T − 2)σ̂²/σ² ~ χ²(T − 2)  (5.1.5)

• We have not established the fact that the chi-square random variable V is statistically independent of the least squares estimators b1 and b2, but it is. Now we turn our attention to defining a t-random variable.

5.1.1c The t-Distribution

• A "t" random variable (no uppercase) is formed by dividing a standard normal, Z ~ N(0,1), random variable by the square root of an independent chi-square random variable, V ~ χ²(m), that has been divided by its degrees of freedom m:

t = Z / √(V/m) ~ t(m)  (5.1.7)

• The shape of the t-distribution is completely determined by the degrees of freedom parameter m, and the distribution is symbolized by t(m).

• Figure 5.2 shows a graph of the t-distribution with m = 3 degrees of freedom, relative to the N(0,1). Note that the t-distribution is less "peaked," and more spread out, than the standard normal N(0,1).
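The "more spread out" claim can be checked directly by building t(3) draws from the definition t = Z/√(V/m) and comparing tail probabilities with the standard normal (the tail cutoff of 2.0 and the sample size are arbitrary illustrative choices):

```python
import random

# Sketch: construct t(m) draws as Z / sqrt(V/m) and compare tail mass
# beyond |2| with N(0,1). m = 3 matches the Figure 5.2 comparison.
random.seed(7)
m = 3
n = 100_000

def t_draw():
    z = random.gauss(0.0, 1.0)
    v = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(m))  # chi-square(m)
    return z / (v / m) ** 0.5

t_tail = sum(abs(t_draw()) > 2.0 for _ in range(n)) / n
z_tail = sum(abs(random.gauss(0.0, 1.0)) > 2.0 for _ in range(n)) / n

# The t(3) distribution puts far more probability in the tails
# (roughly .14 versus roughly .05 for the standard normal).
print(t_tail, z_tail)
```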

5.1.1d A Key Result

• From the two random variables V and Z we can form a t-random variable. Recall that a t-random variable is formed by dividing a standard normal random variable, Z ~ N(0,1), by the square root of an independent chi-square random variable divided by its degrees of freedom.

• The t-distribution's shape is completely determined by the degrees of freedom parameter. Using Z and V from Equations (5.1.1) and (5.1.5), respectively, we have

t = Z / √(V/(T−2)) = (b2 − β2) / √(σ̂²/Σ(x_t − x̄)²) = (b2 − β2) / se(b2) ~ t(T−2)  (5.1.8)


5.1.2 Obtaining Interval Estimates

• If assumptions SR1-SR6 of the simple linear regression model hold, then

t = (b_k − β_k) / se(b_k) ~ t(T−2),  k = 1, 2  (5.1.9)

• In particular, for the slope parameter β2,

t = (b2 − β2) / se(b2) ~ t(T−2)  (5.1.10)

The random variable t in Equation (5.1.9) will be the basis for interval estimation and hypothesis testing in the simple linear regression model.

• Let t_c be a critical value from the t(m) distribution such that P(t ≥ t_c) = α/2, where α is a chosen probability. Each tail then contains α/2 of the probability, so that 1 − α of the probability is contained in the center portion. Consequently, we can make the probability statement

P(−t_c ≤ t ≤ t_c) = 1 − α  (5.1.11)

• Now, we put all these pieces together to create a procedure for interval estimation. Substitute t from Equation (5.1.10) into Equation (5.1.11) to obtain

P(b2 − t_c se(b2) ≤ β2 ≤ b2 + t_c se(b2)) = 1 − α

• In the interval endpoints b2 − t_c se(b2) and b2 + t_c se(b2), both b2 and se(b2) are random variables, since their values are not known until a sample of data is drawn. The interval, with its random endpoints, has probability 1 − α of containing the true but unknown parameter β2. This property holds under the model assumptions SR1-SR6 and may be applied to any sample of data that we might obtain. When the endpoints are computed based on a sample of data, b2 ± t_c se(b2) is called a (1 − α)×100% interval estimate of β2, or, equivalently, a (1 − α)×100% confidence interval.

• The properties of the random interval estimator are based on the notion of repeated sampling. If we were to select many random samples of size T, compute the least squares estimate b2 and its standard error se(b2) for each sample, and then construct the interval estimate b2 ± t_c se(b2) for each sample, then (1 − α)×100% of all the intervals constructed would contain the true parameter β2. This we know before any data are actually collected.

• Any one interval estimate, based on one sample of data, may or may not contain the true parameter β2, and because β2 is unknown, we will never know whether it does or does not. When confidence intervals are discussed, remember that our confidence is in the procedure used to construct the interval estimate; it is not in any one interval estimate calculated from a sample of data.
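The repeated-sampling interpretation can be demonstrated by simulation: draw many samples of size T = 40 from a known regression, build b2 ± t_c se(b2) each time, and count how often the true β2 is covered. The critical value t_c = 2.024 (α = .05, 38 degrees of freedom) is the one quoted in the text; the regression parameters and x values below are illustrative assumptions:

```python
import random

# Repeated-sampling sketch: coverage rate of the interval b2 +/- t_c * se(b2)
# should be close to 1 - alpha = .95. Parameter values are illustrative.
random.seed(1)
T = 40
beta1, beta2, sigma = 40.0, 0.13, 5.0
t_c = 2.024                                    # alpha = .05, 38 df
xs = [float(50 + 10 * t) for t in range(T)]    # fixed regressors
xbar = sum(xs) / T
sxx = sum((x - xbar) ** 2 for x in xs)

covered = 0
n_samples = 2_000
for _ in range(n_samples):
    ys = [beta1 + beta2 * x + random.gauss(0.0, sigma) for x in xs]
    ybar = sum(ys) / T
    b2 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b1 = ybar - b2 * xbar
    sse = sum((y - b1 - b2 * x) ** 2 for x, y in zip(xs, ys))
    se_b2 = (sse / (T - 2) / sxx) ** 0.5
    if b2 - t_c * se_b2 <= beta2 <= b2 + t_c * se_b2:
        covered += 1

print(covered / n_samples)  # close to 0.95
```

Any one of the 2,000 intervals either contains β2 or it does not; it is the procedure that succeeds about 95 percent of the time.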


5.1.3 The Repeated Sampling Context

• Table 5.1, using the ten samples of data, reports the least squares estimates, the estimates of σ², and the coefficient standard errors from each sample.

• Sampling variability causes the center of each of the interval estimates to change with the location of the least squares estimates, and it causes the widths of the intervals to change with the standard errors.

Table 5.1 Least Squares Estimates from 10 Random Samples


Table 5.2 Interval Estimates from 10 Random Samples

n   b1 − t_c se(b1)   b1 + t_c se(b1)   b2 − t_c se(b2)   b2 + t_c se(b2)


• We have used the least squares estimators to obtain, from a sample of data, point estimates that are "best guesses" of unknown parameters. The estimated variance of b_k, for k = 1 or 2, and its square root, the standard error se(b_k), provide information about the sampling variability of the least squares estimator from one sample to another.

• Interval estimators combine point estimation with estimation of sampling variability to provide a range of values in which the unknown parameters might fall. Interval estimates are a convenient way to inform others about the estimated location of the unknown parameter, and they also provide information about the sampling variability of the estimation procedure.

• When the sampling variability of the least squares estimator is relatively small, the interval estimates will be relatively narrow, implying that the least squares estimates are "reliable." On the other hand, if the least squares estimators suffer from large sampling variability, the interval estimates will be wide, implying that the least squares estimates are "unreliable."


• The critical value t_c = 2.024, which is appropriate for α = .05 and 38 degrees of freedom, can be found in Table 2 at the end of the book. It can also be computed exactly with a software package.

• The interval estimate for β2 is then b2 ± t_c se(b2), where b2 is the least squares estimate, which has the standard error se(b2).


• Is β2 in the interval [.0666, .1900]? We do not know, and will never know. What we do know is that when the procedure we used is applied to many random samples of data from the same population, 95 percent of all the interval estimates constructed using this procedure will contain the true parameter. The interval estimation procedure "works" 95 percent of the time.

• All we can say about the interval estimate based on our one sample is that, given the reliability of the procedure, we would be "surprised" if β2 were not in the interval [.0666, .1900]. Since this interval estimate contains no random quantities, we cannot make probability statements about it; it is not correct to say "there is a .95 probability that β2 lies in the interval [.0666, .1900]."

• A point estimate alone gives no sense of its reliability. Interval estimates incorporate both the point estimate and the standard error of the estimate, which is a measure of the variability of the least squares estimator. If an interval estimate is wide (implying a large standard error), there is not much information in the sample about the parameter.
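The reported interval [.0666, .1900] together with t_c = 2.024 implies a point estimate b2 = .1283 (the midpoint) and a standard error se(b2) = .0305 (the half-width divided by t_c). Those two values are inferred here from the interval, not quoted directly, so treat them as assumptions; the arithmetic is a sketch of how the endpoints are formed:

```python
# Sketch: reconstruct the interval b2 +/- t_c * se(b2).
# b2 and se_b2 are inferred from the reported interval [.0666, .1900].
b2, se_b2, t_c = 0.1283, 0.0305, 2.024

lower = b2 - t_c * se_b2
upper = b2 + t_c * se_b2
print(round(lower, 4), round(upper, 4))
```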


5.2 Hypothesis Testing

Hypothesis testing procedures compare a conjecture we have about a population to the information contained in a sample of data. More specifically, the conjectures we test concern the unknown parameters of the economic model. Given an econometric model, hypotheses are formed about economic behavior. These hypotheses are then represented as conjectures about model parameters.

Hypothesis testing uses the information about a parameter that is contained in a sample of data, namely its least squares point estimate and its standard error, to draw a conclusion about the conjecture, or hypothesis.

In each and every hypothesis test four ingredients must be present:

Components of Hypothesis Tests

1 A null hypothesis, H0
2 An alternative hypothesis, H1
3 A test statistic
4 A rejection region

5.2.1 The Null Hypothesis

The null hypothesis, denoted H0, specifies a value for a parameter in the context of a specific regression model, for example H0: β2 = c. A null hypothesis is the belief we will maintain until we are convinced by the sample evidence that it is not true, in which case we reject the null hypothesis.

5.2.2 The Alternative Hypothesis

Paired with every null hypothesis is a logical alternative hypothesis, H1, that we will accept if the null hypothesis is rejected. The alternative hypothesis is flexible and depends to some extent on economic theory. For the null hypothesis H0: β2 = c, three possible alternative hypotheses are:

• H1: β2 ≠ c. Rejecting the null hypothesis that β2 equals c leads to the conclusion that it takes some other value, greater than or less than c.

• H1: β2 > c. Rejecting the null hypothesis that β2 equals c leads to the conclusion that it is greater than c. This alternative rules out values of β2 less than or equal to c; it implies that those values are logically unacceptable alternatives to the null hypothesis. Inequality alternative hypotheses are widely used in economics, since economic theory frequently provides information about the signs of relationships between variables.

• H1: β2 < c. Rejecting the null hypothesis that β2 equals c leads to the conclusion that it is less than c; this alternative implies there is no chance that β2 > c.


5.2.3 The Test Statistic

The sample information about the null hypothesis is embodied in the sample value of a test statistic. Based on the value of a test statistic, which itself is a random variable, we decide either to reject the null hypothesis or not to reject it. A test statistic has a very special characteristic: its probability distribution must be completely known when the null hypothesis is true, and it must have some other distribution if the null hypothesis is not true.

• Consider the null hypothesis H0: β2 = c and the alternative H1: β2 ≠ c. In Equation (5.1.10) we established, under assumptions SR1-SR6 of the simple linear regression model, that

t = (b2 − β2) / se(b2) ~ t(T−2)

• If the null hypothesis H0: β2 = c is true, then, by substitution, it must be true that

t = (b2 − c) / se(b2) ~ t(T−2)  (5.2.2)

If the null hypothesis is not true, then the t-statistic in Equation (5.2.2) does not have a t-distribution with T − 2 degrees of freedom.

• To examine the distribution of the t-statistic in Equation (5.2.2) when the null hypothesis is not true, suppose that the true value of β2 is 1. Following the steps that led to Equation (5.1.8), if β2 = 1 and c ≠ 1, then the test statistic in Equation (5.2.2) does not have a t-distribution, since, in its formation, the numerator is not standard normal: the random variable b2 − c used in forming Equation (5.2.2) has the distribution

b2 − c ~ N(1 − c, var(b2))

Since its mean is not zero, this distribution is not standard normal, as required in the formation of a t random variable.

5.2.4 The Rejection Region

• The rejection region is the range of values of the test statistic that leads to rejection of the null hypothesis. It is possible to construct a rejection region only if we have a test statistic whose distribution is known when the null hypothesis is true.

• In practice, the rejection region is a set of test statistic values that, when the null hypothesis is true, are unlikely and have low probability of occurring. If a sample value of the test statistic is obtained that falls in a region of low probability, then it is unlikely that the test statistic has the assumed distribution, and thus it is unlikely that the null hypothesis is true.


• To illustrate, let us continue to use the food expenditure example. If the null hypothesis H0: β2 = c is true, then the test statistic t = (b2 − c)/se(b2) ~ t(T − 2). Thus, if the hypothesis is true, the distribution of t is that shown in Figure 5.3.

• If the alternative hypothesis H1: β2 ≠ c is true, then values of the test statistic will tend to be unusually "large" or unusually "small." The terms large and small are determined by choosing a probability α, called the level of significance of the test, which provides a meaning for "an unlikely event." The level of significance of the test α is frequently chosen to be .01, .05, or .10.

• The critical values ±t_c are chosen so that P(t ≥ t_c) = P(t ≤ −t_c) = α/2. Thus, the rejection region consists of the two "tails" of the t-distribution.

• When the null hypothesis is true, the probability of obtaining a sample value of the test statistic that falls in either tail area is "small" and, combined, is equal to α. Sample values of the test statistic that are in the tail areas are incompatible with the null hypothesis and are evidence against the null hypothesis being true. When testing the null hypothesis H0: β2 = c against the alternative H1: β2 ≠ c, we are led to the following rule:

Rejection rule for a two-tailed test: If the value of the test statistic falls in the rejection region, either tail of the t-distribution, then we reject the null hypothesis and accept the alternative.

• If the null hypothesis H0: β2 = c is true, then the probability of obtaining a value of the test statistic t in the central nonrejection region, P(−t_c ≤ t ≤ t_c) = 1 − α, is high. Sample values of the test statistic in the central nonrejection region are compatible with the null hypothesis and are not taken as evidence against the null hypothesis being true.

• Finding a sample value of the test statistic in the nonrejection region does not make the null hypothesis true! Intuitively, if the true value of β2 is near c, but not equal to it, then the value of the test statistic will still fall in the nonrejection region with high probability. In this case we would not reject the null hypothesis even though it is false.
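This last point can be made concrete by simulation: with the true β2 near, but not equal to, the hypothesized value c, the two-tailed test usually fails to reject the false null hypothesis. All parameter values below are illustrative assumptions, with t_c = 2.024 (α = .05, 38 degrees of freedom) taken from the text:

```python
import random

# Sketch: when true beta2 is close to c, the test statistic usually lands in
# the nonrejection region, so the false H0 is often not rejected.
random.seed(3)
T, t_c = 40, 2.024
beta1, beta2_true, sigma = 40.0, 0.12, 20.0
c = 0.10                                        # H0: beta2 = c is false here
xs = [float(50 + 10 * t) for t in range(T)]
xbar = sum(xs) / T
sxx = sum((x - xbar) ** 2 for x in xs)

rejections = 0
n_samples = 1_000
for _ in range(n_samples):
    ys = [beta1 + beta2_true * x + random.gauss(0.0, sigma) for x in xs]
    ybar = sum(ys) / T
    b2 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b1 = ybar - b2 * xbar
    sse = sum((y - b1 - b2 * x) ** 2 for x, y in zip(xs, ys))
    se_b2 = (sse / (T - 2) / sxx) ** 0.5
    if abs((b2 - c) / se_b2) > t_c:             # two-tailed rejection rule
        rejections += 1

print(rejections / n_samples)  # well below 1: the false H0 usually survives
```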

• Consequently, when testing the null hypothesis H0: β2 = c against the alternative H1: β2 ≠ c, the rule is:

If the value of the test statistic falls between the critical values −t_c and t_c, in the nonrejection region, then we do not reject the null hypothesis.

• Avoid saying that "we accept the null hypothesis." This statement implies that we are concluding that the null hypothesis is true, which, based on the preceding discussion, is not the case at all. The weaker statements "we do not reject the null hypothesis" or "we fail to reject the null hypothesis" do not send any misleading message.

• The test decision rules are summarized in Figure 5.4.


5.2.5 The Food Expenditure Example

• Let us illustrate the hypothesis testing procedure by testing the null hypothesis that β2 = .10 against the alternative that β2 ≠ .10. That is, if income rises by $100, do we expect food expenditures to rise by $10, or not? We will carry through the test using, as you should, a standard testing format that summarizes the four test ingredients and the test outcome.

• Format for Testing Hypotheses

1 Determine the null and alternative hypotheses
2 Specify the test statistic and its distribution if the null hypothesis is true
3 Select α and determine the rejection region
4 Calculate the sample value of the test statistic
5 State your conclusion
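The five steps can be sketched in code. The values b2 = .1283 and se(b2) = .0305 are inferred from the interval [.0666, .1900] reported earlier (midpoint and half-width divided by t_c = 2.024), so treat them as assumptions rather than quoted estimates:

```python
# Worked sketch of the five-step testing format for the food expenditure
# example; b2 and se_b2 are inferred from the earlier interval estimate.

# 1. H0: beta2 = .10 versus H1: beta2 != .10
c = 0.10
# 2. Test statistic t = (b2 - c)/se(b2) ~ t(38) if H0 is true
b2, se_b2 = 0.1283, 0.0305
# 3. alpha = .05; reject if |t| > t_c = 2.024 (38 degrees of freedom)
t_c = 2.024
# 4. Sample value of the test statistic
t = (b2 - c) / se_b2
# 5. Conclusion
conclusion = "reject H0" if abs(t) > t_c else "fail to reject H0"
print(round(t, 3), conclusion)
```

With these values the sample t is well inside the nonrejection region, so the data are compatible with the conjecture that food expenditure rises by $10 when income rises by $100.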
