Using the t-Statistic When the Sample Size Is Small

Excerpt from Introduction to Econometrics, 4th Global Edition, Stock (pages 124 - 128)


a 95% confidence interval of (£251.38, £508.93). It is worth noting that the difference in income, pooling these educational categories together, between those whose father’s NS-SEC categorization is “higher” and those where this categorization is lower is £810.25.

The results in the table suggest a difference in composition by educational attainment of these groupings according to the father’s NS-SEC category. When broken down in this way, however, the estimated difference for every qualification level is substantially lower than £810.25. All of these estimated differences are significantly different from zero.

This empirical analysis suggests that levels of education do play some part in explaining the difference in household income according to the socioeconomic status of the father. However, does this analysis tell us the full story? Are individuals with higher levels of education likely to be in households with more than one earner? Does the difference in household income arise from an individual’s own contribution to household income or, if the individual is cohabiting, also from her or his partner’s contribution to household income? Is this relationship affected by changing patterns of educational attainment that are correlated with age?

We will examine questions such as these further once we have introduced the basics of multivariate regression in later chapters.

experiments,” also called quasi-experiments, in which some event unrelated to the treatment or subject characteristics has the effect of assigning different treatments to different subjects as if they had been part of a randomized controlled experiment.

The box “A Way to Increase Voter Turnout” provides an example of such a quasi-experiment that yielded some surprising conclusions.

3.6 Using the t-Statistic When the Sample Size Is Small

In Sections 3.2 through 3.5, the t-statistic is used in conjunction with critical values from the standard normal distribution for hypothesis testing and for the construction of confidence intervals. The use of the standard normal distribution is justified by the central limit theorem, which applies when the sample size is large. When the sample size is small, the standard normal distribution can provide a poor approximation to the distribution of the t-statistic. If, however, the population distribution is itself normally distributed, then the exact distribution (that is, the finite-sample distribution; see Section 2.6) of the t-statistic testing the mean of a single population is the Student t distribution with n - 1 degrees of freedom, and critical values can be taken from the Student t distribution.

M03_STOC4455_04_GE_C03.indd 123 13/12/18 1:26 PM

A Way to Increase Voter Turnout

Apathy among citizens toward political participation, especially in voting, has been noted in the United Kingdom and other democratic countries. This kind of behavior is generally seen in economies where people have greater mobility, maintain an intensive work culture, and work for private corporate entities. Apart from these, there could be other dominant factors that have had a negative impact on citizens’ willingness to participate in elections, such as politicians failing to keep their promises or inappropriately using public funds.

In 2005, during the campaign period before the general election, a study was conducted in a Manchester constituency in the United Kingdom. The constituency’s voter turnout rate in the 2001 general election had been 48.6%, while the national average had been 59.4%. Thus, voter participation in this constituency was far below the national average. For the experiment, three groups (two treatment groups and one control group) were randomly selected out of the registered voters for whom landline numbers could be obtained. One of the treatment groups was exposed to strong canvassing in the form of telephone calls, and the other treatment group was exposed to strong canvassing in the form of door-to-door visits. No contacts were made with the control group. The callers and the door-to-door canvassers were given instructions to ask respondents three questions, namely, whether the respondents thought voting is important, whether the respondents intended to vote, and whether they would vote by post. The conversations were informal, and the main objective of this exercise was to persuade citizens to vote by focusing on the importance of voting. The callers and canvassers were also advised to respond to any concerns of the voters regarding the voting process.

The researchers got interesting results from the elections. The participation rate was 55.1% in the group exposed to door-to-door canvassing and 55% in the group treated with telephone calls. Both rates differed from the rate in the control group, which received no contact. Further calculations using suitable methodologies gave estimates of the effects of canvassing and telephone calls of 6.7% and 7.3%, respectively.

The overall experiment was a success, as the two interventions carried out on the two treatment groups by a non-partisan source had impacts that were statistically significant.

This exercise illustrated that citizens can be nudged to participate in elections by creating awareness through personal contacts. In yet another democracy, India, the 2014 general election saw a record voter turnout. A top Election Commission official has said that the Election Commission’s efforts to increase voters’ awareness and their registration have helped the process.

Sources: 1. Alice Moseley, Corinne Wales, Gerry Stoker, Graham Smith, Liz Richardson, Peter John, and Sarah Cotterill, “Nudge, Nudge, Think, Think: Experimenting with Ways to Change Civic Behaviour,” Bloomsbury Academic, March 2013. 2. “Lok Sabha Polls 2014: Country Records Highest Voter Turnout since Independence,” The Economic Times, May 13, 2014.


The t-Statistic and the Student t Distribution

The t-statistic testing the mean. Consider the t-statistic used to test the hypothesis that the mean of Y is $\mu_{Y,0}$, using data $Y_1, \dots, Y_n$. The formula for this statistic is given by Equation (3.10), where the standard error of $\bar{Y}$ is given by Equation (3.8). Substitution of the latter expression into the former yields the formula for the t-statistic:

$$t = \frac{\bar{Y} - \mu_{Y,0}}{\sqrt{s_Y^2 / n}}, \tag{3.22}$$

where $s_Y^2$ is given in Equation (3.7).
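As a rough illustration of Equation (3.22), the following Python sketch computes the t-statistic for a small sample. The data values and the function name `t_statistic` are made up for this example, and the code assumes that the sample variance of Equation (3.7) uses the divisor n - 1.

```python
import math
import statistics


def t_statistic(sample, mu_0):
    """t-statistic of Equation (3.22): (Ybar - mu_0) / sqrt(s_Y^2 / n)."""
    n = len(sample)
    y_bar = statistics.fmean(sample)
    # Assumption: Equation (3.7) is the usual sample variance (divisor n - 1).
    s2_y = statistics.variance(sample)
    return (y_bar - mu_0) / math.sqrt(s2_y / n)


# Hypothetical sample of n = 8 observations (invented for illustration).
y = [4.1, 5.3, 6.0, 4.8, 5.9, 6.4, 5.1, 5.6]
t = t_statistic(y, mu_0=5.0)
print(f"n = {len(y)}, t = {t:.3f}")
```

The denominator is the standard error of the sample mean, so the statistic measures how many standard errors the sample mean lies from the hypothesized value.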

As discussed in Section 3.2, under general conditions the t-statistic has a standard normal distribution if the sample size is large and the null hypothesis is true [see Equation (3.12)]. Although the standard normal approximation to the t-statistic is reliable for a wide range of distributions of Y if n is large, it can be unreliable if n is small. The exact distribution of the t-statistic depends on the distribution of Y, and it can be very complicated. There is, however, one special case in which the exact distribution of the t-statistic is relatively simple: If $Y_1, \dots, Y_n$ are i.i.d. draws from a normal distribution, then the t-statistic in Equation (3.22) has a Student t distribution with n - 1 degrees of freedom. (The mathematics behind this result is provided in Sections 18.4 and 19.4.)

If the population distribution is normally distributed, then critical values from the Student t distribution can be used to perform hypothesis tests and to construct confidence intervals. As an example, consider a hypothetical problem in which $t^{act} = 2.15$ and n = 8, so that the degrees of freedom is n - 1 = 7. From Appendix Table 2, the 5% two-sided critical value for the $t_7$ distribution is 2.36. Because the t-statistic is smaller in absolute value than the critical value ($2.15 < 2.36$), the null hypothesis would not be rejected at the 5% significance level against the two-sided alternative. The 95% confidence interval for $\mu_Y$, constructed using the $t_7$ distribution, would be $\bar{Y} \pm 2.36\,SE(\bar{Y})$. This confidence interval is wider than the confidence interval constructed using the standard normal critical value of 1.96.
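The critical values in this example can be checked numerically. The sketch below uses only the Python standard library: it integrates the Student t density with Simpson's rule and solves for the two-sided 5% critical value by bisection. The helper names (`t_pdf`, `t_cdf`, `t_critical`) are invented for this illustration and are not part of any library; in practice a statistics package or a table such as Appendix Table 2 would be used instead.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0: 0.5 plus the Simpson's-rule integral of the density on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(df, alpha=0.05):
    """Two-sided critical value: solve t_cdf(c) = 1 - alpha / 2 by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


t_act, n = 2.15, 8                      # values from the example in the text
c_t = t_critical(n - 1)                 # Student t critical value, 7 degrees of freedom
c_z = NormalDist().inv_cdf(0.975)       # standard normal critical value
print(f"t_{n - 1} critical value: {c_t:.2f}; normal critical value: {c_z:.2f}")
print("reject using Student t:", abs(t_act) > c_t)
print("reject using normal   :", abs(t_act) > c_z)
```

With these inputs the two critical values lead to different decisions, because 2.15 lies between 1.96 and 2.36. This is the sense in which the choice of distribution matters when n is small.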

The t-statistic testing differences of means. The t-statistic testing the difference of two means, given in Equation (3.20), does not have a Student t distribution, even if the population distribution of Y is normal. (The Student t distribution does not apply here because the variance estimator used to compute the standard error in Equation (3.19) does not produce a denominator in the t-statistic with a chi-squared distribution.)

A modified version of the differences-of-means t-statistic, based on a different standard error formula (the “pooled” standard error formula), has an exact Student t distribution when Y is normally distributed; however, the pooled standard error formula applies only in the special case that the two groups have the same variance or that each group has the same number of observations (Exercise 3.21). Adopt the notation of Equation (3.19) so that the two groups are denoted as m and w. The pooled variance estimator is

$$s_{pooled}^2 = \frac{1}{n_m + n_w - 2} \left[ \sum_{\substack{i=1 \\ \text{group } m}}^{n_m} (Y_i - \bar{Y}_m)^2 + \sum_{\substack{i=1 \\ \text{group } w}}^{n_w} (Y_i - \bar{Y}_w)^2 \right], \tag{3.23}$$

where the first summation is for the observations in group m and the second summation is for the observations in group w. The pooled standard error of the difference in means is $SE_{pooled}(\bar{Y}_m - \bar{Y}_w) = s_{pooled} \times \sqrt{1/n_m + 1/n_w}$, and the pooled t-statistic is computed using Equation (3.20), where the standard error is the pooled standard error, $SE_{pooled}(\bar{Y}_m - \bar{Y}_w)$.

If the population distribution of Y in group m is $N(\mu_m, \sigma_m^2)$, if the population distribution of Y in group w is $N(\mu_w, \sigma_w^2)$, and if the two group variances are the same (that is, $\sigma_m^2 = \sigma_w^2$), then under the null hypothesis the t-statistic computed using the pooled standard error has a Student t distribution with $n_m + n_w - 2$ degrees of freedom.
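The pooled calculation can be sketched in a few lines of Python. The function name `pooled_t` and the two small samples are hypothetical. The code also assumes that the t-statistic of Equation (3.20) has the form (difference in sample means minus the hypothesized difference) divided by the standard error, which is not shown in this excerpt.

```python
import math
import statistics


def pooled_t(y_m, y_w, d_0=0.0):
    """Pooled t-statistic, pooled standard error, and degrees of freedom."""
    n_m, n_w = len(y_m), len(y_w)
    mean_m, mean_w = statistics.fmean(y_m), statistics.fmean(y_w)
    # Sums of squared deviations, each around its own group mean.
    ss_m = sum((y - mean_m) ** 2 for y in y_m)
    ss_w = sum((y - mean_w) ** 2 for y in y_w)
    s2_pooled = (ss_m + ss_w) / (n_m + n_w - 2)            # pooled variance estimator
    se_pooled = math.sqrt(s2_pooled) * math.sqrt(1 / n_m + 1 / n_w)
    # Assumed form of Equation (3.20): (difference in means - d_0) / SE.
    t = (mean_m - mean_w - d_0) / se_pooled
    return t, se_pooled, n_m + n_w - 2


# Hypothetical data for the two groups (invented for illustration).
group_m = [10.0, 12.0, 14.0, 16.0]
group_w = [9.0, 10.0, 11.0, 12.0, 13.0, 11.0]
t, se, df = pooled_t(group_m, group_w)
print(f"t = {t:.2f}, SE_pooled = {se:.2f}, degrees of freedom = {df}")
```

Under the normality and equal-variance conditions stated above, the resulting statistic would be compared with critical values from the Student t distribution with the returned degrees of freedom.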

The drawback of using the pooled variance estimator $s_{pooled}^2$ is that it applies only if the two population variances are the same (assuming $n_m \neq n_w$). If the population variances are different, the pooled variance estimator is biased and inconsistent. If the population variances are different but the pooled variance formula is used, the null distribution of the pooled t-statistic is not a Student t distribution, even if the data are normally distributed; in fact, it does not even have a standard normal distribution in large samples. Therefore, the pooled standard error and the pooled t-statistic should not be used unless you have a good reason to believe that the population variances are the same.
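A small Monte Carlo experiment can make this warning concrete. The sketch below draws two normal samples with equal means but different variances and unequal sizes, then records how often each t-statistic exceeds 1.96 in absolute value. All settings (sample sizes, standard deviations, number of replications, seed) are arbitrary choices for illustration. The unpooled standard error is assumed to be the square root of $s_m^2/n_m + s_w^2/n_w$, which is taken here to be the formula in Equation (3.19).

```python
import math
import random


def sample_var(ys):
    """Sample variance with divisor n - 1."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / (len(ys) - 1)


def rejection_rates(n_m=50, n_w=200, sigma_m=3.0, sigma_w=1.0, reps=2000, seed=0):
    """Share of replications with |t| > 1.96 when the null hypothesis is true."""
    rng = random.Random(seed)
    reject_pooled = reject_unpooled = 0
    for _ in range(reps):
        # Both groups have mean 0, so the null of equal means holds.
        y_m = [rng.gauss(0.0, sigma_m) for _ in range(n_m)]
        y_w = [rng.gauss(0.0, sigma_w) for _ in range(n_w)]
        diff = sum(y_m) / n_m - sum(y_w) / n_w
        s2_m, s2_w = sample_var(y_m), sample_var(y_w)
        s2_pooled = ((n_m - 1) * s2_m + (n_w - 1) * s2_w) / (n_m + n_w - 2)
        se_pooled = math.sqrt(s2_pooled * (1 / n_m + 1 / n_w))
        # Assumed form of Equation (3.19): allows different group variances.
        se_unpooled = math.sqrt(s2_m / n_m + s2_w / n_w)
        reject_pooled += abs(diff / se_pooled) > 1.96
        reject_unpooled += abs(diff / se_unpooled) > 1.96
    return reject_pooled / reps, reject_unpooled / reps


pooled_rate, unpooled_rate = rejection_rates()
print(f"rejection rate, pooled SE  : {pooled_rate:.3f}")
print(f"rejection rate, unpooled SE: {unpooled_rate:.3f}")
```

In this design the smaller group has the larger variance, so the pooled standard error understates the sampling variability of the difference in means. The pooled test should therefore reject a true null hypothesis far more often than the nominal 5%, while the unpooled rate should stay close to 5%.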

Use of the Student t Distribution in Practice

For the problem of testing the mean of Y, the Student t distribution is applicable if the underlying population distribution of Y is normal. For economic variables, however, normal distributions are the exception (for example, see the boxes in Chapter 2 “The Distribution of Adulthood Earnings in the United Kingdom” and “The Unpegging of the Swiss Franc”). Even if the data are not normally distributed, the normal approximation to the distribution of the t-statistic is valid if the sample size is large. Therefore, inferences (hypothesis tests and confidence intervals) about the mean of a distribution should be based on the large-sample normal approximation.

When comparing two means, any economic reason for two groups having different means typically implies that the two groups also could have different variances. Accordingly, the pooled standard error formula is inappropriate, and the correct standard error formula, which allows for different group variances, is as given in Equation (3.19). Even if the population distributions are normal, the t-statistic computed using the standard error formula in Equation (3.19) does not have a Student
