
Class Notes in Statistics and Econometrics, Part 22


CHAPTER 43

Multiple Comparisons in the Linear Model

Due to the isomorphism of tests and confidence intervals, we will keep this whole discussion in terms of confidence intervals.

43.1 Rectangular Confidence Regions

Assume you are interested in two linear combinations of β at the same time, i.e., you want separate confidence intervals for them. If you use the Cartesian product (or the intersection, depending on how you look at it) of the individual confidence intervals, the confidence level of this rectangular confidence region will of necessity be different from that of the individual intervals used to form this region. If you want the joint confidence region to have confidence level 95%, then the individual confidence intervals must have a confidence level higher than 95%, i.e., they must be wider. There are two main approaches for computing the confidence levels of the individual intervals: a very simple one which is widely applicable but only approximate, and a more specialized one which is precise in some situations and can be taken as an approximation in others.

43.1.1 Bonferroni Intervals. To derive the first method, the Bonferroni intervals, assume you have individual confidence intervals R_i for the parameters φ_i. In order to make simultaneous inferences about the whole parameter vector φ = (φ_1, …)⊤, use the Cartesian product of the R_i: the vector φ lies in this rectangular region if and only if φ_i ∈ R_i for all i.

Usually it is difficult to compute the precise confidence level of such a rectangular set. If one cannot be precise, it is safer to understate the confidence level. The following inequality from elementary probability theory, called the Bonferroni inequality, gives a lower bound for the confidence level of this Cartesian product: given i events E_i with Pr[E_i] = 1 − α_i, then Pr[∩ E_i] ≥ 1 − Σ α_i. Proof: Pr[∩ E_i] = 1 − Pr[∪ E_i′] ≥ 1 − Σ Pr[E_i′] = 1 − Σ α_i.
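The Bonferroni recipe can be checked by simulation. The sketch below is a minimal, hypothetical setup (not from the notes): k independent standard-normal estimates of a true value 0, each given a two-sided interval at individual level 1 − α/k, so the Bonferroni inequality guarantees joint coverage of at least 1 − α.

```python
import random
from statistics import NormalDist

random.seed(1)

def joint_coverage(k, alpha_total, n_trials=20000):
    """Monte Carlo estimate of the joint coverage of k independent
    z-intervals when each interval gets the Bonferroni level 1 - alpha_total/k."""
    z = NormalDist().inv_cdf(1 - alpha_total / (2 * k))  # per-interval critical value
    hits = 0
    for _ in range(n_trials):
        # the joint event: all k standardized estimates land inside their interval
        if all(abs(random.gauss(0.0, 1.0)) <= z for _ in range(k)):
            hits += 1
    return hits / n_trials

cov = joint_coverage(k=5, alpha_total=0.05)
```

For independent events the exact joint level is (1 − 0.01)^5 ≈ 0.951, slightly above the Bonferroni lower bound 0.95, which illustrates why the bound is safe but conservative.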

Each t_i has a t distribution. For certain special cases of Ψ, certain quantiles of this joint distribution have been calculated and tabulated. This makes it possible to compute the precise confidence levels of multiple t-intervals in certain situations.


Problem 432. Show that the correlation coefficient between t_i and t_j is ρ_ij. But give a verbal argument that the t_i are not independent, even if ρ_ij = 0, i.e., even if the z_i are independent. (This means one cannot get the quantiles of their maxima from the individual quantiles.)

Answer. First we have E[t_j] = E[z_j] E[1/s] = 0, since z_j and s are independent. Therefore

(43.1.2) cov[t_i, t_j] = E[t_i t_j] = E[E[t_i t_j | s]] = E[E[(1/s²) z_i z_j | s]] = …

43.1.3 Studentized Maximum Modulus and Related Intervals. Look at the special case where all ρ_ij are equal; call them ρ. Then the following quantiles have been tabulated by [HH71], and reprinted in [Seb77, pp. 404–410], where they are called u^α_{i;ν,ρ}:


If one needs only two joint confidence intervals, i.e., if i = 2, then there are only two off-diagonal elements in the dispersion matrix, which must be equal by symmetry. A 2 × 2 dispersion matrix is therefore always “equicorrelated.” The values of the u^α_{2;n−k,ρ} can therefore be used to compute simultaneous confidence intervals for any two parameters in the regression model. For ρ one must use the actual correlation coefficient between the OLS estimates of the respective parameters, which is known precisely.

Problem 433. In the model y = Xβ + ε, with ε ∼ (o, σ²I), give a formula for the correlation coefficient between g⊤β̂ and h⊤β̂, where g and h are arbitrary constant vectors.

Answer. This is in Seber [Seb77, equation (5.7) on p. 128]:

(43.1.4) ρ = g⊤(X⊤X)⁻¹h / √((g⊤(X⊤X)⁻¹g)(h⊤(X⊤X)⁻¹h))
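Formula (43.1.4) is straightforward to evaluate numerically. A minimal sketch with a hypothetical two-regressor design matrix (intercept plus trend; the data are made up), using hand-rolled 2 × 2 matrix algebra:

```python
import math

# hypothetical design matrix X: n = 4 observations, k = 2 regressors
X = [[1.0, 0.0],
     [1.0, 1.0],
     [1.0, 2.0],
     [1.0, 3.0]]

# X'X and its inverse for the 2-column case
xtx = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
det = xtx[0][0] * xtx[1][1] - xtx[0][1] * xtx[1][0]
inv = [[xtx[1][1] / det, -xtx[0][1] / det],
       [-xtx[1][0] / det, xtx[0][0] / det]]

def quad(a, b):
    """a' (X'X)^{-1} b for 2-vectors a and b."""
    return sum(a[i] * inv[i][j] * b[j] for i in range(2) for j in range(2))

def rho(g, h):
    """Correlation (43.1.4) between g'beta-hat and h'beta-hat."""
    return quad(g, h) / math.sqrt(quad(g, g) * quad(h, h))

r_self = rho([1.0, 0.0], [1.0, 0.0])  # any combination with itself: rho = 1
r_gh = rho([1.0, 0.0], [0.0, 1.0])    # intercept estimate vs. slope estimate
```

For this X the intercept and slope estimates are negatively correlated (r_gh ≈ −0.80), the familiar effect of an uncentered regressor.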



But in certain situations, those equicorrelated quantiles can also be applied for testing more than two parameters. The most basic situation in which this is the case is the following: you have n × m observations y_ij = µ_i + ε_ij, with ε_ij ∼ NID(0, σ²). Then the equicorrelated t quantiles allow you to compute precise joint confidence intervals for all µ_i. Define s² = Σ_ij (y_ij − ȳ_i·)²/(n(m − 1)), and define z by z_i = (ȳ_i· − µ_i)√m. These z_i are normal with mean zero and dispersion matrix σ²I, and they are independent of s². Therefore one gets confidence intervals

(43.1.5) µ_i ∈ ȳ_i· ± u^α_{n;n(m−1),0} s/√m

This simplest example is a special case of “orthogonal regression,” in which X⊤X is a diagonal matrix. One can apply the same procedure in other cases of orthogonal regression, such as a regression with orthogonal polynomials as explanatory variables. Now return to the situation of the basic example, but assume that the first row of the matrix Y of observations is the reference group, and one wants to know whether the means of the other groups are significantly different from that first group. Give the first row the subscript i = 0. Then use z_i = (ȳ_i· − ȳ_0·)√m/√2, i = 1, …, n. One obtains again the multivariate t, this time with ρ = 1/2. Miller calls these intervals “many-one intervals.”

Problem 434. Assume again we are in the situation of our basic example; revert to counting i from 1 to n. Construct simultaneous confidence intervals for the difference between the individual means and the grand mean.

Answer. One uses z_i built from ȳ_i· − ȳ··, where ȳ·· is the grand sample mean and µ̄ its population counterpart. Since ȳ·· = (1/n) Σ ȳ_i·, one obtains cov[ȳ_i·, ȳ··] = (1/n) var[ȳ_i·] = σ²/(mn). Therefore

var[ȳ_i· − ȳ··] = σ² (1/m − 2/(mn) + 1/(mn)) = σ² (n − 1)/(mn).
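The variance formula above is easy to verify by simulation. A minimal sketch under hypothetical settings (n = 4 groups of m = 5 observations, σ = 2, arbitrary group means):

```python
import random

random.seed(7)
n, m, sigma = 4, 5, 2.0            # hypothetical group count, group size, sd
mu = [1.0, -0.5, 3.0, 0.0]         # arbitrary group means

trials = 20000
diffs = []
for _ in range(trials):
    # group means: each is an average of m draws from N(mu_i, sigma^2)
    ybar_i = [mu[i] + sigma * sum(random.gauss(0, 1) for _ in range(m)) / m
              for i in range(n)]
    ybar_all = sum(ybar_i) / n     # grand sample mean
    diffs.append(ybar_i[0] - ybar_all)

mean_d = sum(diffs) / trials
var_hat = sum((d - mean_d) ** 2 for d in diffs) / (trials - 1)
var_theory = sigma ** 2 * (n - 1) / (m * n)   # = 4 * 3 / 20 = 0.6
```

The Monte Carlo estimate var_hat should land close to var_theory = 0.6, confirming that ȳ_i· − ȳ·· has a smaller variance than ȳ_i· − ȳ_k· would.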

43.1.4 Studentized Range. This is a famous example; it is not in Seber [Seb77], but we should at least know what it is. Just as the projected F intervals are connected with the name of Scheffé, these intervals are connected with the name of Tukey. Again in the situation of our basic example, one uses ȳ_i· − ȳ_k· to build confidence intervals for µ_i − µ_k for all pairs i, k with i ≠ k. This is no longer the equicorrelated case. (Such simultaneous confidence intervals are useful if one knows that one will compare means, but does not know a priori which means.)

Problem 435. Again in our basic example, define z by z_i = (1/√2)(…). Compute the correlation matrix of z.

Answer. … = (1/(2m))(…)

43.2 Relation between F-test and t-tests

Assume you have constructed the t-intervals for several different linear combinations of the two parameters β1 and β2. In the (β1, β2)-plane, each of these intervals can be represented by a band delimited by parallel straight lines. If one draws many of these bands, their intersection becomes an ellipse, which has the same shape as the joint F-confidence region for β1 and β2, but it is smaller, i.e., it comes from an F-test for a lower significance level α.

The F-test, say for β1 = β2 = 0, is therefore equivalent not to two but to infinitely many t-tests, one for each linear combination of β1 and β2, but each of these t-tests has a higher confidence level than that of the F-test. This is the right way to look at the F-test.


What are situations in which one would want to obtain an F-confidence region in order to get information about many different linear combinations of the parameters at the same time?

For instance, one examines a regression output, looks at all parameters, computes linear combinations of parameters of interest, and believes they are significant if their t-tests reject. This whole procedure is sometimes considered a misuse of statistics, “data-snooping,” but Scheffé argued it is justified if one raises the significance level to that of the F-test implied by the infinitely many t-tests of all linear combinations of β.

Or one looks at only certain kinds of linear combinations, for instance at all contrasts, i.e., linear combinations whose coefficients sum to zero. This is a very thorough way to ascertain whether all parameters are equal.

Or one wants to draw a confidence band around the whole regression line.

Problem 436. Someone fits a regression with 18 observations, one explanatory variable, and a constant term, and then draws around each point of the regression line a standard 95% t-interval. What is the probability that the band created in this way covers the true regression line over its entire length? Note: the Splus commands qf(1-alpha,df1,df2) and qt(1-alpha/2,df) give quantiles, and the commands pf(critval,df1,df2) and pt(critval,df) give the cumulative distribution functions of the F and t distributions.


Answer. Instead of n = 18 and k = 2 we do it for arbitrary n and k. We need an α such that

(43.2.1) t_(n−k;0.025) = √(2 F_(k,n−k;α))

The Splus command is obsno<-18; conflev<-pf(qt(0.975,obsno-2)^2/2, 2, obsno-2). The value is approximately 0.862, i.e., the pointwise 95% band covers the whole true line with probability of only about 86%.
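The Splus one-liner can be reproduced without a statistics library, as a sketch: the t quantile is found by numerically integrating the t density and bisecting, and for df1 = 2 the F cumulative distribution function has the closed form 1 − (1 + 2x/ν)^(−ν/2). The function names here are my own, chosen to mirror qt and pf:

```python
import math

def t_pdf(x, nu):
    """Density of the t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1.0 + x * x / nu) ** (-(nu + 1) / 2)

def t_cdf(x, nu, steps=2000):
    """CDF for x >= 0 via trapezoidal integration of the density on [0, x]."""
    h = x / steps
    area = (t_pdf(0.0, nu) + t_pdf(x, nu)) / 2.0
    area += sum(t_pdf(i * h, nu) for i in range(1, steps))
    return 0.5 + area * h

def t_quantile(p, nu):
    """Inverse CDF by bisection (for p > 0.5), mimicking qt(p, nu)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def f2_cdf(x, nu):
    """Closed-form CDF of the F(2, nu) distribution, mimicking pf(x, 2, nu)."""
    return 1.0 - (1.0 + 2.0 * x / nu) ** (-nu / 2.0)

n, k = 18, 2
t975 = t_quantile(0.975, n - k)            # qt(0.975, 16), about 2.12
conflev = f2_cdf(t975 ** 2 / 2.0, n - k)   # pf(qt(...)^2/2, 2, 16), about 0.862
```

This confirms the point of the problem: intervals that are 95% pointwise give a band whose simultaneous coverage is noticeably lower.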

Problem 437. 6 points. Which options do you have if you want to test more than one hypothesis at the same time? Describe situations in which one F-test is better than two t-tests (i.e., in which an elliptical confidence region is better than a rectangular one). Are there also situations in which you might want two t-tests instead of one F-test?

In the one-dimensional case this confidence region is identical to the t-interval. But if one draws, for i = 2, the confidence ellipse generated by the F-test and the two intervals generated by the t-tests into the same diagram, one obtains the picture in figure 5.1 of Seber [Seb77, p. 131]. In terms of hypothesis testing this means: there are values for which the F-test does not reject but one or both t-tests reject, and there are values for which one or both t-tests fail to reject but the F-test rejects. The reason for this confusing situation is that one should not compare t-tests and F-tests at the same confidence level. The relationship between those testing procedures becomes clear if one compares the F-test at a given confidence level to t-tests at a certain higher confidence level.

We need the following math for this. For a positive definite Ψ and arbitrary x, it follows from (A.5.6) that

x⊤Ψ⁻¹x = max_{g: g⊤Ψg ≠ 0} (g⊤x)²/(g⊤Ψg),

so the elliptical F-confidence region can also be written as an intersection of regions, one for each g. It is sufficient to take the intersection over all g with unit length. What does each of these regions intersected look like? First note that the i × 1 vector u lies in that region if and only if g⊤u lies in a t-interval for g⊤Rβ, whose confidence level is no longer 1 − α but γ = Pr[|t| ≤ √(i F_(i,n−q;α))], where t has a t distribution with n − q degrees of freedom. Geometrically, in Seber [Seb77]'s figure 5.1, these confidence regions can be represented by all the bands tangent to the ellipse.


Taking only the vertical and the horizontal band tangent to the ellipse, one now has the following picture: if one of the t-tests rejects, then the F-test rejects too. But it may be possible that the F-test rejects while neither of the two t-tests rejects. In this case, there must be some other linear combination of the two variables for which the t-test rejects.

Another example of simultaneous t-tests, this time derived from Hotelling's T², is given in Johnson and Wichern [JW88, chapter 5]. It is very similar to the above; we will do here only the large-sample development:

43.3 Large-Sample Simultaneous Confidence Regions

Assume every row y_i of the n × p matrix Y is an independent drawing from a population with mean µ and dispersion matrix Σ. If n is much larger than p, then one can often do all tests regarding the unknown µ in terms of the sample mean ȳ, which one may assume to be normally distributed, and whose true dispersion matrix may be assumed to be known and equal to the sample dispersion matrix of the y_i, divided by n.

Therefore it makes sense to look at the following model (the y in this model is equal to the ȳ in the above model, and the Σ in this model is equal to S/n, or any other consistent estimate, for that matter, in the above model):


Assume y ∼ N(µ, Σ) with unknown µ and known Σ. We allow Σ to be singular, i.e., there may be some nonzero linear combinations g⊤y which have zero variance. Let q be the rank of Σ. Then a simultaneous 1 − α confidence region for all linear combinations of µ is

(43.3.1) g⊤µ ∈ g⊤y ± √(χ²_q(α)) √(g⊤Σg)

One needs only to worry about those g with g⊤Σg ≠ 0:

(43.3.2) Pr[g⊤µ ∈ g⊤y ± √(χ²_q(α)) √(g⊤Σg) for all g]
(43.3.3) = Pr[g⊤µ ∈ g⊤y ± √(χ²_q(α)) √(g⊤Σg) for all g with g⊤Σg ≠ 0]
(43.3.6) = Pr[(y − µ)⊤Σ⁻(y − µ) ≤ χ²_q(α)] = 1 − α

One can apply the maximization theorem here because y − µ can be written in the form Σu for some u.

Now as an example let's do Johnson and Wichern, example 5.7 on pp. 194–197. In a survey, people in a city are asked which bank is their primary savings bank. The answers are collected as rows in the Y matrix. The columns correspond to banks A, B, C, D, to some other bank, and to having no savings. Each row has exactly one 1 in the column corresponding to the respondent's primary savings bank, and zeros otherwise. The people with no savings will be ignored, i.e., their rows will be trimmed from the matrix together with the last column. After this trimming, Y has 5 columns, and there are 355 respondents in these five categories. It is assumed that the rows of Y are independent, which presupposes sampling with replacement, i.e., the sampling is done in such a way that theoretically the same people might be asked twice (or the sample is small compared with the population). The probability distribution of each of these rows, say here the ith row, is the multinomial distribution whose parameters form the p-vector p (of nonnegative elements adding up to 1). Its means, variances, and covariances can be computed according to the rules for discrete distributions:

(43.3.7) E[y_ij] = 1·p_j + 0·(1 − p_j) = p_j
(43.3.8) var[y_ij] = E[y²_ij] − (E[y_ij])² = p_j − p²_j = p_j(1 − p_j), because y²_ij = y_ij
(43.3.9) cov[y_ij, y_ik] = E[y_ij y_ik] − E[y_ij] E[y_ik] = −p_j p_k, because y_ij y_ik = 0
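The three moment formulas can be checked by simulating indicator rows, as in the survey example. A minimal sketch with hypothetical category probabilities (the probabilities and seed are my own, not the survey data):

```python
import random

random.seed(3)
p = [0.2, 0.5, 0.3]     # hypothetical category probabilities, summing to 1
trials = 40000

rows = []
for _ in range(trials):
    # draw one indicator row: exactly one 1, in the category drawn
    u, cum, idx = random.random(), 0.0, len(p) - 1
    for j, pj in enumerate(p):
        cum += pj
        if u < cum:
            idx = j
            break
    y = [0] * len(p)
    y[idx] = 1
    rows.append(y)

m0 = sum(r[0] for r in rows) / trials   # estimates p_0 = 0.2, per (43.3.7)
m1 = sum(r[1] for r in rows) / trials   # estimates p_1 = 0.5
# (43.3.8): since y_ij^2 = y_ij, the sample variance is m1 * (1 - m1)
v1 = m1 * (1.0 - m1)
# (43.3.9): y_i0 * y_i1 = 0 in every row, so the covariance estimate is -m0*m1
c01 = sum(r[0] * r[1] for r in rows) / trials - m0 * m1
```

The cross-product term vanishes exactly because the categories are mutually exclusive, which is the whole content of (43.3.9).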

The p_i can be estimated by the sample means. From these sample means one also obtains an estimate S of the dispersion matrix of the rows of Y. This estimate is singular (as is the true dispersion matrix); it has rank r − 1, since every row of the Y-matrix adds up to 1. Provided n − r is large, which means here that n p̂_k ≥ 20 for each category k, one can use the normal asymptotics and gets as simultaneous confidence intervals for all linear combinations

(43.3.10) g⊤p ∈ g⊤p̂ ± √(χ²_{r−1}(α)) √(g⊤Sg/n)


A numerical example illustrating the width of these confidence intervals is given in [JW88, p. 196].


CHAPTER 44

Sample SAS Regression Output

Table 1 is the output of a SAS run. The dependent variable is the y variable; here it has the name wagerate. “Analysis” is the same as “decomposition,” and “variance” is here the sample variance or, say better, the sum of squares. “Analysis of variance” is a decomposition of the “corrected total” sum of squares Σⁿ_{j=1}(y_j − ȳ)² = 8321.91046 into its “explained” part Σⁿ_{j=1}(ŷ_j − ȳ)² = 1553.90611, the sum of squares whose “source” is the “model,” and its “unexplained” part, the sum of squared residuals of the regression. The d.f. of the sum of squares due to the model is the number of slope parameters (not counting the intercept) in the model.

The “mean squares” are the corresponding sums of squares divided by their degrees of freedom. This “mean sum of squares due to error” should not be confused with the “mean squared error” of an estimator θ̂ of θ, defined as MSE[θ̂; θ] = E[(θ̂ − θ)²]. One can think of the mean sum of squares due to error as the sample analog of MSE[ŷ; y]; it is at the same time an unbiased estimate of the disturbance variance σ². The mean sum of squares explained by the model is an unbiased estimate of σ² if all slope coefficients are zero, and is larger otherwise. The F value is the mean sum of squares explained by the model divided by the mean sum of squares due to error; this is the value of the test statistic for the F-test that all slope parameters are zero. The p-value prob > F gives the probability of getting an even larger F value when the null hypothesis is true, i.e., when all slope parameters are indeed zero. To reject the null hypothesis at significance level α, this p-value must be smaller than α.
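The analysis-of-variance decomposition can be reproduced for a toy one-regressor data set (the numbers below are made up for illustration, not the wagerate data):

```python
# toy data for a one-regressor model with intercept
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
b1 = sxy / sxx                       # OLS slope estimate
b0 = ybar - b1 * xbar                # OLS intercept estimate
fit = [b0 + b1 * x for x in xs]

sst = sum((y - ybar) ** 2 for y in ys)             # "corrected total" SS
ssr = sum((f - ybar) ** 2 for f in fit)            # SS whose source is the model
sse = sum((y - f) ** 2 for y, f in zip(ys, fit))   # SS due to error (residuals)

df_model, df_error = 1, n - 2        # slopes, and n - (slopes + intercept)
msr, mse = ssr / df_model, sse / df_error          # the "mean squares"
f_value = msr / mse                  # test statistic: all slope parameters zero
```

The identity sst = ssr + sse holds exactly (up to rounding), which is what makes the table a genuine decomposition rather than just a list of sums.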

Posted: 04/07/2014, 15:20