A confidence region for the parameter vector β can be constructed as follows:
• Draw the OLS estimate $\hat\beta$ into $k$-dimensional space; it is the vector which minimizes $SSE = (y - X\hat\beta)^\top (y - X\hat\beta)$.
• For every other vector $\tilde\beta$ one can define the sum of squared errors associated with that vector as $SSE_{\tilde\beta} = (y - X\tilde\beta)^\top (y - X\tilde\beta)$. Draw the level hypersurfaces (if $k = 2$: level lines) of this function. These are ellipsoids centered on $\hat\beta$.
• Each of these ellipsoids is a confidence region for $\beta$. Different confidence regions differ by their coverage probabilities.
• If one is only interested in certain coordinates of $\beta$ and not in the others, or in some other linear transformation $R\beta$, then the corresponding confidence regions are the corresponding transformations of this ellipsoid. Geometrically this can best be seen if this transformation is an orthogonal projection: then the confidence ellipse of the transformed vector $R\beta$ is also a projection or “shadow” of the confidence region for the whole vector. Projections of the same confidence region have the same confidence level, independent of the direction in which this projection goes.
The confidence regions for $\beta$ with coverage probability $\pi$ will be written here as $B_{\beta;\pi}$ or, if we want to make its dependence on the observation vector $y$ explicit, $B_{\beta;\pi}(y)$. These confidence regions are level lines of the SSE, and mathematically it is advantageous to define these level lines by their level relative to the minimum level, i.e., as the set of all $\tilde\beta$ for which the quotient of the attained $SSE_{\tilde\beta} = (y - X\tilde\beta)^\top (y - X\tilde\beta)$ divided by the smallest possible $SSE = (y - X\hat\beta)^\top (y - X\hat\beta)$ is smaller than or equal to a given number. In formulas,

(41.1.1)  $\tilde\beta \in B_{\beta;\pi}(y) \iff \dfrac{(y - X\tilde\beta)^\top (y - X\tilde\beta)}{(y - X\hat\beta)^\top (y - X\hat\beta)} \le c_{\pi;\,n-k,\,k}$
It will be shown below, in the discussion following (41.2.1), that $c_{\pi;n-k,k}$ only depends on $\pi$ (the confidence level), $n - k$ (the degrees of freedom in the regression), and $k$ (the dimension of the confidence region).
To get a geometric intuition of this principle, look at the case $k = 2$, in which the parameter vector $\beta$ has only two components. For each possible value $\tilde\beta$ of the parameter vector, the associated sum of squared errors is $SSE_{\tilde\beta} = (y - X\tilde\beta)^\top (y - X\tilde\beta)$. This is a quadratic function of $\tilde\beta$, whose level lines form concentric ellipses as shown in Figure 1. The center of these ellipses is the unconstrained least squares estimate. Each of the ellipses is a confidence region for $\beta$ for a different confidence level.
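These level lines can also be computed directly. The following small sketch evaluates $SSE_{\tilde\beta}$ on a grid and draws its contours; the data, sample size, and variable names are invented for the illustration:

    # Sketch: contour lines of SSE(beta~) for a two-regressor model (simulated data)
    set.seed(42)
    X <- cbind(rnorm(30), rnorm(30))          # n = 30, k = 2, no intercept
    y <- X %*% c(1, 2) + rnorm(30)            # true beta = (1, 2)
    sse <- function(b) sum((y - X %*% b)^2)   # SSE as a function of beta~
    b1 <- seq(0, 2, len = 60)
    b2 <- seq(1, 3, len = 60)
    z <- outer(b1, b2, Vectorize(function(u, v) sse(c(u, v))))
    contour(b1, b2, z, xlab = "beta1", ylab = "beta2")  # concentric ellipses
    bhat <- solve(t(X) %*% X, t(X) %*% y)     # the OLS estimate, center of the ellipses
    points(bhat[1], bhat[2], pch = 19)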
If one needs a confidence region not for the whole vector $\beta$ but, say, for $i$ linearly independent linear combinations $R\beta$ (here $R$ is an $i \times k$ matrix with full row rank), then the above principle applies in the following way: the vector $\tilde u$ lies in the confidence region for $R\beta$ generated by $y$ for confidence level $\pi$, notation $B_{R\beta;\pi}$, if and only if there is a $\tilde\beta$ in the confidence region (41.1.1) (with the parameters adjusted to reflect the dimensionality of $\tilde u$) which satisfies $R\tilde\beta = \tilde u$:

(41.1.2)  $\tilde u \in B_{R\beta;\pi}(y) \iff \text{there exists } \tilde\beta \text{ with } \tilde u = R\tilde\beta \text{ and } \dfrac{(y - X\tilde\beta)^\top (y - X\tilde\beta)}{(y - X\hat\beta)^\top (y - X\hat\beta)} \le c_{\pi;\,n-k,\,i}$
Problem 416. Why does one have to change the value of $c$ when one goes over to the projections of the confidence regions?
Answer. Because the projection is a many-to-one mapping, and vectors which are not in the original ellipsoid may still end up in the projection.

Again let us illustrate this with the 2-dimensional case in which the confidence region for $\beta$ is an ellipse, as drawn in Figure 1, called $B_{\beta;\pi}(y)$. Starting with this ellipse, the above criterion defines individual confidence intervals for linear combinations $u = r^\top\beta$ by the rule: $\tilde u \in B_{r^\top\beta;\pi}(y)$ iff a $\tilde\beta \in B_{\beta;\pi}(y)$ exists with $r^\top\tilde\beta = \tilde u$. For $r = \begin{bmatrix}1\\0\end{bmatrix}$, this interval is simply the projection of the ellipse on the horizontal axis, and for $r = \begin{bmatrix}0\\1\end{bmatrix}$ it is the projection on the vertical axis.
The same argument applies for all vectors $r$ with $r^\top r = 1$. The inner product of two vectors is the length of the first vector times the length of the projection of the second vector on the first. If $r^\top r = 1$, therefore, $r^\top\tilde\beta$ is simply the length of the orthogonal projection of $\tilde\beta$ on the line generated by the vector $r$. Therefore the confidence interval for $r^\top\beta$ is simply the projection of the ellipse on the line generated by $r$. (This projection is sometimes called the “shadow” of the ellipse.)
The confidence region for $R\beta$ can also be defined as follows: $\tilde u$ lies in this confidence region if and only if the “best” $\hat{\hat\beta}$ which satisfies $R\hat{\hat\beta} = \tilde u$ lies in the confidence region (41.1.1), this best $\hat{\hat\beta}$ being, of course, the constrained least squares estimate subject to the constraint $R\beta = \tilde u$, whose formula is given by (29.3.13). The confidence region for $R\beta$ consists therefore of all $\tilde u$ for which the constrained least squares estimate $\hat{\hat\beta} = \hat\beta - (X^\top X)^{-1} R^\top \bigl(R (X^\top X)^{-1} R^\top\bigr)^{-1} (R\hat\beta - \tilde u)$ satisfies condition (41.1.1):

(41.1.3)  $\tilde u \in B_{R\beta;\pi}(y) \iff \dfrac{(y - X\hat{\hat\beta})^\top (y - X\hat{\hat\beta})}{(y - X\hat\beta)^\top (y - X\hat\beta)} \le c_{\pi;\,n-k,\,i}$
Figure 1. Confidence Ellipse with “Shadows”
In order to transform (41.1.3) into a mathematically more convenient form, write it as

(41.1.4)  $\tilde u \in B_{R\beta;\pi}(y) \iff \dfrac{(y - X\hat{\hat\beta})^\top (y - X\hat{\hat\beta}) - (y - X\hat\beta)^\top (y - X\hat\beta)}{(y - X\hat\beta)^\top (y - X\hat\beta)} \le c_{\pi;\,n-k,\,i} - 1$
and then use (29.7.2) to get

(41.1.5)  $\tilde u \in B_{R\beta;\pi}(y) \iff \dfrac{(R\hat\beta - \tilde u)^\top \bigl(R(X^\top X)^{-1}R^\top\bigr)^{-1} (R\hat\beta - \tilde u)}{(y - X\hat\beta)^\top (y - X\hat\beta)} \le c_{\pi;\,n-k,\,i} - 1$

In the special case $R = I$, i.e., for a confidence region for the whole parameter vector, this becomes

(41.1.6)  $\tilde\beta \in B_{\beta;\pi}(y) \iff \dfrac{(\hat\beta - \tilde\beta)^\top X^\top X (\hat\beta - \tilde\beta)}{(y - X\hat\beta)^\top (y - X\hat\beta)} \le c_{\pi;\,n-k,\,k} - 1$

Verify that this is the same as (41.1.5) in the special case $R = I$.
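The step from (41.1.4) to (41.1.5) rests on the identity (29.7.2), $(y - X\hat{\hat\beta})^\top(y - X\hat{\hat\beta}) - (y - X\hat\beta)^\top(y - X\hat\beta) = (R\hat\beta - \tilde u)^\top\bigl(R(X^\top X)^{-1}R^\top\bigr)^{-1}(R\hat\beta - \tilde u)$, which can also be checked numerically; in the following sketch all data and names are invented for the illustration:

    # Numerical check of the step from (41.1.4) to (41.1.5) on simulated data
    set.seed(7)
    n <- 40; k <- 3
    X <- matrix(rnorm(n * k), n, k)
    y <- X %*% c(1, -1, 2) + rnorm(n)
    R <- rbind(c(1, 0, 0), c(0, 1, 1))      # i = 2 constraints
    u <- c(0.5, 0.5)                        # candidate value u~ for R beta
    XtXi <- solve(t(X) %*% X)
    bhat <- XtXi %*% t(X) %*% y             # unconstrained OLS estimate
    M    <- solve(R %*% XtXi %*% t(R))
    bbhat <- bhat - XtXi %*% t(R) %*% M %*% (R %*% bhat - u)  # constrained LS
    lhs <- sum((y - X %*% bbhat)^2) - sum((y - X %*% bhat)^2)
    rhs <- drop(t(R %*% bhat - u) %*% M %*% (R %*% bhat - u))
    all.equal(lhs, rhs)                     # TRUE up to rounding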
Problem 418. You have run a regression with intercept, but you are not interested in the intercept per se but need a joint confidence region for all slope parameters. Using the notation of Problem 361, show that this confidence region has the form

(41.1.7)  $\tilde\beta \in B_{\beta;\pi}(y) \iff \dfrac{(\hat\beta - \tilde\beta)^\top X^\top X (\hat\beta - \tilde\beta)}{(y - X\hat\beta)^\top (y - X\hat\beta)} \le c_{\pi;\,n-k,\,k-1} - 1$
Trang 8I.e., we are sweeping the means out of both regressors and dependent variables, andthen we act as if the regression never had an intercept and use the formula for thefull parameter vector (41.1.6) for these transformed data (except that the number ofdegrees of freedom n−k still reflects the intercept as one of the explanatory variables).
Answer. Write the full parameter vector as $\begin{bmatrix}\alpha\\ \beta\end{bmatrix}$ and $R = \begin{bmatrix}o & I\end{bmatrix}$. Use (41.1.5), but instead of $\tilde u$ write $\tilde\beta$. The only tricky part is the following, which uses (30.0.37):

(41.1.8)  $R(X^\top X)^{-1}R^\top = \begin{bmatrix}o & I\end{bmatrix} \begin{bmatrix} 1/n + \bar x^\top (X^\top X)^{-1} \bar x & -\bar x^\top (X^\top X)^{-1} \\ -(X^\top X)^{-1} \bar x & (X^\top X)^{-1} \end{bmatrix} \begin{bmatrix}o^\top\\ I\end{bmatrix} = (X^\top X)^{-1}$

where $X$ on the right-hand side denotes the demeaned regressors. The denominator can be rewritten as $(y - X\hat\beta)^\top (y - X\hat\beta)$ in these demeaned variables.
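Identity (41.1.8), that the slope block of the inverted full cross-product matrix equals the inverse cross-product of the demeaned regressors, can likewise be verified numerically (invented data):

    # Check: slope block of (Xf'Xf)^{-1} equals (Xc'Xc)^{-1} with Xc demeaned
    set.seed(3)
    n <- 25
    X  <- cbind(rnorm(n), rnorm(n))               # two regressors
    Xf <- cbind(1, X)                             # full design matrix with intercept
    Xc <- scale(X, center = TRUE, scale = FALSE)  # demeaned regressors
    lhs <- solve(t(Xf) %*% Xf)[2:3, 2:3]          # R (Xf'Xf)^{-1} R' with R = [o I]
    rhs <- solve(t(Xc) %*% Xc)
    all.equal(lhs, rhs)                           # TRUE up to rounding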
Problem 419. 3 points. We are in the simple regression $y_t = \alpha + \beta x_t + \varepsilon_t$. If one draws, for every value of $x$, a 95% confidence interval for $\alpha + \beta x$, one gets a “confidence band” around the fitted line, as shown in Figure 2. Is the probability that this confidence band covers the true regression line over its whole length equal to 95%, greater than 95%, or smaller than 95%? Give a good verbal reasoning for your answer. You should make sure that your explanation is consistent with the fact that the confidence interval is random and the true regression line is fixed.
Figure 2. Confidence Band for Regression Line

41.2. Coverage Probability of the Confidence Regions
The probability that any given known value $\tilde u$ lies in the confidence region (41.1.3) depends on the unknown $\beta$. But we will show now that the “coverage probability” of the region, i.e., the probability with which the confidence region contains the unknown true value $u = R\beta$, does not depend on any unknown parameters.

To get the coverage probability, we must substitute $\tilde u = R\beta$ (where $\beta$ is the true parameter value) in (41.1.5). This gives

(41.2.1)  $R\beta \in B_{R\beta;\pi}(y) \iff \dfrac{(R\hat\beta - R\beta)^\top \bigl(R(X^\top X)^{-1}R^\top\bigr)^{-1} (R\hat\beta - R\beta)}{(y - X\hat\beta)^\top (y - X\hat\beta)} \le c_{\pi;\,n-k,\,i} - 1$

The numerator is, up to the common factor $\sigma^2$, a $\chi^2_i$, and the denominator, up to the same factor, a $\chi^2_{n-k}$.
Furthermore, numerator and denominator are independent. To see this, look first at $\hat\beta$ and $\hat\varepsilon$. By Problem 300 they are uncorrelated, and since they are also jointly Normal, it follows that they are independent. If $\hat\beta$ and $\hat\varepsilon$ are independent, any functions of $\hat\beta$ are independent of any functions of $\hat\varepsilon$. The numerator in the test statistic (41.2.1) is a function of $\hat\beta$ and the denominator is a function of $\hat\varepsilon$; therefore they are independent, as claimed. Lastly, if we divide numerator by denominator, the unknown “nuisance parameter” $\sigma^2$ in their probability distributions cancels out, i.e., the distribution of the quotient is fully known.
To sum up: if $\tilde u$ is the true value $\tilde u = R\beta$, then the test statistic in (41.2.1) can no longer be observed, but its distribution is known; it is a $\chi^2_i$ divided by an independent $\chi^2_{n-k}$. Therefore, for every value $c$, the probability that the confidence region (41.1.5) contains the true $R\beta$ can be computed, and conversely, for any desired coverage probability, the appropriate critical value $c$ can be computed. As claimed, this critical value only depends on the confidence level $\pi$ and $n-k$ and $i$.
41.3. Conventional Formulas for the Test Statistics
In order to get this test statistic into the form in which it is conventionally tabulated, we must divide both numerator and denominator of (41.1.5) by their degrees of freedom, to get a $\chi^2_i/i$ divided by an independent $\chi^2_{n-k}/(n-k)$. This quotient is called an F-distribution with $i$ and $n-k$ degrees of freedom. The F-distribution is defined as

$F_{i,j} = \dfrac{\chi^2_i / i}{\chi^2_j / j}$

where numerator and denominator are independent.
Therefore, instead of (41.1.5), the condition deciding whether a given vector $\tilde u$ lies in the confidence region for $R\beta$ with confidence level $\pi = 1 - \alpha$ is formulated as follows:

(41.3.1)  $\dfrac{(SSE_{\text{constrained}} - SSE_{\text{unconstrained}})/i}{SSE_{\text{unconstrained}}/(n-k)} \le F_{(i,n-k;\alpha)}$

Here the constrained $SSE$ is the $SSE$ in the model estimated with the constraint $R\beta = \tilde u$ imposed, and $F_{(i,n-k;\alpha)}$ is the upper $\alpha$ quantile of the F-distribution with $i$ and $n-k$ degrees of freedom, i.e., it is that scalar $c$ for which a random variable $F$ which has an F-distribution with $i$ and $n-k$ degrees of freedom satisfies $\Pr[F \ge c] = \alpha$.
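Comparing (41.3.1) with (41.1.1) makes explicit the critical value $c_{\pi;n-k,i}$ promised in the discussion following (41.1.1); spelling out the algebra:

$\dfrac{SSE_{\text{constrained}}}{SSE_{\text{unconstrained}}} \le c \iff \dfrac{(SSE_{\text{constrained}} - SSE_{\text{unconstrained}})/i}{SSE_{\text{unconstrained}}/(n-k)} \le \dfrac{n-k}{i}\,(c - 1),$

so that the two formulations coincide exactly when

$c_{\pi;\,n-k,\,i} = 1 + \dfrac{i}{n-k}\,F_{(i,\,n-k;\,\alpha)}, \qquad \pi = 1 - \alpha.$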
41.4. Interpretation in Terms of Studentized Mahalanobis Distance

The division of numerator and denominator by their degrees of freedom also gives us a second intuitive interpretation of the test statistic in terms of the Mahalanobis distance, see chapter 40. If one divides the denominator by its degrees of freedom, one gets an unbiased estimate of $\sigma^2$:

(41.4.1)  $s^2 = \dfrac{1}{n-k}\,(y - X\hat\beta)^\top (y - X\hat\beta)$.
Therefore from (41.1.5) one gets the following alternative formula for the joint confidence region $B(y)$ for the vector parameter $u = R\beta$ for confidence level $\pi = 1 - \alpha$:

(41.4.2)  $\tilde u \in B_{R\beta;1-\alpha}(y) \iff \dfrac{1}{s^2}\,(R\hat\beta - \tilde u)^\top \bigl(R(X^\top X)^{-1}R^\top\bigr)^{-1} (R\hat\beta - \tilde u) \le i\,F_{(i,n-k;\alpha)}$

Here $\hat\beta$ is the least squares estimator of $\beta$, and $s^2 = (y - X\hat\beta)^\top (y - X\hat\beta)/(n-k)$ the unbiased estimator of $\sigma^2$. Therefore $\hat\Sigma = s^2 (X^\top X)^{-1}$ is the estimated covariance matrix as available in the regression printout, and $\hat V = s^2 R(X^\top X)^{-1}R^\top$ is the estimate of the covariance matrix of $R\hat\beta$. Another way to write (41.4.2) is therefore

(41.4.3)  $B(y) = \{\tilde u \in \mathbb{R}^i : (R\hat\beta - \tilde u)^\top \hat V^{-1} (R\hat\beta - \tilde u) \le i\,F_{(i,n-k;\alpha)}\}$
This formula allows a suggestive interpretation: whether $\tilde u$ lies in the confidence region or not depends on the Mahalanobis distance which the actual value of $R\hat\beta$ has from the distribution which $R\hat\beta$ would have if the true parameter vector were to satisfy the constraint $R\beta = \tilde u$. It is not the Mahalanobis distance itself but only an estimate of it, because $\sigma^2$ is replaced by its unbiased estimate $s^2$.
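In R, this estimated squared Mahalanobis distance is available directly as the function mahalanobis, so the membership test (41.4.3) is a one-liner; in the following sketch the fitted model, the selector matrix R, and the candidate value u0 are invented for the illustration:

    # Sketch: does a candidate u0 lie in the confidence region (41.4.3)?
    fit <- lm(dist ~ speed + I(speed^2), data = cars)  # example regression
    R  <- rbind(c(0, 1, 0), c(0, 0, 1))     # select the two slopes, i = 2
    u0 <- c(1, 0)                           # candidate value u~ for R beta
    Rb   <- drop(R %*% coef(fit))           # R beta-hat
    Vhat <- R %*% vcov(fit) %*% t(R)        # s^2 R (X'X)^{-1} R'
    d2 <- mahalanobis(Rb, center = u0, cov = Vhat)  # (Rb-u0)' Vhat^{-1} (Rb-u0)
    d2 <= nrow(R) * qf(0.95, nrow(R), fit$df.residual)  # TRUE iff u0 is inside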
These formulas are also useful for drawing the confidence ellipses. The $r$ which you need in equation (10.3.22) in order to draw the confidence ellipse is $r = \sqrt{i\,F_{(i,n-k;\alpha)}}$.

This is the same as the local variable mult in the following S-function to draw this ellipse: its arguments are the center point (a 2-vector b), the estimated covariance matrix (a 2 × 2 matrix C), the degrees of freedom in the denominator of the F-distribution (the scalar df), and the confidence level (the scalar level between 0 and 1, which defaults to 0.95 if not specified).
confelli <-
function(b, C, df, level = 0.95, xlab = "", ylab = "", add = TRUE, prec = 51)
{
	# Plot an ellipse with "covariance matrix" C, center b, and P-content
	# level according to the F(2,df) distribution.
	# Sent to S-NEWS on May 19, 1999 by Roger Koenker.
	d <- sqrt(diag(C))                    # standard deviations of the coordinates
	dfvec <- c(2, df)
	phase <- acos(C[1, 2]/(d[1] * d[2]))  # angle encoding the correlation
	angles <- seq(-pi, pi, len = prec)
	mult <- sqrt(dfvec[1] * qf(level, dfvec[1], dfvec[2]))  # the r of (10.3.22)
	xpts <- b[1] + d[1] * mult * cos(angles)
	ypts <- b[2] + d[2] * mult * cos(angles + phase)
	if(add) lines(xpts, ypts)
	else plot(xpts, ypts, type = "l", xlab = xlab, ylab = ylab)
}
The mathematics of why this works is in Problem 166.
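A usage sketch, with the regression and the plotting ranges invented for the illustration:

    # Hypothetical usage: joint 95% confidence ellipse for two slope estimates
    fit <- lm(dist ~ speed + I(speed^2), data = cars)
    b    <- coef(fit)[2:3]                  # center: the two slope estimates
    Vhat <- vcov(fit)[2:3, 2:3]             # their estimated covariance matrix
    plot(b[1], b[2], xlab = "speed", ylab = "I(speed^2)",
         xlim = b[1] + c(-6, 6) * sqrt(Vhat[1, 1]),
         ylim = b[2] + c(-6, 6) * sqrt(Vhat[2, 2]))
    confelli(b, Vhat, df = fit$df.residual, level = 0.95)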
Problem 420. 3 points. In the regression model $y = X\beta + \varepsilon$ you observe $y$ and the (nonstochastic) $X$, and you construct the following confidence region $B(y)$ for $R\beta$, where $R$ is an $i \times k$ matrix with full row rank:

(41.4.4)  $B(y) = \{u \in \mathbb{R}^i : (R\hat\beta - u)^\top \bigl(R(X^\top X)^{-1}R^\top\bigr)^{-1} (R\hat\beta - u) \le i\,s^2 F_{(i,n-k;\alpha)}\}.$

Compute the probability that $B$ contains the true $R\beta$.
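One can build intuition for this problem by simulation: hold $X$ and $\beta$ fixed, draw many samples $y$, and count how often $B(y)$ contains the true $R\beta$. All data-generating values below are invented:

    # Monte Carlo sketch: empirical coverage of the region (41.4.4)
    set.seed(99)
    n <- 30; k <- 3; beta <- c(1, -1, 2); sigma <- 1.5
    X <- cbind(1, matrix(rnorm(n * (k - 1)), n, k - 1))  # fixed design
    R <- rbind(c(0, 1, 0), c(0, 0, 1)); i <- nrow(R)
    Ru <- drop(R %*% beta)                   # the true R beta
    XtXi <- solve(t(X) %*% X)
    M <- solve(R %*% XtXi %*% t(R))
    crit <- qf(0.95, i, n - k)
    hits <- replicate(10000, {
        y <- X %*% beta + sigma * rnorm(n)
        bhat <- XtXi %*% t(X) %*% y
        s2 <- sum((y - X %*% bhat)^2) / (n - k)
        q <- drop(t(R %*% bhat - Ru) %*% M %*% (R %*% bhat - Ru))
        q <= i * s2 * crit
    })
    mean(hits)                               # empirical coverage; compare with your answer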
Finally, $s_{r^\top\hat\beta}$ is the estimated standard deviation of $r^\top\hat\beta$. It is computed by the following three steps: First write down the variance of $r^\top\hat\beta$, which is $\sigma^2\, r^\top(X^\top X)^{-1} r$; then replace $\sigma^2$ by its unbiased estimate $s^2$; then take the square root, which gives $s_{r^\top\hat\beta} = s\sqrt{r^\top(X^\top X)^{-1} r}$. In the computation of the coverage probability this estimated standard deviation enters as follows:

$\cdots = \Pr\!\left[\dfrac{r^\top(X^\top X)^{-1}X^\top\varepsilon}{s\sqrt{r^\top(X^\top X)^{-1} r}} \le \cdots\right]$

(41.4.12)  $= \Pr\!\left[\dfrac{r^\top(X^\top X)^{-1}X^\top\varepsilon}{\sigma\sqrt{r^\top(X^\top X)^{-1} r}} \le \cdots\,\dfrac{s}{\sigma}\right]$
Three Principles for Testing a Linear Constraint

We work in the model y = ... distinction between (1) and (2) explicit. For instance [Chr87, p. 29ff] distinguishes between “testing linear parametric functions” and “testing models.” However the distinction between all principles ... inadvertently and implicitly distinguish between (1) and (2) as follows: they introduce the t-test for one parameter by principle (1), and the F-test for several parameters by principle (2).