Class Notes in Statistics and Econometrics, Part 6

Title: The Regression Fallacy
Institution: University of Utah, Salt Lake City
Subject: Statistics and Econometrics
Type: Lecture Notes



The Regression Fallacy

Only for the sake of this exercise we will assume that “intelligence” is an innate property of individuals and can be represented by a real number z. If one picks at random a student entering the U of U, the intelligence of this student is a random variable which we assume to be normally distributed with mean µ and standard deviation σ. Also assume every student has to take two intelligence tests, the first at the beginning of his or her studies, the other half a year later. The outcomes of these tests are x and y. x and y measure the intelligence z (which is assumed to be the same in both tests) plus random errors ε and δ, i.e.,

(11.0.14) x = z + ε

(11.0.15) y = z + δ


Here z ∼ N(µ, τ²), ε ∼ N(0, σ²), and δ ∼ N(0, σ²) (i.e., we assume that both errors have the same variance). The three variables ε, δ, and z are independent of each other. Therefore x and y are jointly normal, with var[x] = τ² + σ², var[y] = τ² + σ², and cov[x, y] = cov[z + ε, z + δ] = τ² + 0 + 0 + 0 = τ². Therefore ρ = τ²/(τ² + σ²). The contour lines of the joint density are ellipses with center (µ, µ) whose main axes lie along the lines y = x and y = 2µ − x in the x, y-plane.

Now what is the conditional mean? Since var[x] = var[y], (10.3.17) gives the line E[y|x=x] = µ + ρ(x − µ), i.e., it is a line which goes through the center of the ellipses but which is flatter than the line x = y representing the real underlying linear relationship if there are no errors. Geometrically one can get it as the line which intersects each ellipse exactly where the ellipse is vertical.
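A short simulation can make this concrete. The sketch below (plain Python; the values µ = 100, τ = 4, σ = 3 are illustrative assumptions of this example, not from the text) draws z, ε, δ as above and checks that the slope of the best prediction line of y on x is ρ = τ²/(τ² + σ²), flatter than the 45° line:

```python
import random

random.seed(0)
mu, tau, sigma = 100.0, 4.0, 3.0   # illustrative values, not from the text
n = 100_000

xs, ys = [], []
for _ in range(n):
    z = random.gauss(mu, tau)                # innate intelligence
    xs.append(z + random.gauss(0.0, sigma))  # first test:  x = z + eps
    ys.append(z + random.gauss(0.0, sigma))  # second test: y = z + delta

mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
var_x = sum((x - mx) ** 2 for x in xs) / n

slope = cov / var_x                    # slope of the prediction line E[y|x]
rho = tau**2 / (tau**2 + sigma**2)     # = 16/25 = 0.64
print(round(slope, 2), rho)
```

The fitted slope comes out near 0.64, not near 1: regressing the second score on the first recovers the prediction line, not the underlying relation y = x.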

Therefore, the parameters of the best prediction of y on the basis of x are not the parameters of the underlying relationship. Why not? Because not only y but also x is subject to errors. Assume you pick an individual at random, and it turns out that his or her first test result is very much higher than the average. Then it is more likely that this is an individual who was lucky in the first exam, and his or her true IQ is lower than the one measured, than that the individual is an Einstein who had a bad day. This is simply because z is normally distributed, i.e., among the students entering a given university, there are more individuals with lower IQs than Einsteins. In order to make a good prediction of the result of the second test one must make allowance for the fact that the individual’s IQ is most likely lower than his first score indicated; therefore one will predict the second score to be lower than the first score. The converse is true for individuals who scored lower than average, i.e., in your prediction you will act as if a “regression towards the mean” had taken place.

The next important point to note here is: the “true regression line,” i.e., the prediction line, is uniquely determined by the joint distribution of x and y. However the line representing the underlying relationship can only be determined if one has information in addition to the joint density, i.e., in addition to the observations. E.g., assume the two tests have different standard deviations, which may be the case simply because the second test has more questions and is therefore more accurate. Then the underlying 45° line is no longer one of the main axes of the ellipse! To be more precise, the underlying line can only be identified if one knows the ratio of the variances, or if one knows one of the two variances. Without any knowledge of the variances, the only thing one can say about the underlying line is that it lies between the line predicting y on the basis of x and the line predicting x on the basis of y.

The name “regression” stems from a confusion between the prediction line and the real underlying relationship. Francis Galton, the cousin of the famous Darwin, measured the heights of fathers and sons, and concluded from his evidence that the heights of sons tended to be closer to the average height than the heights of the fathers, a purported law of “regression towards the mean.” Problem 180 illustrates this:

An intelligence test administered twice, one at the beginning of the semester, one at the end, gives the following disturbing outcome: While the underlying intelligence during the first test was z ∼ N(100, 20), it changed between the first and second test due to the learning experience at the university. If w is the intelligence of each student at the second test, it is connected to his intelligence z at the first test by the formula w = 0.5z + 50, i.e., those students with intelligence below 100 gained, but those students with intelligence above 100 lost. (The errors of both intelligence tests are normally distributed with expected value zero, and the variance of the first intelligence test was 5, and that of the second test, which had more questions, was 4. As usual, the errors are independent of each other and of the actual intelligence.)

• a. 3 points. If x and y are the outcomes of the first and second intelligence test, compute E[x], E[y], var[x], var[y], and the correlation coefficient ρ = corr[x, y]. Figure 1 shows an equi-density line of their joint distribution; 95% of the probability mass of the test results are inside this ellipse. Draw the line w = 0.5z + 50 into Figure 1.


Answer. cov[x, y] = 10; corr[x, y] = 10/15 = 2/3. In matrix notation

(11.0.16) (x, y)′ ∼ N((100, 100)′, [[25, 10], [10, 9]])

• b. 4 points. Compute E[y|x=x] and E[x|y=y]. The first is a linear function of x and the second a linear function of y. Draw the two lines representing these linear functions into Figure 1. Use (10.3.18) for this.

Answer. E[y|x=x] = 100 + (10/25)(x − 100) = 0.4x + 60; this line intersects the ellipse where it is vertical. E[x|y=y] = 100 + (10/9)(y − 100); the line x = E[x|y=y] goes through the points (80, 82) and (120, 118). The two lines intersect in the center of the ellipse, i.e., at the point (100, 100).
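The two conditional-expectation lines follow directly from the moments of Problem 180; a minimal sketch (function names ours) that evaluates them:

```python
# Moments of (x, y) in Problem 180: means 100, var[x] = 25, var[y] = 9, cov = 10.
mu_x = mu_y = 100.0
var_x, var_y, cov_xy = 25.0, 9.0, 10.0

def cond_mean_y(x):
    # E[y | x = x] = mu_y + (cov/var[x]) (x - mu_x) = 0.4 x + 60
    return mu_y + cov_xy * (x - mu_x) / var_x

def cond_mean_x(y):
    # E[x | y = y] = mu_x + (cov/var[y]) (y - mu_y)
    return mu_x + cov_xy * (y - mu_y) / var_y

print(cond_mean_y(80.0))   # 92.0
print(cond_mean_x(82.0))   # 80.0 -- the line passes through (80, 82)
print(cond_mean_y(100.0), cond_mean_x(100.0))  # both lines meet at (100, 100)
```

Note both slopes shrink toward the mean: 0.4 in the x-direction and 10/9 in the y-direction, on either side of the 45° diagonal.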



• c. 2 points. Another researcher says that w = (6/10)z + 40, z ∼ N(100, 100/6), ε ∼ N(0, 50/6), δ ∼ N(0, 3). Is this compatible with the data?

Answer. Yes, it is compatible: E[x] = E[z] + E[ε] = 100; E[y] = E[w] + E[δ] = (6/10)100 + 40 = 100; var[x] = 100/6 + 50/6 = 25; var[y] = (6/10)²(100/6) + 3 = 9; cov[x, y] = (6/10)(100/6) = 10.

• d. 4 points. A third researcher asserts that the IQ of the students really did not change. He says w = z, z ∼ N(100, 5), ε ∼ N(0, 20), δ ∼ N(0, 4). Is this compatible with the data? Is there unambiguous evidence in the data that the IQ declined?

Answer. E[x] = E[z] + E[ε] = 100; E[y] = E[z] + E[δ] = 100; var[x] = 5 + 20 = 25; var[y] = 5 + 4 = 9; but cov[x, y] = 5, while the data give cov[x, y] = 10; therefore a scenario in which both tests have the same underlying intelligence cannot be found. Since the two conditional expectations are on the same side of the diagonal, the hypothesis that the intelligence did not change between the two tests is not consistent with the joint distribution.
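Each researcher’s story can be checked mechanically by computing the moments it implies for x = z + ε and y = a·z + b + δ and comparing them with the observed moments. The helper below is ours; for part c we read the garbled variances as τ² = 100/6 and var[ε] = 50/6, the values the printed arithmetic requires:

```python
def implied_moments(a, b, tau2, var_eps, var_del, mu_z=100.0):
    """Moments of x = z + eps and y = a*z + b + delta."""
    return (mu_z,                    # E[x]
            a * mu_z + b,            # E[y]
            tau2 + var_eps,          # var[x]
            a * a * tau2 + var_del,  # var[y]
            a * tau2)                # cov[x, y]

observed = (100.0, 100.0, 25.0, 9.0, 10.0)

# part c: w = (6/10) z + 40, z ~ N(100, 100/6), eps ~ N(0, 50/6), delta ~ N(0, 3)
res_c = implied_moments(0.6, 40.0, 100 / 6, 50 / 6, 3.0)
# part d: w = z, z ~ N(100, 5), eps ~ N(0, 20), delta ~ N(0, 4)
res_d = implied_moments(1.0, 0.0, 5.0, 20.0, 4.0)

print(max(abs(u - v) for u, v in zip(res_c, observed)) < 1e-9)  # True: compatible
print(res_d[4])  # 5.0 -- but the data require cov[x, y] = 10: not compatible
```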

We just showed that the parameters of the true underlying relationship cannot be inferred from the data alone if there are errors in both variables. We also showed that this lack of identification is not complete, because one can specify an interval which in the plim contains the true parameter value.

Chapter 53 has a much more detailed discussion of all this. There we will see that this lack of identification can be removed if more information is available, i.e., if one knows that the two error variances are equal, or if one knows that the regression has zero intercept, etc. Question 181 shows that in this latter case, the OLS estimate is not consistent, but other estimates exist that are consistent.

Problem 181. According to the permanent income hypothesis, drawing at random families in a given country and asking them about their income y and consumption c can be modeled as the independent observations of two random variables which satisfy

(11.0.19) y = yp + yt,

(11.0.20) c = cp + ct,

(11.0.21) cp = β yp.

Here yp and cp are the permanent and yt and ct the transitory components of income and consumption. These components are not observed separately; only their sums y and c are observed. We assume that the permanent income yp is random, with E[yp] = µ ≠ 0 and var[yp] = τ². The transitory components yt and ct are assumed to be independent of each other and of yp, and E[yt] = 0, var[yt] = σy², E[ct] = 0, and var[ct] = σc². Finally, it is assumed that all variables are normally distributed.

• a. 2 points. Given the above information, write down the vector of expected values E[(y, c)′] and the covariance matrix V[(y, c)′] in terms of the five unknown parameters of the model: µ, β, τ², σy², and σc².




Answer. E[(y, c)′] = (µ, βµ)′, and

V[(y, c)′] = [[τ² + σy², βτ²], [βτ², β²τ² + σc²]].
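These moment formulas can be encoded directly; a small sketch (function name ours) that evaluates the mean vector and covariance matrix for any parameter values, checked here with the concrete numbers of part c below:

```python
def pi_moments(mu, beta, tau2, sy2, sc2):
    """Mean vector and covariance matrix of (y, c) when
    y = yp + yt and c = beta*yp + ct, with yp ~ (mu, tau2),
    yt ~ (0, sy2), ct ~ (0, sc2), all independent."""
    mean = (mu, beta * mu)
    cov = ((tau2 + sy2,  beta * tau2),
           (beta * tau2, beta**2 * tau2 + sc2))
    return mean, cov

# concrete values from part c: mu = 12000, beta = 0.7, tau = 4000, sigma_y = 2000
mean, cov = pi_moments(12_000.0, 0.7, 4_000.0**2, 2_000.0**2, 1_000.0**2)
print(mean)                  # (12000.0, 8400.0)
print(cov[0][0], cov[0][1])  # 20000000.0 11200000.0
```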


(Answer to part c below: the best guess is the weighted average (0.2)(12,000) + (0.8)(22,000) = 20,000.)



• c. 3 points. To make things more concrete, assume the parameters are

(11.0.24) β = 0.7

(11.0.25) σy = 2,000

(11.0.26) σc = 1,000

(11.0.27) µ = 12,000

(11.0.28) τ = 4,000

If a family’s income is y = 22,000, what is your best guess of this family’s permanent income yp? Give an intuitive explanation why this best guess is smaller than 22,000.


• d. 2 points. If a family’s income is y, show that your best guess about this family’s consumption is

c* = β( (σy²/(τ² + σy²)) µ + (τ²/(τ² + σy²)) y ).

Instead of an exact mathematical proof you may also reason out how it can be obtained from (11.0.23). Give the numbers for a family whose actual income is 22,000.

Answer. The best guess of consumption is β times the best guess of permanent income: transitory consumption is uncorrelated with everything else and therefore must be predicted by 0. This is an acceptable answer, but one can also derive it from scratch.
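The numbers for parts c and d can be checked with a short sketch (parameter values from (11.0.24)–(11.0.28); function names ours):

```python
mu, beta = 12_000.0, 0.7
tau2, sy2 = 4_000.0**2, 2_000.0**2   # tau = 4000, sigma_y = 2000

def best_guess_yp(y):
    # E[yp | y] = (sy2*mu + tau2*y) / (tau2 + sy2): shrink y toward mu
    return (sy2 * mu + tau2 * y) / (tau2 + sy2)

def best_guess_c(y):
    # best guess of consumption is beta times the best guess of yp
    return beta * best_guess_yp(y)

print(best_guess_yp(22_000.0))  # 20000.0 -- pulled back toward mu = 12000
print(best_guess_c(22_000.0))   # 14000.0
```

The weight on y is τ²/(τ² + σy²) = 16/20 = 0.8, which is where the weighted average (0.2)(12,000) + (0.8)(22,000) = 20,000 comes from.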


The remainder of this Problem uses material that comes later in these Notes:

• e. 4 points. From now on we will assume that the true values of the parameters are not known, but two vectors y and c of independent observations are available. We will show that it is not correct in this situation to estimate β by regressing c on y with the intercept suppressed. This would give the estimator

β̂ = Σ ci yi / Σ yi².

Show that the plim of this estimator is E[cy]/E[y²]. Which theorems do you need for this proof? Show that β̂ is an inconsistent estimator of β, which yields too small values for β.

Answer. To see that the estimator has a plim, divide both numerator and denominator by n: by the weak law of large numbers the plim of each average is the corresponding expected value, and by the Slutsky theorem the plim of the fraction is the fraction of the plims.
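A simulation can illustrate the inconsistency. With the parameter values of part c, E[cy]/E[y²] = β(τ² + µ²)/(τ² + σy² + µ²) ≈ 0.683 < 0.7 (the simulation setup is ours):

```python
import random

random.seed(1)
mu, beta = 12_000.0, 0.7
tau, sy, sc = 4_000.0, 2_000.0, 1_000.0
n = 100_000

num = den = 0.0
for _ in range(n):
    yp = random.gauss(mu, tau)               # permanent income
    y = yp + random.gauss(0.0, sy)           # observed income
    c = beta * yp + random.gauss(0.0, sc)    # observed consumption
    num += c * y
    den += y * y

beta_hat = num / den                          # no-intercept OLS slope
plim = beta * (tau**2 + mu**2) / (tau**2 + sy**2 + mu**2)
print(round(beta_hat, 3), round(plim, 3))    # both near 0.683, below beta = 0.7
```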


• f. 4 points. Give the formulas of the method of moments estimators of the five parameters of this model: µ, β, τ², σy², and σc². (For this you have to express these five parameters in terms of the five moments E[y], E[c], var[y], var[c], and cov[y, c], and then simply replace the population moments by the sample moments.) Are these consistent estimators?
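One possible inversion of the five moment equations E[y] = µ, E[c] = βµ, var[y] = τ² + σy², var[c] = β²τ² + σc², cov[y, c] = βτ² is sketched below (a sketch only; in practice the population moments are replaced by sample moments, which makes the estimators consistent by the law of large numbers and Slutsky’s theorem):

```python
def mom_estimates(mean_y, mean_c, var_y, var_c, cov_yc):
    """Solve the five moment equations for (mu, beta, tau2, sy2, sc2)."""
    mu = mean_y                    # E[y] = mu
    beta = mean_c / mean_y         # E[c] = beta*mu
    tau2 = cov_yc / beta           # cov[y, c] = beta*tau2
    sy2 = var_y - tau2             # var[y] = tau2 + sy2
    sc2 = var_c - beta * cov_yc    # var[c] = beta^2 tau2 + sc2
    return mu, beta, tau2, sy2, sc2

# check on the population moments implied by mu=12000, beta=0.7,
# tau=4000, sigma_y=2000, sigma_c=1000:
est = mom_estimates(12_000.0, 8_400.0, 20e6, 8.84e6, 11.2e6)
print([round(v, 3) for v in est])  # [12000.0, 0.7, 16000000.0, 4000000.0, 1000000.0]
```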

• g. 4 points. Now assume you are not interested in estimating β itself, but in addition to the two n-vectors y and c you have an observation of y_{n+1} and you want to predict the corresponding c_{n+1}. One obvious way to do this would be to plug the method-of-moments estimators of the unknown parameters into formula (11.0.29) for the best linear predictor. Show that this is equivalent to using the ordinary least squares predictor c* = α̂ + β̂y, where α̂ and β̂ are intercept and slope in the simple regression of c on y. Note that we are regressing c on y with an intercept, although the original model does not have an intercept.
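The equivalence is algebraic, so it is easy to verify numerically: the best linear predictor with sample moments plugged in, c̄ + (cov/var)(y − ȳ), coincides exactly with the fitted OLS line of c on y (simulation setup ours):

```python
import random

random.seed(2)
# any joint sample will do; the equivalence does not depend on the model
ys = [random.gauss(12_000.0, 4_500.0) for _ in range(500)]
cs = [0.7 * y + random.gauss(0.0, 1_000.0) for y in ys]

n = len(ys)
ybar, cbar = sum(ys) / n, sum(cs) / n
var_y = sum((y - ybar) ** 2 for y in ys) / n
cov_yc = sum((y - ybar) * (c - cbar) for y, c in zip(ys, cs)) / n

def plug_in_predictor(y):
    # best linear predictor with sample moments substituted
    return cbar + (cov_yc / var_y) * (y - ybar)

# OLS of c on y with an intercept
b_ols = cov_yc / var_y
a_ols = cbar - b_ols * ybar

y_new = 15_000.0
print(abs(plug_in_predictor(y_new) - (a_ols + b_ols * y_new)) < 1e-8)  # True
```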



• h. 2 points. What is the “Iron Law of Econometrics,” and how does the above relate to it?


Answer. The Iron Law says that all effects are underestimated because of errors in the independent variable. Friedman says Keynesians obtain their low marginal propensity to consume due to the “Iron Law of Econometrics”: they ignore that actual income is a measurement with error of permanent income.

… closely than [HVdP02] does. Sargent and Wallace first reproduce the usual argument why “activist” policy rules, in which the Fed “looks at many things” and “leans against the wind,” are superior to policy rules without feedback as promoted by the monetarists.

They work with a very stylized model in which national income is represented by the following time series:

(11.0.38) yt = α + λyt−1 + βmt + ut

Here yt is GNP, measured as its deviation from “potential” GNP or as the unemployment rate, and mt is the rate of growth of the money supply. The random disturbance ut is assumed independent of yt−1; it has zero expected value, and its variance var[ut] is constant over time; we will call it var[u] (no time subscript).

• a. 4 points. First assume that the Fed tries to maintain a constant money supply, i.e., mt = g0 + εt where g0 is a constant, and εt is a random disturbance since the Fed does not have full control over the money supply. The ε have zero expected value; they are serially uncorrelated, and they are independent of the ut. This constant money supply rule does not necessarily make yt a stationary time series (i.e., a time series where mean, variance, and covariances do not depend on t), but if |λ| < 1 then yt converges towards a stationary time series, i.e., any initial deviations from the “steady state” die out over time. You are not required here to prove that the time series converges towards a stationary time series, but you are asked to compute E[yt] in this stationary time series.
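In the stationary state E[yt] = E[yt−1], so taking expectations of (11.0.38) with mt = g0 + εt gives E[y] = (α + βg0)/(1 − λ). A quick simulation check (all parameter values below are illustrative, not from the text):

```python
import random

random.seed(3)
alpha, lam, b, g0 = 1.0, 0.5, 2.0, 0.25   # |lam| < 1, illustrative values
sd_u, sd_e = 1.0, 0.2

y, total, kept = 0.0, 0.0, 0
for t in range(200_000):
    m = g0 + random.gauss(0.0, sd_e)       # constant money rule with noise
    y = alpha + lam * y + b * m + random.gauss(0.0, sd_u)
    if t >= 1_000:                         # discard burn-in
        total += y
        kept += 1

print(round(total / kept, 2), (alpha + b * g0) / (1 - lam))  # both near 3.0
```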

• b. 8 points. Now assume the policy makers want to steer the economy towards a desired steady state, call it y*, which they think makes the best tradeoff between unemployment and inflation, by setting mt according to a rule with feedback:

(11.0.39) mt = g0 + g1yt−1 + εt

Show that the values g0 = (y* − α)/β and g1 = −λ/β give the stationary time series yt the mean y*.

… without feedback. Sargent and Wallace argue that there is a flaw in this reasoning. Which flaw?

• d. 5 points. A possible system of structural equations from which (11.0.38) can be derived are equations (11.0.41)–(11.0.43) below. Equation (11.0.41) indicates that unanticipated increases in the growth rate of the money supply increase output, while anticipated ones do not. This is a typical assumption of the rational expectations school (Lucas supply curve).


… Assume that after a certain period during which a constant policy rule g0, g1 is followed, the econometricians regress yt on yt−1 and mt in order to estimate the coefficients in (11.0.38). Which values of α, λ, and β will such a regression yield?

Figure 1. Ellipse containing 95% of the probability mass of test results x and y.


A Simple Example of Estimation

We will discuss here a simple estimation problem, which can be considered the prototype of all least squares estimation. Assume we have n independent observations y1, …, yn of a normally distributed random variable y ∼ N(µ, σ²) with unknown location parameter µ and dispersion parameter σ². Our goal is to estimate the location parameter and also estimate some measure of the precision of this estimator.

12.1 Sample Mean as Estimator of the Location Parameter

The obvious (and in many cases also the best) estimate of the location parameter of a distribution is the sample mean ȳ = (1/n) Σ_{i=1}^n yi. Why is this a reasonable estimate?


1. The location parameter of the Normal distribution is its expected value, and by the weak law of large numbers, the probability limit for n → ∞ of the sample mean is the expected value.

2. The expected value µ is sometimes called the “population mean,” while ȳ is the sample mean. This terminology indicates that there is a correspondence between population quantities and sample quantities, which is often used for estimation. This is the principle of estimating the unknown distribution of the population by the empirical distribution of the sample. Compare Problem 63.

3. This estimator is also unbiased. By definition, an estimator t of the parameter θ is unbiased if E[t] = θ. ȳ is an unbiased estimator of µ, since E[ȳ] = µ.

4. Given n observations y1, …, yn, the sample mean is the number a = ȳ which minimizes (y1 − a)² + (y2 − a)² + ··· + (yn − a)². One can say it is the number whose squared distance to the given sample numbers is smallest. This idea is generalized in the least squares principle of estimation. It follows from the following frequently used fact:

5. In the case of normality the sample mean is also the maximum likelihood estimate.


Problem 183. 4 points. Let y1, …, yn be an arbitrary vector and α an arbitrary number. As usual, ȳ = (1/n) Σ_{i=1}^n yi. Show that

(12.1.1) Σ_{i=1}^n (yi − α)² = Σ_{i=1}^n (yi − ȳ)² + n(ȳ − α)².

(12.1.2) Σ_{i=1}^n (yi − α)² = Σ_{i=1}^n ((yi − ȳ) + (ȳ − α))²

(12.1.3) = Σ_{i=1}^n (yi − ȳ)² + 2 Σ_{i=1}^n (yi − ȳ)(ȳ − α) + n(ȳ − α)²

(12.1.4) = Σ_{i=1}^n (yi − ȳ)² + 2(ȳ − α) Σ_{i=1}^n (yi − ȳ) + n(ȳ − α)²

Since the middle term is zero, (12.1.1) follows.
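The decomposition (12.1.1) holds for any numbers, which a quick numerical check confirms (the data and α below are arbitrary):

```python
ys = [3.0, 1.0, 4.0, 1.0, 5.0]   # arbitrary sample
alpha = 2.5                       # arbitrary alpha
n = len(ys)
ybar = sum(ys) / n                # 2.8

lhs = sum((y - alpha) ** 2 for y in ys)
rhs = sum((y - ybar) ** 2 for y in ys) + n * (ybar - alpha) ** 2
print(lhs, abs(lhs - rhs) < 1e-12)  # 13.25 True
```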



… of a random variable y, but it does not matter how the y were obtained.) Prove that the scalar α which minimizes the sum

(12.1.5) (y1 − α)² + (y2 − α)² + ··· + (yn − α)² = Σ (yi − α)²

is the arithmetic mean α = ȳ.

… not a good estimate of the location parameter. Which other estimate (or estimates) would be preferable in that situation?

12.2 Intuition of the Maximum Likelihood Estimator

In order to make intuitively clear what is involved in maximum likelihood estimation, look at the simplest case y = µ + ε, ε ∼ N(0, 1), where µ is an unknown parameter. In other words: we know that one of the functions shown in Figure 1 is the density function of y, but we do not know which.

Assume we have only one observation y. What is then the MLE of µ? It is that µ̃ for which the value of the likelihood function, evaluated at y, is greatest. I.e., you look at all possible density functions and pick the one which is highest at point y, and use the µ which belongs to this density as your estimate.
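With a single observation from N(µ, 1), the likelihood as a function of µ is just the density evaluated at y, and it is maximized by setting µ equal to the observation. A grid-search sketch (observation value and grid are ours):

```python
from math import exp, pi, sqrt

def likelihood(mu, y):
    # density of N(mu, 1), evaluated at the observed point y
    return exp(-0.5 * (y - mu) ** 2) / sqrt(2 * pi)

y_obs = 2.7
grid = [i / 10 for i in range(-50, 101)]   # candidate mu values -5.0 .. 10.0
mle = max(grid, key=lambda mu: likelihood(mu, y_obs))
print(mle)  # 2.7 -- the density centered at the observation is highest there
```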
