
Class Notes in Statistics and Econometrics, Part 26


CHAPTER 51

Distinguishing Random Variables from Variables Created by a Deterministic Chaotic Process

Dynamical systems are described either as recursive functions (discrete time) or as differential equations.

With discrete time, i.e., recursive functions (recursive functions are difference equations, the discrete analog of differential equations), one can easily get chaotic behavior, e.g., with the tent map or the logistic function.
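As a concrete illustration (my own sketch, not part of the notes; the function names and parameter values are mine), here are the two standard examples iterated in code. The logistic map with r = 4 is used for the sample orbit, since iterating the tent map in floating-point arithmetic collapses to 0 after roughly 50 steps.

```python
import numpy as np

def tent_map(x):
    # "Full" tent map on [0, 1]: stretch by 2 and fold back.
    return 2.0 * x if x < 0.5 else 2.0 * (1.0 - x)

def logistic_map(x, r=4.0):
    # Logistic map; r = 4 is the classic chaotic parameter value.
    return r * x * (1.0 - x)

def iterate(f, x0, n):
    """Return the orbit x0, f(x0), f(f(x0)), ... of length n."""
    xs = np.empty(n)
    xs[0] = x0
    for t in range(1, n):
        xs[t] = f(xs[t - 1])
    return xs

series = iterate(logistic_map, 0.2, 1000)   # looks "random" to the naked eye
```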

The problem is: how does one distinguish the output of such a process from a randomly generated output?

The same problem can also happen in the continuous case. First-order differential equations can be visualized as vector fields.


An attractor $A$ is a compact set which has a neighborhood $U$ such that $A$ is the limit set of all trajectories starting in $U$. That means, every trajectory starting in $U$ comes arbitrarily close to each point of the attractor.

In $\mathbb{R}^2$, there are three different types of attractors: fixed points, limit cycles, and saddle loops. But in $\mathbb{R}^3$ and higher, chaos can occur, i.e., the trajectory can have a "strange attractor." Example: the Lorenz attractor.

There is no commonly accepted definition of a strange attractor; roughly, it is an attractor that is neither a point nor a closed curve, and trajectories attracted by it take vastly different courses after a short time.

Now fractal dimensions: first the Hausdorff dimension, defined as $\lim_{\varepsilon\to 0}\frac{\log N(\varepsilon)}{\log(1/\varepsilon)}$, indicating the exponent with which the number of covering pieces $N(\varepsilon)$ increases as the diameter $\varepsilon$ of the pieces diminishes.

Examples with integer dimensions: for points we have $N(\varepsilon) = 1$ always, therefore the dimension is 0. For a straight line of length $L$, $N(\varepsilon) = L/\varepsilon$, therefore we get $\lim_{\varepsilon\to 0}\frac{\log(L/\varepsilon)}{\log(1/\varepsilon)} = 1$, and for an area with surface $S$ it is $\lim_{\varepsilon\to 0}\frac{\log(S/\varepsilon^2)}{\log(1/\varepsilon)} = 2$.

A famous example of a set with fractal dimension is the Cantor set: start with the unit interval, take the middle third out, then take the middle thirds of the two remaining segments out, etc. For $\varepsilon = 1/3$ one gets $N(\varepsilon) = 2$, for $\varepsilon = 1/9$ one gets $N(\varepsilon) = 4$, and generally, for $\varepsilon = (1/3)^m$ one gets $N(\varepsilon) = 2^m$. Therefore the dimension is $\lim_{m\to\infty}\frac{\log 2^m}{\log 3^m} = \frac{\log 2}{\log 3} \approx 0.63$.
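A small numerical sketch of this box-counting computation for a finite approximation of the Cantor set (my own illustration; the helper names are made up):

```python
import numpy as np

def cantor_midpoints(level):
    """Midpoints of the intervals left after `level` middle-third removals."""
    intervals = [(0.0, 1.0)]
    for _ in range(level):
        intervals = [seg for a, b in intervals
                     for seg in ((a, a + (b - a) / 3), (b - (b - a) / 3, b))]
    return np.array([(a + b) / 2 for a, b in intervals])

def box_count(points, eps):
    """Number of boxes of side eps (grid anchored at 0) that contain a point."""
    return len(np.unique(np.floor(points / eps).astype(int)))

pts = cantor_midpoints(10)
for m in range(1, 8):
    eps = 3.0 ** (-m)
    N = box_count(pts, eps)
    print(m, N, np.log(N) / np.log(1 / eps))   # ratio is log 2 / log 3 ≈ 0.63
```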

A concept related to the Hausdorff dimension is the correlation dimension. To compute it one needs $C(\varepsilon)$, the fraction of the total number of points that are within Euclidean distance $\varepsilon$ of a given point. (This $C(\varepsilon)$ is a quotient of two infinite numbers, but in finite samples it is a quotient of two large but finite numbers; this is why it is more tractable than the Hausdorff dimension.) Example again with the straight line and the area, using the sup norm: line: $C(\varepsilon) = 2\varepsilon/L$; area: $C(\varepsilon) = 4\varepsilon^2/S$. The correlation dimension is then $\lim_{\varepsilon\to 0}\frac{\log C(\varepsilon)}{\log\varepsilon}$, again indicating how this count varies with the distance.

To compute it, use $\log C_M(\varepsilon)$, the sample analog of $\log C(\varepsilon)$ for a sample of size $M$, and plot it against $\log\varepsilon$. To get this sample analog, look at all pairs of different points, count those which are less than $\varepsilon$ apart, and divide by the total number of pairs of different points, $M(M-1)/2$.
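A minimal brute-force sketch (my own, not from the notes) of this sample correlation integral $C_M(\varepsilon)$, and of reading off a slope over a range of $\varepsilon$ values:

```python
import numpy as np

def correlation_integral(points, eps):
    """Fraction of distinct pairs of points less than eps apart (sup norm)."""
    pts = np.asarray(points, dtype=float)
    if pts.ndim == 1:
        pts = pts[:, None]
    M = len(pts)
    dists = np.abs(pts[:, None, :] - pts[None, :, :]).max(axis=-1)
    close = np.sum(dists < eps) - M      # ordered pairs, minus the M self-distances
    return close / (M * (M - 1))         # same as (#unordered close pairs) / (M(M-1)/2)

# Example: points filling a line segment should give a slope close to 1.
x = np.random.default_rng(0).uniform(0.0, 1.0, size=500)
eps_grid = np.logspace(-2, -0.5, 10)
logC = np.log([correlation_integral(x, e) for e in eps_grid])
slope = np.polyfit(np.log(eps_grid), logC, 1)[0]
print(slope)
```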

Clearly, if $\varepsilon$ is too small, it falls through between the points, and if it is too large, it extends beyond the boundaries of the set. Therefore one cannot look at the slope at the origin but must look at the slope of a straight-line segment near the origin. (Another reason for not looking at too small an $\varepsilon$ is that there may be a measurement error.)


It seems the correlation dimension is close to, and cannot exceed, the Hausdorff dimension. What one really wants is apparently the Hausdorff dimension, but the correlation dimension is a numerically convenient surrogate.

Importance of fractal dimensions: if an attractor has a fractal dimension, then it is likely to be a strange attractor (although strictly speaking this is neither necessary nor sufficient). E.g., it seems to me the precise Hausdorff dimension of the Lorenz attractor is not known, but its correlation dimension is around 2.05.

51.1 Empirical Methods: Grassberger-Procaccia Plots

With conventional statistical means, it is hard to distinguish chaotic deterministic from random time series. In a time series generated by a tent map, one obtains for almost all initial conditions a time series whose autocorrelation function is zero for all lags. We need sophisticated results from chaos theory to be able to tell them apart.

Here is the first such result: assume there is a time series of $n$-dimensional vectors $x_t$ that has followed a deterministic chaotic motion for a long time, so that for all practical purposes it has arrived at its strange attractor, but at every time point $t$ you only observe the $j$th component $x_{jt}$. Then an embedding of dimension $m$ is an artificial dynamical system formed by the $m$-histories of this $j$th component. Takens proved that if $x_t$ lies on a strange attractor, and the embedding dimension $m > 2n - 1$, then the embedding is topologically equivalent to the original time series.

In particular this means that it has the same correlation dimension.
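A short sketch (my own illustration) of forming the $m$-histories, i.e., stacking consecutive observations of the single observed component into delay vectors:

```python
import numpy as np

def m_histories(series, m):
    """Stack the m-histories of a scalar series into a (T - m + 1) x m array."""
    x = np.asarray(series)
    T = len(x)
    return np.column_stack([x[i:T - m + 1 + i] for i in range(m)])

emb = m_histories(np.arange(10), 3)   # 3-histories of 0, 1, ..., 9
print(emb[:3])
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]
```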

This has important implications: if a time series is part of a deterministic system also including other time series, then one can draw certain conclusions about the attractor without knowing the other time series.

Next point: the correlation dimension of this embedding is $\lim_{\varepsilon\to 0}\frac{\log C(\varepsilon, m)}{\log\varepsilon}$, where the embedding dimension $m$ is added as a second argument of the function $C$.

If the system is deterministic, the correlation dimension settles to a stationary value as the embedding dimension $m$ increases; for a random system it keeps increasing, and in the i.i.d. case it is $m$. (In the special case that this i.i.d. distribution is the uniform one, the $m$-histories are uniformly distributed on the $m$-dimensional unit cube, and the result follows immediately, as in our examples above.) Therefore the Grassberger-Procaccia plots show one curve for each $m$, plotting $\log C(\varepsilon, m)$ against $\log\varepsilon$.

For $\varepsilon$ small, i.e., $\log\varepsilon$ going towards $-\infty$, the plots of the true $C$'s asymptotically become a straight line emanating from the origin, whose slope indicates the dimension. Now one cannot make $\varepsilon$ very small for two reasons: (1) there are only finitely many data points, and (2) there is also a measurement error, whose effect only disappears once $\varepsilon$ becomes bigger than a few standard deviations of this measurement error. Therefore one looks at the slope for values of $\varepsilon$ that are not too small.
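A hedged end-to-end sketch of the computation behind such plots (my own code, with made-up parameter choices), comparing a chaotic logistic-map series with i.i.d. noise:

```python
import numpy as np

def delay_vectors(x, m):
    T = len(x)
    return np.column_stack([x[i:T - m + 1 + i] for i in range(m)])

def log_corr_integrals(points, eps_grid):
    """log C(eps, m), brute force over all pairs, sup norm."""
    d = np.abs(points[:, None, :] - points[None, :, :]).max(axis=-1)
    M = len(points)
    return np.log([(np.sum(d < e) - M) / (M * (M - 1)) for e in eps_grid])

x = np.empty(1000)
x[0] = 0.2
for t in range(1, len(x)):                  # chaotic logistic-map series
    x[t] = 4.0 * x[t - 1] * (1.0 - x[t - 1])
noise = np.random.default_rng(1).uniform(size=len(x))

eps_grid = np.logspace(-1.2, -0.4, 8)       # "not too small" values, as discussed above
for label, series in [("logistic", x), ("iid noise", noise)]:
    for m in (1, 2, 3, 4):
        logC = log_corr_integrals(delay_vectors(series, m), eps_grid)
        slope = np.polyfit(np.log(eps_grid), logC, 1)[0]
        print(label, m, round(slope, 2))
# The slopes for the logistic map should level off near 1 as m grows;
# for the i.i.d. noise they should keep increasing roughly like m.
```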


One method to see whether there is a deterministic structure is to compare this sample correlation dimension with that of "scrambled" data, and to check whether the slopes for the original data stop becoming steeper while those for the scrambled data still become steeper. Scrambling means: fit an autocorrelation and then randomly draw the residuals.
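One common reading of this scrambling step (my interpretation; the notes do not spell out the details) is to fit a low-order autoregression and rebuild the series from randomly redrawn residuals:

```python
import numpy as np

def scramble_ar1(x, rng=None):
    """Surrogate series: fit an AR(1) by least squares, then rebuild the series
    from bootstrapped residuals. This keeps the linear autocorrelation structure
    but destroys any deterministic nonlinear structure."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y, ylag = x[1:], x[:-1]
    coef = np.polyfit(ylag, y, 1)             # slope and intercept of x_t on x_{t-1}
    resid = y - np.polyval(coef, ylag)
    out = np.empty_like(x)
    out[0] = x[0]
    draws = rng.choice(resid, size=len(x) - 1, replace=True)
    for t in range(1, len(x)):
        out[t] = np.polyval(coef, out[t - 1]) + draws[t - 1]
    return out
```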

This is a powerful tool for distinguishing random noise from a deterministic system.


CHAPTER 52

Instrumental Variables

Compare here [DM93, Chapter 7] and [Gre97, Section 6.7.8]. Greene first introduces the simple instrumental variables estimator and then shows that the generalized one picks out the best linear combinations for forming simple instruments. I will follow [DM93] and first introduce the generalized instrumental variables estimator, and then go down to the simple one.

In this chapter we will discuss a sequence of models $y_n = X_n\beta + \varepsilon_n$, where $\varepsilon_n \sim (o_n, \sigma^2 I_n)$, the $X_n$ are $n\times k$ matrices of random regressors, and the number of observations $n\to\infty$. We do not make the assumption $\operatorname{plim}\frac{1}{n}X_n^\top\varepsilon_n = o$ which would ensure consistency of the OLS estimator (compare Problem 394). Instead, a sequence of $n\times m$ matrices of (random or nonrandom) "instrumental variables" $W_n$ is available which satisfies the following three conditions:

(52.0.1) $\operatorname{plim}\frac{1}{n}W_n^\top\varepsilon_n = o$

(52.0.2) $\operatorname{plim}\frac{1}{n}W_n^\top W_n = Q$ exists, is nonrandom and nonsingular

(52.0.3) $\operatorname{plim}\frac{1}{n}W_n^\top X_n = D$ exists, is nonrandom and has full column rank

Full column rank in (52.0.3) is only possible if $m \geq k$.

In this situation, regression of $y$ on $X$ is inconsistent. But if one regresses $y$ on the projection of $X$ on $R[W]$, the column space of $W$, one obtains a consistent estimator. This is called the instrumental variables estimator.

If $x_i$ is the $i$th column vector of $X$, then $W(W^\top W)^{-1}W^\top x_i$ is the projection of $x_i$ on the space spanned by the columns of $W$. Therefore the matrix $W(W^\top W)^{-1}W^\top X$ consists of the columns of $X$ projected on $R[W]$. This is what we meant by the projection of $X$ on $R[W]$. With these projections as regressors, the vector of regression coefficients becomes the "generalized instrumental variables estimator"

(52.0.4) $\tilde\beta = \bigl(X^\top W(W^\top W)^{-1}W^\top X\bigr)^{-1} X^\top W(W^\top W)^{-1}W^\top y$
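A numerical illustration of (52.0.4) (my own sketch, assuming numpy; the data-generating process and variable names are made up for the example):

```python
import numpy as np

def giv(y, X, W):
    """Generalized instrumental variables estimator (52.0.4)."""
    PW_X = W @ np.linalg.solve(W.T @ W, W.T @ X)     # projection of X on R[W]
    return np.linalg.solve(X.T @ PW_X, PW_X.T @ y)   # (X'P_W X)^{-1} X'P_W y

rng = np.random.default_rng(0)
n = 10_000
u = rng.normal(size=n)                    # common shock makes x endogenous
w = rng.normal(size=(n, 2))               # two instruments, orthogonal to eps
x = w @ np.array([1.0, 0.5]) + u + rng.normal(size=n)
eps = u + rng.normal(size=n)              # correlated with x, not with w
y = 2.0 * x + eps
X = np.column_stack([np.ones(n), x])
W = np.column_stack([np.ones(n), w])

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_giv = giv(y, X, W)
print(beta_ols, beta_giv)   # the OLS slope is biased upward; the IV slope is near 2
```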


Problem 460. 3 points. We are in the model $y = X\beta + \varepsilon$ and we have a matrix $W$ of "instrumental variables" which satisfies the following three conditions: $\operatorname{plim}\frac{1}{n}W^\top\varepsilon = o$; $\operatorname{plim}\frac{1}{n}W^\top W = Q$ exists, is nonrandom and positive definite; and $\operatorname{plim}\frac{1}{n}W^\top X = D$ exists, is nonrandom and has full column rank. Show that the instrumental variables estimator

(52.0.5) $\tilde\beta = \bigl(X^\top W(W^\top W)^{-1}W^\top X\bigr)^{-1} X^\top W(W^\top W)^{-1}W^\top y$

is consistent. Hint: write $\tilde\beta_n - \beta = B_n\cdot\frac{1}{n}W^\top\varepsilon$ and show that the sequence of matrices $B_n$ has a plim.

Answer. Write it as
$$\tilde\beta_n = \bigl(X^\top W(W^\top W)^{-1}W^\top X\bigr)^{-1} X^\top W(W^\top W)^{-1}W^\top(X\beta + \varepsilon)$$
$$= \beta + \bigl(X^\top W(W^\top W)^{-1}W^\top X\bigr)^{-1} X^\top W(W^\top W)^{-1}W^\top\varepsilon$$
$$= \beta + \Bigl[\bigl(\tfrac{1}{n}X^\top W\bigr)\bigl(\tfrac{1}{n}W^\top W\bigr)^{-1}\bigl(\tfrac{1}{n}W^\top X\bigr)\Bigr]^{-1}\bigl(\tfrac{1}{n}X^\top W\bigr)\bigl(\tfrac{1}{n}W^\top W\bigr)^{-1}\tfrac{1}{n}W^\top\varepsilon,$$
i.e., the $B_n$ and $B$ of the hint are as follows:
$$B_n = \Bigl[\bigl(\tfrac{1}{n}X^\top W\bigr)\bigl(\tfrac{1}{n}W^\top W\bigr)^{-1}\bigl(\tfrac{1}{n}W^\top X\bigr)\Bigr]^{-1}\bigl(\tfrac{1}{n}X^\top W\bigr)\bigl(\tfrac{1}{n}W^\top W\bigr)^{-1}$$
$$B = \operatorname{plim} B_n = (D^\top Q^{-1}D)^{-1}D^\top Q^{-1}.$$

Problem 461. Assume $\operatorname{plim}\frac{1}{n}X^\top X$ exists, and $\operatorname{plim}\frac{1}{n}X^\top\varepsilon$ exists. (We only need the existence, not that the first is nonsingular and the second zero.) Show that $\sigma^2$ can be estimated consistently by $s^2 = \frac{1}{n}(y - X\tilde\beta)^\top(y - X\tilde\beta)$.

Answer. $y - X\tilde\beta = X\beta + \varepsilon - X\tilde\beta = \varepsilon - X(\tilde\beta - \beta)$. Therefore
$$\tfrac{1}{n}(y - X\tilde\beta)^\top(y - X\tilde\beta) = \tfrac{1}{n}\varepsilon^\top\varepsilon - \tfrac{2}{n}\varepsilon^\top X(\tilde\beta - \beta) + (\tilde\beta - \beta)^\top\bigl(\tfrac{1}{n}X^\top X\bigr)(\tilde\beta - \beta).$$
All summands have plims; the plim of the first is $\sigma^2$ and those of the other two are zero.

Problem 462. In the situation of Problem 460, add the stronger assumption $\frac{1}{\sqrt{n}}W^\top\varepsilon \to N(o, \sigma^2 Q)$, and show that $\sqrt{n}(\tilde\beta_n - \beta) \to N\bigl(o, \sigma^2(D^\top Q^{-1}D)^{-1}\bigr)$.

Answer. $\tilde\beta_n - \beta = B_n\frac{1}{n}W_n^\top\varepsilon_n$, therefore $\sqrt{n}(\tilde\beta_n - \beta) = B_n\, n^{-1/2}W_n^\top\varepsilon_n \to B\,N(o, \sigma^2 Q) = N(o, \sigma^2 BQB^\top)$. Since $B = (D^\top Q^{-1}D)^{-1}D^\top Q^{-1}$, the result follows.


From Problem 462 it follows that for finite samples approximately $\tilde\beta_n - \beta \sim N\bigl(o, \frac{\sigma^2}{n}(D^\top Q^{-1}D)^{-1}\bigr)$. Since $\frac{1}{n}(D^\top Q^{-1}D)^{-1} = \bigl(nD^\top(nQ)^{-1}nD\bigr)^{-1}$, $\operatorname{MSE}[\tilde\beta; \beta]$ can be estimated by $s^2\bigl(X^\top W(W^\top W)^{-1}W^\top X\bigr)^{-1}$.
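A sketch (my own, following the formulas above) that combines the point estimate with $s^2$ from Problem 461 and this estimated MSE matrix:

```python
import numpy as np

def iv_fit(y, X, W):
    """GIV estimate with the variance estimate s^2 (X'P_W X)^{-1}."""
    PW_X = W @ np.linalg.solve(W.T @ W, W.T @ X)
    XtPX = X.T @ PW_X
    beta = np.linalg.solve(XtPX, PW_X.T @ y)
    resid = y - X @ beta            # residuals use X itself, not its projection
    s2 = resid @ resid / len(y)     # s^2 = (1/n)(y - X beta)'(y - X beta)
    cov = s2 * np.linalg.inv(XtPX)  # estimated MSE matrix of beta
    return beta, cov

# Usage with the y, X, W of the earlier sketch:
#   beta, cov = iv_fit(y, X, W); se = np.sqrt(np.diag(cov))
```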

The estimator (52.0.4) is sometimes called the two-stage least squares estimator, because the projection of $X$ on the column space of $W$ can be considered the predicted values if one regresses every column of $X$ on $W$. I.e., instead of regressing $y$ on $X$ one regresses $y$ on those linear combinations of the columns of $W$ which best approximate the columns of $X$. Here is more detail: the matrix of estimated coefficients in the first regression is $\hat\Pi = (W^\top W)^{-1}W^\top X$, and the predicted values in this regression are $\hat X = W\hat\Pi = W(W^\top W)^{-1}W^\top X$. The second regression, which regresses $y$ on $\hat X$, gives the coefficient vector

(52.0.6) $\tilde\beta = (\hat X^\top\hat X)^{-1}\hat X^\top y$

If you plug this in, you see that this is exactly (52.0.4) again.
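A quick numerical check (my own sketch, with hypothetical toy data) that the two regressions reproduce (52.0.4) exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
u = rng.normal(size=n)
W = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
x = W[:, 1] + 0.5 * W[:, 2] + u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

Pi_hat = np.linalg.solve(W.T @ W, W.T @ X)     # first stage: regress X on W
X_hat = W @ Pi_hat                             # predicted values of X
beta_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)   # second stage
beta_giv = np.linalg.solve(X.T @ X_hat, X_hat.T @ y)        # formula (52.0.4)
print(np.allclose(beta_2sls, beta_giv))        # True
```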

Now let us look at the geometry of instrumental variables regression of one variable $y$ on one other variable $x$, with $w$ as an instrument. The specification is $y = x\beta + \varepsilon$. On p. 851 we visualized the asymptotic results if $\varepsilon$ is asymptotically orthogonal to $x$. Now let us assume $\varepsilon$ is asymptotically not orthogonal to $x$. One can visualize this as three vectors, again normalized by dividing by $\sqrt{n}$, but now even in the asymptotic case the $\varepsilon$-vector is not orthogonal to $x$. (Draw $\varepsilon$ vertically, and make $x$ long enough that $\beta < 1$.) We assume $n$ is large enough so that the asymptotic results hold for the sample already (or, perhaps better, that the difference between the sample and its plim is only infinitesimal). Therefore the OLS regression, which estimates $\beta$ by $x^\top y / x^\top x$, is inconsistent. Let O be the origin, A the point on the $x$-vector where $\varepsilon$ branches off (i.e., the end of $x\beta$), furthermore let B be the point on the $x$-vector where the orthogonal projection of $y$ comes down, and C the end of the $x$-vector. Then $x^\top y = \overline{OC}\cdot\overline{OB}$ and $x^\top x = \overline{OC}^2$, therefore $x^\top y / x^\top x = \overline{OB}/\overline{OC}$, which would be the $\beta$ if the errors were orthogonal. Now introduce a new variable $w$ which is orthogonal to the errors. (Since $\varepsilon$ is vertical, $w$ is on the horizontal axis.) Call D the projection of $y$ on $w$, which is the prolongation of the vector $\varepsilon$, call E the end of the $w$-vector, and call F the projection of $x$ on $w$. Then $w^\top y = \overline{OE}\cdot\overline{OD}$ and $w^\top x = \overline{OE}\cdot\overline{OF}$. Therefore $w^\top y / w^\top x = (\overline{OE}\cdot\overline{OD})/(\overline{OE}\cdot\overline{OF}) = \overline{OD}/\overline{OF} = \overline{OA}/\overline{OC} = \beta$. Or, geometrically it is obvious that the regression of $y$ on the projection of $x$ on $w$ will give the right $\hat\beta$. One also sees here why the $s^2$ based on this second regression is inconsistent.

If I allow two instruments, the two instruments must be in the horizontal plane perpendicular to the vector $\varepsilon$, which is assumed still vertical. Here we project $x$ on this horizontal plane and then regress $y$, which stays where it is, on this projected $x$. In this way the residuals have the right direction!


What if there is one instrument, but it does not lie in the same plane as $x$ and $y$? This is the most general case as long as there is only one regressor and one instrument. This instrument $w$ must lie somewhere in the horizontal plane. We have to project $x$ on it, and then regress $y$ on this projection. Look at it this way: take the plane orthogonal to $w$ which goes through point C. The projection of $x$ on $w$ is the intersection of the ray generated by $w$ with this plane. Now move this plane parallel to itself until it intersects point A. Then the intersection with the $w$-ray is the projection of $y$ on $w$. But this latter plane contains $\varepsilon$, since $\varepsilon$ is orthogonal to $w$. This makes sure that the regression gives the right results.

Problem 463. 4 points. The asymptotic MSE matrix of the instrumental variables estimator with $W$ as matrix of instruments is $\sigma^2\operatorname{plim}\bigl(X^\top W(W^\top W)^{-1}W^\top X\bigr)^{-1}$. Show that if one adds more instruments, then this asymptotic MSE matrix can only decrease. It is sufficient to show that the inequality holds before going over to the plim, i.e., if $W = \begin{bmatrix} U & V\end{bmatrix}$, then

(52.0.7) $\bigl(X^\top U(U^\top U)^{-1}U^\top X\bigr)^{-1} - \bigl(X^\top W(W^\top W)^{-1}W^\top X\bigr)^{-1}$

is nonnegative definite. Hints: (1) Use Theorem A.5.5 in the Appendix (proof is not required). (2) Note that $U = WG$ for some $G$. Can you write this $G$ in partitioned matrix form? (3) Show that, whatever $W$ and $G$, the matrix $W(W^\top W)^{-1}W^\top - WG(G^\top W^\top WG)^{-1}G^\top W^\top$ is idempotent.
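Before the answer, a small numerical sanity check (my own sketch with made-up random matrices, not a proof) that the difference in (52.0.7) comes out nonnegative definite:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 2
X = rng.normal(size=(n, k))
U = rng.normal(size=(n, 3))
V = rng.normal(size=(n, 2))
W = np.column_stack([U, V])       # adding the instruments in V to those in U

def inv_quad(X, W):
    # (X'W (W'W)^{-1} W'X)^{-1}
    return np.linalg.inv(X.T @ W @ np.linalg.solve(W.T @ W, W.T @ X))

diff = inv_quad(X, U) - inv_quad(X, W)
eigs = np.linalg.eigvalsh((diff + diff.T) / 2)   # symmetrize against rounding error
print(eigs.min() >= -1e-10)                      # True: nonnegative definite
```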

Answer.

(52.0.8) $U = \begin{bmatrix} U & V\end{bmatrix}\begin{bmatrix} I \\ O\end{bmatrix} = WG$ where $G = \begin{bmatrix} I \\ O\end{bmatrix}$.

Problem 464. 2 points. Show: if a matrix $D$ has full column rank and is square, then it has an inverse.

Answer. Here you need that column rank equals row rank: if $D$ has full column rank it also has full row rank. And to make the proof complete you need: if $A$ has a left inverse $L$ and a right inverse $R$, then $L$ is the only left inverse, $R$ the only right inverse, and $L = R$. Proof: $L = L(AR) = (LA)R = R$, and the same argument shows that any left inverse equals $R$ and any right inverse equals $L$.

Problem 465. 2 points. If $W^\top X$ is square and has full column rank, then it is nonsingular. Show that in this case (52.0.4) simplifies to the "simple" instrumental variables estimator:

(52.0.9) $\tilde\beta = (W^\top X)^{-1}W^\top y$
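A brief sketch of the algebra (my own working): when $W^\top X$ is square and nonsingular, so is $X^\top W$, and each factor in (52.0.4) can then be inverted separately:

$$\tilde\beta = \bigl(X^\top W(W^\top W)^{-1}W^\top X\bigr)^{-1}X^\top W(W^\top W)^{-1}W^\top y = (W^\top X)^{-1}(W^\top W)(X^\top W)^{-1}\,X^\top W(W^\top W)^{-1}W^\top y = (W^\top X)^{-1}W^\top y.$$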
