
Lecture Notes for Econometrics 2002 (first-year PhD course in Stockholm)

Paul Söderlind

June 2002 (some typos corrected and some material added later)

University of St. Gallen. Address: s/bf-HSG, Rosenbergstrasse 52, CH-9000 St. Gallen, Switzerland. E-mail: Paul.Soderlind@unisg.ch. Document name: EcmAll.TeX


Contents

1 Introduction
1.1 Means and Standard Deviation
1.2 Testing Sample Means
1.3 Covariance and Correlation
1.4 Least Squares
1.5 Maximum Likelihood
1.6 The Distribution of $\hat{\beta}$
1.7 Diagnostic Tests
1.8 Testing Hypotheses about $\hat{\beta}$
A Practical Matters
B A CLT in Action

2 Univariate Time Series Analysis
2.1 Theoretical Background to Time Series Processes
2.2 Estimation of Autocovariances
2.3 White Noise
2.4 Moving Average
2.5 Autoregression
2.6 ARMA Models
2.7 Non-stationary Processes

3 The Distribution of a Sample Average
3.1 Variance of a Sample Average
3.2 The Newey-West Estimator
3.3 Summary

4 Least Squares
4.1 Definition of the LS Estimator
4.2 LS and $R^2$
4.3 Finite Sample Properties of LS
4.4 Consistency of LS
4.5 Asymptotic Normality of LS
4.6 Inference
4.7 Diagnostic Tests of Autocorrelation, Heteroskedasticity, and Normality

5 Instrumental Variable Method
5.1 Consistency of Least Squares or Not?
5.2 Reason 1 for IV: Measurement Errors
5.3 Reason 2 for IV: Simultaneous Equations Bias (and Inconsistency)
5.4 Definition of the IV Estimator—Consistency of IV
5.5 Hausman's Specification Test
5.6 Tests of Overidentifying Restrictions in 2SLS

6 Simulating the Finite Sample Properties
6.1 Monte Carlo Simulations in the Simplest Case
6.2 Monte Carlo Simulations in More Complicated Cases
6.3 Bootstrapping in the Simplest Case
6.4 Bootstrapping in More Complicated Cases

7 GMM
7.1 Method of Moments
7.2 Generalized Method of Moments
7.3 Moment Conditions in GMM
7.4 The Optimization Problem in GMM
7.5 Asymptotic Properties of GMM
7.6 Summary of GMM
7.7 Efficient GMM and Its Feasible Implementation
7.8 Testing in GMM
7.9 GMM with Sub-Optimal Weighting Matrix
7.10 GMM without a Loss Function
7.11 Simulated Moments Estimator

8 Examples and Applications of GMM
8.1 GMM and Classical Econometrics: Examples
8.2 Identification of Systems of Simultaneous Equations
8.3 Testing for Autocorrelation
8.4 Estimating and Testing a Normal Distribution
8.5 Testing the Implications of an RBC Model
8.6 IV on a System of Equations

12 Vector Autoregression (VAR)
12.1 Canonical Form
12.2 Moving Average Form and Stability
12.3 Estimation
12.4 Granger Causality
12.5 Forecasts and Forecast Error Variance
12.6 Forecast Error Variance Decompositions
12.7 Structural VARs
12.8 Cointegration and Identification via Long-Run Restrictions

12 Kalman filter
12.1 Conditional Expectations in a Multivariate Normal Distribution
12.2 Kalman Recursions

13 Outliers and Robust Estimators
13.1 Influential Observations and Standardized Residuals
13.2 Recursive Residuals
13.3 Robust Estimation
13.4 Multicollinearity

14 Generalized Least Squares
14.1 Introduction
14.2 GLS as Maximum Likelihood
14.3 GLS as a Transformed LS
14.4 Feasible GLS

15 Nonparametric Regressions and Tests
15.1 Nonparametric Regressions
15.2 Estimating and Testing Distributions

21 Some Statistics
21.1 Distributions and Moment Generating Functions
21.2 Joint and Conditional Distributions and Moments
21.3 Convergence in Probability, Mean Square, and Distribution
21.4 Laws of Large Numbers and Central Limit Theorems
21.5 Stationarity
21.6 Martingales
21.7 Special Distributions
21.8 Inference

22 Some Facts about Matrices
22.1 Rank
22.2 Vector Norms
22.3 Systems of Linear Equations and Matrix Inverses
22.4 Complex Matrices
22.5 Eigenvalues and Eigenvectors
22.6 Special Forms of Matrices
22.7 Matrix Decompositions
22.8 Matrix Calculus
22.9 Miscellaneous

1 Introduction

1.1 Means and Standard Deviation

The mean and variance of a series are estimated as
$$\bar{x} = \frac{1}{T}\sum_{t=1}^{T} x_t \quad \text{and} \quad \widehat{\text{Var}}(x_t) = \frac{1}{T}\sum_{t=1}^{T}\left(x_t - \bar{x}\right)^2.$$
(A moving average is sometimes used instead, for instance in the analysis of financial prices.)

If $x_t$ is iid (independently and identically distributed), then it is straightforward to find the variance of the sample average,
$$\text{Var}(\bar{x}) = \text{Var}(x_t)/T. \tag{1.2}$$
Then, note that
$$\text{E}\,\bar{x} = \frac{1}{T}\sum_{t=1}^{T}\text{E}\,x_t = \text{E}\,x_t, \tag{1.3}$$


Figure 1.1: Sampling distributions. The figure shows the distribution of the sample mean and of $\sqrt{T}$ times the sample mean of the random variable $z_t - 1$, where $z_t \sim \chi^2(1)$, for $T = 5, 25, 50, 100$.

where the second equality follows from the assumption of identical distributions, which implies identical expectations.

The law of large numbers (LLN) says that the sample mean converges to the true population mean as the sample size goes to infinity. This holds for a very large class of random variables, but there are exceptions. A sufficient (but not necessary) condition for this convergence is that the sample average is unbiased (as in (1.3)) and that the variance goes to zero as the sample size goes to infinity (as in (1.2)). (This is also called convergence in mean square.) To see the LLN in action, see Figure 1.1.

The central limit theorem (CLT) says that $\sqrt{T}\bar{x}$ converges in distribution to a normal distribution as the sample size increases. See Figure 1.1 for an illustration. This also holds for a large class of random variables, and it is a very useful result since it allows us to test hypotheses. Most estimators (including LS and other methods) are effectively some kind of sample average, so the CLT can be applied.
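A small simulation along the lines of Figure 1.1 makes the LLN and CLT concrete. The sketch below is only an illustration (Python/NumPy is used here, although the notes themselves refer to packages like Gauss and MatLab): it draws many samples of $z_t \sim \chi^2(1)$ and compares the spread of the sample mean with that of $\sqrt{T}$ times the demeaned sample mean.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_sim = 10_000                                  # number of simulated samples per T

for T in (5, 25, 50, 100):
    z = rng.chisquare(df=1, size=(n_sim, T))    # z_t ~ chi2(1): mean 1, variance 2
    zbar = z.mean(axis=1)                       # one sample mean per simulation
    scaled = np.sqrt(T) * (zbar - 1.0)          # sqrt(T) * (sample mean - true mean)
    # LLN: Std(zbar) shrinks with T; CLT: Std(scaled) stays near sqrt(2)
    print(f"T={T:4d}  Std(zbar)={zbar.std():.3f}  Std(sqrt(T)*(zbar-1))={scaled.std():.3f}")
```

The first column of numbers shrinks roughly like $1/\sqrt{T}$, while the second stabilizes around $\sqrt{2}$, the standard deviation of $z_t$.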

1.2 Testing Sample Means

The basic approach in testing a hypothesis (the "null hypothesis") is to compare the test statistic (the sample average, say) with how the distribution of that statistic (which is a random variable since the sample is finite) would look if the null hypothesis were true. For instance, suppose the null hypothesis is that the population mean is $\mu$. Suppose also that we know that the distribution of the sample mean is normal with a known variance $h^2$ (which will typically be estimated and then treated as if it were known). Under the null hypothesis, the sample average should then be $N(\mu, h^2)$. We would then reject the null hypothesis if the sample average is far out in one of the tails of the distribution. A traditional two-tailed test amounts to rejecting the null hypothesis at the 10% significance level if the test statistic is so far out that there is only 5% probability mass further out in that tail (and another 5% in the other tail). The interpretation is that if the null hypothesis is actually true, then there would only be a 10% chance of getting such an extreme (positive or negative) sample average, and these 10% are considered so low that we say that the null is probably wrong.

Figure 1.2: Density function of a normal distribution with shaded 5% tails (for instance, for $y \sim N(0,2)$, $\Pr(y \le -2.33) = 0.05$).

See Figure 1.2 for some examples of normal distributions. Recall that in a normal distribution, the interval of $\pm 1$ standard deviation around the mean contains 68% of the probability mass; $\pm 1.65$ standard deviations contain 90%; and $\pm 2$ standard deviations contain 95%.

In practice, the test of a sample mean is done by "standardizing" the sample mean so that it can be compared with a standard $N(0,1)$ distribution. The logic of this is as follows: the probability that $\bar{x}$ is below some number $C$ equals
$$\Pr(\bar{x} \le C) = \Pr\left(\frac{\bar{x}-\mu}{h} \le \frac{C-\mu}{h}\right) = \Phi\left(\frac{C-\mu}{h}\right),$$
where $\Phi$ is the standard normal cumulative distribution function, which is easy to calculate.

To construct a two-tailed test, we also need the probability that $\bar{x}$ is above some number. This number is chosen to make the two-tailed test symmetric, that is, so that there is as much probability mass below the lower number (lower tail) as above the upper number (upper tail). With a normal distribution (or, for that matter, any symmetric distribution) this is done as follows. Note that $(\bar{x}-\mu)/h \sim N(0,1)$ is symmetric around 0. This means that the probability of being above some number, $(C-\mu)/h$, must equal the probability of being below $-1$ times the same number, or
$$\Pr\left(\frac{\bar{x}-\mu}{h} \ge \frac{C-\mu}{h}\right) = \Pr\left(\frac{\bar{x}-\mu}{h} \le -\frac{C-\mu}{h}\right).$$
The easiest way to get these probabilities is by looking at the normal cumulative distribution function; see Figure 1.2.
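As a hedged illustration of this standardization (not code from the notes), the following sketch tests the null that the population mean is $\mu = 0$ on simulated data, treating the variance of the sample mean, $h^2$, as estimated and then plugged in as if known.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=1)
x = rng.normal(loc=0.3, scale=1.0, size=100)   # hypothetical data; true mean is 0.3

mu0 = 0.0                                      # null hypothesis: population mean = 0
xbar = x.mean()
h = x.std(ddof=1) / np.sqrt(len(x))            # estimated Std of the sample mean
z = (xbar - mu0) / h                           # standardized sample mean, approx N(0,1) under H0

p_two_tailed = 2 * (1 - norm.cdf(abs(z)))      # probability mass further out in both tails
reject_10pct = abs(z) > norm.ppf(0.95)         # |z| > 1.645 rejects at the 10% level
print(f"z = {z:.2f}, two-tailed p-value = {p_two_tailed:.3f}, reject at 10%: {reject_10pct}")
```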

1.3 Covariance and Correlation

The covariance of two variables (here $x$ and $z$) is typically estimated as
$$\widehat{\text{Cov}}(x_t, z_t) = \frac{1}{T}\sum_{t=1}^{T}(x_t - \bar{x})(z_t - \bar{z}).$$
Note that this is a kind of sample average, so a CLT can be used.

The correlation of two variables is then estimated as
$$\widehat{\text{Corr}}(x_t, z_t) = \frac{\widehat{\text{Cov}}(x_t, z_t)}{\widehat{\text{Std}}(x_t)\,\widehat{\text{Std}}(z_t)},$$
where $\widehat{\text{Std}}(x_t)$ is an estimated standard deviation. A correlation must be between $-1$ and $1$

Figure 1.3: Power of a two-sided test. The figure shows the pdf of the t-statistic under the true $\beta$, with 10% critical values at $t < -1.65$ and $t > 1.65$.

(try to show it). Note that covariance and correlation measure the degree of linear relation only. This is illustrated in Figure 1.4.

The $p$th autocovariance of $x$ is estimated by
$$\widehat{\text{Cov}}(x_t, x_{t-p}) = \frac{1}{T}\sum_{t=1}^{T}(x_t - \bar{x})(x_{t-p} - \bar{x}), \tag{1.9}$$
where we use the same estimated mean (using all data) in both places. Similarly, the $p$th autocorrelation is estimated as
$$\widehat{\text{Corr}}(x_t, x_{t-p}) = \frac{\widehat{\text{Cov}}(x_t, x_{t-p})}{\widehat{\text{Std}}(x_t)^2}.$$
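These estimators translate directly into a few lines of code. The sketch below (illustrative, with simulated data) computes the sample covariance, correlation, and the $p$th autocovariance and autocorrelation as in (1.9), with the divisor $T$ and the full-sample mean in both places.

```python
import numpy as np

rng = np.random.default_rng(seed=2)
T = 500
x = rng.normal(size=T)
z = 0.5 * x + rng.normal(size=T)               # two correlated series

def cov_hat(a, b):
    """Sample covariance with divisor T, as in the text."""
    return np.mean((a - a.mean()) * (b - b.mean()))

corr_hat = cov_hat(x, z) / (x.std() * z.std()) # np.std uses divisor T by default

def autocov_hat(a, p):
    """p-th autocovariance as in (1.9): full-sample mean in both places, divisor T."""
    abar = a.mean()
    return np.sum((a[p:] - abar) * (a[:-p] - abar)) / len(a)

def autocorr_hat(a, p):
    return autocov_hat(a, p) / a.var()          # divide by the sample variance

print(f"Cov(x,z)  = {cov_hat(x, z):.3f}")
print(f"Corr(x,z) = {corr_hat:.3f}")
print(f"Autocorr(x, p=1) = {autocorr_hat(x, 1):.3f}")   # close to 0 for iid data
```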


1.4 Least Squares

Consider the simple linear regression
$$y_t = x_t\beta_0 + u_t, \tag{1.11}$$
where all variables are zero-mean scalars and where $\beta_0$ is the true value of the parameter we want to estimate. The task is to use a sample $\{y_t, x_t\}_{t=1}^{T}$ to estimate $\beta$ and to test hypotheses about its value, for instance that $\beta = 0$.

If there were no movements in the unobserved errors, $u_t$, in (1.11), then any sample would provide us with a perfect estimate of $\beta$. With errors, any estimate of $\beta$ will still leave us with some uncertainty about what the true value is. The two perhaps most important issues in econometrics are how to construct a good estimator of $\beta$ and how to assess the uncertainty about the true value.

For any possible estimate, $\hat{\beta}$, we get a fitted residual
$$\hat{u}_t = y_t - x_t\hat{\beta}.$$
One appealing method of choosing $\hat{\beta}$ is to minimize the part of the movements in $y_t$ that we cannot explain by $x_t\hat{\beta}$, that is, to minimize the movements in $\hat{u}_t$. There are several candidates for how to measure the "movements," but the most common is the mean of squared errors, $\sum_{t=1}^{T}\hat{u}_t^2/T$. We will later look at estimators where we instead use $\sum_{t=1}^{T}|\hat{u}_t|/T$.

With the sum or mean of squared errors as the loss function, the optimization problem is
$$\min_{\beta}\; \frac{1}{T}\sum_{t=1}^{T}(y_t - x_t\beta)^2, \tag{1.13}$$
which has the first order condition that the derivative with respect to $\beta$ should be zero at the optimal estimate $\hat{\beta}$,
$$\frac{1}{T}\sum_{t=1}^{T} x_t\big(y_t - x_t\hat{\beta}\big) = 0,$$
which we can solve for $\hat{\beta}$ as
$$\hat{\beta} = \left(\frac{1}{T}\sum_{t=1}^{T} x_t^2\right)^{-1} \frac{1}{T}\sum_{t=1}^{T} x_t y_t. \tag{1.15}$$
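The estimator in (1.15) is just a ratio of two sample averages. A minimal sketch, assuming simulated data since the notes give no dataset, is:

```python
import numpy as np

rng = np.random.default_rng(seed=3)
T, beta0 = 200, 0.75
x = rng.normal(size=T)                 # zero-mean regressor
u = rng.normal(size=T)                 # zero-mean error, independent of x
y = x * beta0 + u                      # the model (1.11)

beta_hat = np.mean(x * y) / np.mean(x ** 2)   # the LS estimator (1.15)
u_hat = y - x * beta_hat                      # fitted residuals
print(f"beta_hat = {beta_hat:.3f}, mean squared residual = {np.mean(u_hat**2):.3f}")
```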


1.5 Maximum Likelihood

A different route to arrive at an estimator is to maximize the likelihood function. Suppose the errors $u_t$ in (1.11) are iid $N(0, \sigma^2)$. Since the errors are independent, we get the joint pdf of $u_1, u_2, \ldots, u_T$ by multiplying the marginal pdfs of each of the errors. Then substitute $y_t - x_t\beta$ for $u_t$ (the derivative of the transformation is unity) and take logs to get the log likelihood function of the sample,
$$\ln L = -\frac{T}{2}\ln(2\pi) - \frac{T}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{t=1}^{T}(y_t - x_t\beta)^2.$$
This likelihood function is maximized by minimizing the last term, which is proportional to the sum of squared errors, just like in (1.13): LS is ML when the errors are iid normally distributed.

Maximum likelihood estimators have very nice properties, provided the basic distributional assumptions are correct. If they are, then MLE are typically the most efficient/precise estimators, at least asymptotically. ML also provides a coherent framework for testing hypotheses (including the Wald, LM, and LR tests).

1.6 The Distribution of $\hat{\beta}$

Equation (1.15) will give different values of $\hat{\beta}$ when we use different samples, that is, different draws of the random variables $u_t$, $x_t$, and $y_t$. Since the true value, $\beta_0$, is a fixed constant, this distribution describes the uncertainty we should have about the true value after having obtained a specific estimated value.

To understand the distribution of $\hat{\beta}$, use (1.11) in (1.15) to substitute for $y_t$:

$$\hat{\beta} = \left(\frac{1}{T}\sum_{t=1}^{T} x_t^2\right)^{-1}\frac{1}{T}\sum_{t=1}^{T} x_t\,(x_t\beta_0 + u_t) = \beta_0 + \left(\frac{1}{T}\sum_{t=1}^{T} x_t^2\right)^{-1}\frac{1}{T}\sum_{t=1}^{T} x_t u_t, \tag{1.19}$$
where $\beta_0$ is the true value.

The first conclusion from (1.19) is that, with $u_t = 0$, the estimate would always be perfect, and with large movements in $u_t$ we will see large movements in $\hat{\beta}$. The second conclusion is that not even a strong opinion about the distribution of $u_t$, for instance that $u_t$ is iid $N(0, \sigma^2)$, is enough to tell us the whole story about the distribution of $\hat{\beta}$. The reason is that deviations of $\hat{\beta}$ from $\beta_0$ are a function of $x_t u_t$, not just of $u_t$. Of course, when $x_t$ is a set of deterministic variables which will always be the same irrespective of which sample we use, then $\hat{\beta} - \beta_0$ is a time-invariant linear function of $u_t$, so the distribution of $u_t$ carries over to the distribution of $\hat{\beta}$. This is probably an unrealistic case, which forces us to look elsewhere to understand the properties of $\hat{\beta}$.

There are two main routes to learn more about the distribution of $\hat{\beta}$: (i) set up a small "experiment" in the computer and simulate the distribution, or (ii) use the asymptotic distribution as an approximation. The asymptotic distribution can often be derived, in contrast to the exact distribution in a sample of a given size. If the actual sample is large, then the asymptotic distribution may be a good approximation.

A law of large numbers would (in most cases) say that both $\sum_{t=1}^{T} x_t^2/T$ and $\sum_{t=1}^{T} x_t u_t/T$ in (1.19) converge to their expected values as $T \to \infty$. The reason is that both are sample averages of random variables (clearly, both $x_t^2$ and $x_t u_t$ are random variables). These expected values are $\text{Var}(x_t)$ and $\text{Cov}(x_t, u_t)$, respectively (recall that both $x_t$ and $u_t$ have zero means). The key to showing that $\hat{\beta}$ is consistent, that is, has a probability limit equal to $\beta_0$, is that $\text{Cov}(x_t, u_t) = 0$. This highlights the importance of using good theory, not only to derive the systematic part of (1.11), but also to understand the properties of the errors. For instance, when theory tells us that $y_t$ and $x_t$ affect each other (as prices and quantities typically do), then the errors are likely to be correlated with the regressors, and LS is inconsistent. One common way to get around that is to use an instrumental variables technique. More about that later. Consistency is a feature we want from most estimators, since it says that we would at least get it right if we had enough data.

Suppose that $\hat{\beta}$ is consistent. Can we say anything more about the asymptotic distribution? Well, the distribution of $\hat{\beta}$ converges to a spike with all the mass at $\beta_0$, but the distribution of $\sqrt{T}\hat{\beta}$, or $\sqrt{T}(\hat{\beta} - \beta_0)$, will typically converge to a non-trivial normal distribution. To see why, note from (1.19) that we can write
$$\sqrt{T}\big(\hat{\beta} - \beta_0\big) = \left(\frac{1}{T}\sum_{t=1}^{T} x_t^2\right)^{-1}\frac{\sqrt{T}}{T}\sum_{t=1}^{T} x_t u_t.$$
The first term on the right hand side will typically converge to the inverse of $\text{Var}(x_t)$, as discussed earlier. The second term is $\sqrt{T}$ times a sample average (of the random variable $x_t u_t$) with a zero expected value, since we assumed that $\hat{\beta}$ is consistent. Under weak conditions, a central limit theorem applies, so $\sqrt{T}$ times a sample average converges to a normal distribution. This shows that $\sqrt{T}\hat{\beta}$ has an asymptotic normal distribution. It turns out that this is a property of many estimators, basically because most estimators are some kind of sample average. For an example of a central limit theorem in action, see Appendix B.

Exactly what the variance of $\sqrt{T}(\hat{\beta} - \beta_0)$ is, and how it should be estimated, depends mostly on the properties of the errors. This is one of the main reasons for diagnostic tests. The most common tests are for homoskedastic errors (equal variances of $u_t$ and $u_{t-s}$) and no autocorrelation (no correlation of $u_t$ and $u_{t-s}$).

When ML is used, it is common to investigate if the fitted errors satisfy the basic assumptions, for instance, of normality.
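The "small experiment in the computer" mentioned above can be set up in a few lines. The sketch below simulates many samples from (1.11) and checks that the standard deviation of $\sqrt{T}(\hat{\beta} - \beta_0)$ settles at a value that does not depend on $T$; with iid errors that are independent of the regressor, that value is $\sqrt{\sigma^2/\text{Var}(x_t)}$, which is taken here as the benchmark for this particular experiment.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
beta0, sigma = 0.75, 1.0
n_sim = 5000

for T in (50, 200, 800):
    x = rng.normal(size=(n_sim, T))                    # Var(x_t) = 1
    u = sigma * rng.normal(size=(n_sim, T))
    y = beta0 * x + u
    beta_hat = (x * y).sum(axis=1) / (x ** 2).sum(axis=1)
    scaled = np.sqrt(T) * (beta_hat - beta0)
    # benchmark Std is sigma / Std(x_t) = 1 here, for every T
    print(f"T={T:4d}  Std of sqrt(T)*(beta_hat-beta0) = {scaled.std():.3f}")
```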

1.8 Testing Hypotheses about $\hat{\beta}$

Suppose we now assume that the asymptotic distribution of $\hat{\beta}$ is such that
$$\sqrt{T}\hat{\beta} \overset{d}{\to} N\big(0, v^2\big).$$

Figure 1.5: Probability density functions

The natural interpretation of a really large test statistic, $|\sqrt{T}\hat{\beta}/v| = 3$ say, is that it is very unlikely that this sample could have been drawn from a distribution where the hypothesis $\beta_0 = 0$ is true. We therefore choose to reject the hypothesis. We also hope that the decision rule we use will indeed make us reject false hypotheses more often than we reject true hypotheses. For instance, we want the decision rule discussed above to reject $\beta_0 = 0$ more often when $\beta_0 = 1$ than when $\beta_0 = 0$.

There is clearly nothing sacred about the 5% significance level. It is just a matter of convention that the 5% and 10% are the most widely used. However, it is not uncommon to use the 1% or the 20%. Clearly, the lower the significance level, the harder it is to reject a null hypothesis. At the 1% level it often turns out that almost no reasonable hypothesis can be rejected.

The t-test described above works only if the null hypothesis contains a single restriction. We have to use another approach whenever we want to test several restrictions jointly. The perhaps most common approach is a Wald test. To illustrate the idea, suppose $\beta$ is an $m \times 1$ vector and that $\sqrt{T}\hat{\beta} \overset{d}{\to} N(0, V)$ under the null hypothesis, where $V$ is a covariance matrix. We then know that
$$T\,\hat{\beta}' V^{-1} \hat{\beta} \overset{d}{\to} \chi^2(m).$$
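A sketch of such a Wald test for, say, $m = 2$ restrictions follows. The estimate, its covariance matrix $V$, and the sample size are hypothetical numbers chosen only for illustration; in practice $V$ is estimated.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical ingredients (not from the notes): an estimate, the asymptotic
# covariance matrix V of sqrt(T)*beta_hat, and the sample size.
T = 400
beta_hat = np.array([0.12, -0.08])
V = np.array([[1.0, 0.2],
              [0.2, 0.5]])

wald = T * beta_hat @ np.linalg.solve(V, beta_hat)   # T * b' V^{-1} b
p_value = 1 - chi2.cdf(wald, df=len(beta_hat))       # chi-square with m degrees of freedom
print(f"Wald statistic = {wald:.2f}, p-value = {p_value:.3f}")
```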


A Practical Matters

- Gauss, MatLab, RATS, Eviews, Stata, PC-Give, Micro-Fit, TSP, SAS
- Software reviews in The Economic Journal and Journal of Applied Econometrics

A.0.2 Useful Econometrics Literature

1 Greene (2000), Econometric Analysis (general)
2 Hayashi (2000), Econometrics (general)
3 Johnston and DiNardo (1997), Econometric Methods (general, fairly easy)
4 Pindyck and Rubinfeld (1998), Econometric Models and Economic Forecasts (general, easy)
5 Verbeek (2004), A Guide to Modern Econometrics (general, easy, good applications)
6 Davidson and MacKinnon (1993), Estimation and Inference in Econometrics (general, a bit advanced)
7 Ruud (2000), Introduction to Classical Econometric Theory (general, consistent projection approach, careful)
8 Davidson (2000), Econometric Theory (econometrics/time series, LSE approach)
9 Mittelhammer, Judge, and Miller (2000), Econometric Foundations (general, advanced)
10 Patterson (2000), An Introduction to Applied Econometrics (econometrics/time series, LSE approach with applications)
11 Judge et al (1985), Theory and Practice of Econometrics (general, a bit old)
12 Hamilton (1994), Time Series Analysis
13 Spanos (1986), Statistical Foundations of Econometric Modelling, Cambridge University Press (general econometrics, LSE approach)
14 Harvey (1981), Time Series Models, Philip Allan
15 Harvey (1989), Forecasting, Structural Time Series (structural time series, Kalman filter)
16 Lütkepohl (1993), Introduction to Multiple Time Series Analysis (time series, VAR models)
17 Priestley (1981), Spectral Analysis and Time Series (advanced time series)
18 Amemiya (1985), Advanced Econometrics (asymptotic theory, non-linear econometrics)
19 Silverman (1986), Density Estimation for Statistics and Data Analysis (density estimation)
20 Härdle (1990), Applied Nonparametric Regression

B A CLT in Action

This appendix illustrates the LLN and CLT of Figure 1.1 using the random variable $z_t \sim \chi^2(1)$. When $z_t$ is iid $\chi^2(1)$, then $\sum_{t=1}^{T} z_t$ is distributed as a $\chi^2(T)$ variable with pdf $f_T(\cdot)$. We now construct a new variable by transforming $\sum_{t=1}^{T} z_t$ into a sample mean around one (the mean of $z_t$),
$$\bar{z}_1 = \sum\nolimits_{t=1}^{T} z_t/T - 1 = \sum\nolimits_{t=1}^{T} (z_t - 1)/T.$$
Clearly, the inverse function is $\sum_{t=1}^{T} z_t = T\bar{z}_1 + T$, so by the "change of variable" rule we get the pdf of $\bar{z}_1$ as
$$g(\bar{z}_1) = f_T\big(T\bar{z}_1 + T\big)\, T.$$

Example B.3 Continuing the previous example, we now consider the random variable $\bar{z}_2 = \sqrt{T}\bar{z}_1$, with inverse function $\bar{z}_1 = \bar{z}_2/\sqrt{T}$. By applying the "change of variable" rule again, we get the pdf of $\bar{z}_2$ as
$$h(\bar{z}_2) = g\big(\bar{z}_2/\sqrt{T}\big)/\sqrt{T} = f_T\big(\sqrt{T}\bar{z}_2 + T\big)\sqrt{T}.$$

Example B.4 When $z_t$ is iid $\chi^2(1)$, then $\sum_{t=1}^{T} z_t$ is $\chi^2(T)$, which we denote $f\big(\sum_{t=1}^{T} z_t\big)$. We now construct two new variables by transforming $\sum_{t=1}^{T} z_t$:
$$\bar{z}_1 = \sum\nolimits_{t=1}^{T} z_t/T - 1 = \sum\nolimits_{t=1}^{T} (z_t - 1)/T, \quad \text{and} \quad \bar{z}_2 = \sqrt{T}\bar{z}_1.$$

Example B.5 (Distribution of $\sum_{t=1}^{T}(z_t - 1)/T$ and $\sqrt{T}\sum_{t=1}^{T}(z_t - 1)/T$.) We transform this distribution by first subtracting one from $z_t$ (to remove the mean) and then by dividing by $T$ or $\sqrt{T}$. This gives the distributions of the sample mean, $\bar{z}_1 = \sum_{t=1}^{T}(z_t - 1)/T$, and scaled sample mean, $\bar{z}_2 = \sqrt{T}\bar{z}_1$, as
$$g(\bar{z}_1) = T\,\frac{1}{2^{T/2}\Gamma(T/2)}\, y^{T/2-1} e^{-y/2} \quad \text{with } y = T\bar{z}_1 + T, \text{ and}$$
$$h(\bar{z}_2) = \sqrt{T}\,\frac{1}{2^{T/2}\Gamma(T/2)}\, y^{T/2-1} e^{-y/2} \quad \text{with } y = \sqrt{T}\bar{z}_2 + T.$$
These distributions are shown in Figure 1.1. It is clear that $g(\bar{z}_1)$ converges to a spike at zero as the sample size increases, while $h(\bar{z}_2)$ converges to a (non-trivial) normal distribution.
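These change-of-variable formulas can also be checked numerically. The sketch below (an illustrative check, not part of the original appendix) evaluates the exact pdfs of $\bar{z}_1$ and $\bar{z}_2$ using the $\chi^2(T)$ density from SciPy and verifies that they integrate to one and have the expected variances ($2/T$ and $2$, respectively).

```python
import numpy as np
from scipy.stats import chi2

def pdf_zbar1(v, T):
    """pdf of zbar1 = sum(z_t)/T - 1 via the change of variable: T * f_T(T*v + T)."""
    return T * chi2.pdf(T * v + T, df=T)

def pdf_zbar2(v, T):
    """pdf of zbar2 = sqrt(T)*zbar1: sqrt(T) * f_T(sqrt(T)*v + T)."""
    return np.sqrt(T) * chi2.pdf(np.sqrt(T) * v + T, df=T)

T = 25
grid = np.linspace(-6.0, 10.0, 4001)
dx = grid[1] - grid[0]
for name, pdf in (("zbar1", pdf_zbar1), ("zbar2", pdf_zbar2)):
    dens = pdf(grid, T)
    area = np.sum(dens) * dx                        # should be close to 1
    mean = np.sum(grid * dens) * dx
    var = np.sum((grid - mean) ** 2 * dens) * dx    # zbar1: ~2/T, zbar2: ~2
    print(f"{name}: integral = {area:.3f}, variance = {var:.3f}")
```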

Bibliography

Amemiya, T., 1985, Advanced econometrics, Harvard University Press, Cambridge, Massachusetts.

Davidson, J., 2000, Econometric theory, Blackwell Publishers, Oxford.

Davidson, R., and J. G. MacKinnon, 1993, Estimation and inference in econometrics, Oxford University Press, Oxford.

Greene, W. H., 2000, Econometric analysis, Prentice-Hall, Upper Saddle River, New Jersey, 4th edn.

Hamilton, J. D., 1994, Time series analysis, Princeton University Press, Princeton.

Härdle, W., 1990, Applied nonparametric regression, Cambridge University Press, Cambridge.

Harvey, A. C., 1989, Forecasting, structural time series models and the Kalman filter, Cambridge University Press.

Hayashi, F., 2000, Econometrics, Princeton University Press.

Johnston, J., and J. DiNardo, 1997, Econometric methods, McGraw-Hill, New York, 4th edn.

Lütkepohl, H., 1993, Introduction to multiple time series, Springer-Verlag, 2nd edn.

Mittelhammer, R. C., G. J. Judge, and D. J. Miller, 2000, Econometric foundations, Cambridge University Press, Cambridge.

Patterson, K., 2000, An introduction to applied econometrics: a time series approach, MacMillan Press, London.

Pindyck, R. S., and D. L. Rubinfeld, 1998, Econometric models and economic forecasts, Irwin McGraw-Hill, Boston, Massachusetts, 4th edn.

Priestley, M. B., 1981, Spectral analysis and time series, Academic Press.

Ruud, P. A., 2000, An introduction to classical econometric theory, Oxford University Press.

Silverman, B. W., 1986, Density estimation for statistics and data analysis, Chapman and Hall, London.

Verbeek, M., 2004, A guide to modern econometrics, Wiley, Chichester, 2nd edn.


2 Univariate Time Series Analysis

Reference: Greene (2000) 13.1-3 and 18.1-3

Additional references: Hayashi (2000) 6.2-4; Verbeek (2004) 8-9; Hamilton (1994); Johnston and DiNardo (1997) 7; and Pindyck and Rubinfeld (1998) 16-18.

2.1 Theoretical Background to Time Series Processes

Suppose we have a sample of $T$ observations of a random variable
$$\{y_t^i\}_{t=1}^{T} = \{y_1^i, y_2^i, \ldots, y_T^i\},$$
where subscripts indicate time periods. The superscripts indicate that this sample is from planet (realization) $i$. We could imagine a continuum of parallel planets where the same time series process has generated different samples with $T$ different numbers (different realizations).

Consider period $t$. The distribution of $y_t$ across the (infinite number of) planets has some density function, $f_t(y_t)$. The mean of this distribution
$$\text{E}\,y_t = \int_{-\infty}^{\infty} y_t\, f_t(y_t)\, dy_t$$
is the expected value of $y_t$.

Now consider periods $t$ and $t-s$ jointly. On planet $i$ we have the pair $\{y_{t-s}^i, y_t^i\}$. The bivariate distribution of these pairs, across the planets, has some density function $g_{t-s,t}(y_{t-s}, y_t)$.¹ Calculate the covariance between $y_{t-s}$ and $y_t$ as usual,
$$\text{Cov}(y_{t-s}, y_t) = \text{E}\,(y_{t-s} - \text{E}\,y_{t-s})(y_t - \text{E}\,y_t).$$

¹ The relation between $f_t(y_t)$ and $g_{t-s,t}(y_{t-s}, y_t)$ is, as usual, $f_t(y_t) = \int_{-\infty}^{\infty} g_{t-s,t}(y_{t-s}, y_t)\, dy_{t-s}$.

This is the $s$th autocovariance of $y_t$. (Of course, $s = 0$ or $s < 0$ are allowed.)

A stochastic process is covariance stationary if
$$\text{E}\,y_t = \mu \;\; \text{is independent of } t, \tag{2.4}$$
$$\text{Cov}(y_{t-s}, y_t) = \gamma_s \;\; \text{depends only on } s, \text{ and} \tag{2.5}$$
$$\text{both } \mu \text{ and } \gamma_s \text{ are finite.} \tag{2.6}$$
The process is ergodic for the mean if the time average of a single realization converges to the mean of the cross-sectional (ensemble) distribution. A sufficient condition for ergodicity for the mean is
$$\sum_{s=0}^{\infty}\big|\text{Cov}(y_{t-s}, y_t)\big| < \infty. \tag{2.8}$$
This means that the link between the values in $t$ and $t-s$ goes to zero sufficiently fast as $s$ increases (you may think of this as getting independent observations before we reach the limit). If $y_t$ is normally distributed, then (2.8) is also sufficient for the process to be ergodic for all moments, not just the mean. Figure 2.1 illustrates how a longer and longer sample (of one realization of the same time series process) gets closer and closer to the unconditional distribution as the sample gets longer.

Let $y_t$ be a vector of a covariance stationary and ergodic process. The $s$th covariance matrix is
$$R(s) = \text{E}\,(y_t - \text{E}\,y_t)(y_{t-s} - \text{E}\,y_{t-s})'.$$

Figure 2.1: Sample of one realization of $y_t = 0.85y_{t-1} + \varepsilon_t$ with $y_0 = 4$ and $\text{Std}(\varepsilon_t) = 1$. The panels show how the sample mean and standard deviation evolve with the sample length.

Note that $R(s)$ does not have to be symmetric unless $s = 0$. However, note that $R(s) = R(-s)'$. This follows from noting that, by stationarity,
$$R(-s) = \text{E}\,(y_t - \text{E}\,y_t)(y_{t+s} - \text{E}\,y_{t+s})' = \text{E}\,(y_{t-s} - \text{E}\,y_{t-s})(y_t - \text{E}\,y_t)' = R(s)'.$$


Example 2.1 (Bivariate case.) Let $y_t = [x_t, z_t]'$ with $\text{E}\,x_t = \text{E}\,z_t = 0$. Then
$$R(s) = \begin{bmatrix} \text{Cov}(x_t, x_{t-s}) & \text{Cov}(x_t, z_{t-s}) \\ \text{Cov}(z_t, x_{t-s}) & \text{Cov}(z_t, z_{t-s}) \end{bmatrix}.$$
Note that $R(-s)$ is
$$\begin{bmatrix} \text{Cov}(x_t, x_{t+s}) & \text{Cov}(x_t, z_{t+s}) \\ \text{Cov}(z_t, x_{t+s}) & \text{Cov}(z_t, z_{t+s}) \end{bmatrix} = \begin{bmatrix} \text{Cov}(x_{t-s}, x_t) & \text{Cov}(x_{t-s}, z_t) \\ \text{Cov}(z_{t-s}, x_t) & \text{Cov}(z_{t-s}, z_t) \end{bmatrix},$$
which is indeed the transpose of $R(s)$.

2.2 Estimation of Autocovariances

The autocovariances of the (vector) $y_t$ process can be estimated as
$$\hat{R}(s) = \frac{1}{T}\sum_{t=s+1}^{T}(y_t - \bar{y})(y_{t-s} - \bar{y})', \quad \text{with } \bar{y} = \frac{1}{T}\sum_{t=1}^{T} y_t.$$


2.3 White Noise

A white noise time process has
$$\text{E}\,\varepsilon_t = 0, \quad \text{Var}(\varepsilon_t) = \sigma^2, \quad \text{and} \quad \text{Cov}(\varepsilon_{t-s}, \varepsilon_t) = 0 \text{ if } s \neq 0.$$
If, in addition, $\varepsilon_t$ is normally distributed, then it is said to be Gaussian white noise. The conditions in (2.4)-(2.6) are satisfied, so this process is covariance stationary. Moreover, (2.8) is also satisfied, so the process is ergodic for the mean (and for all moments if $\varepsilon_t$ is normally distributed).

2.4 Moving Average

A $q$th-order moving average process is
$$y_t = \varepsilon_t + \theta_1\varepsilon_{t-1} + \ldots + \theta_q\varepsilon_{t-q}, \tag{2.16}$$
where the innovation $\varepsilon_t$ is white noise (usually Gaussian). We could also allow both $y_t$ and $\varepsilon_t$ to be vectors; such a process is called a vector MA (VMA).

Example 2.2 The mean of an MA(1), $y_t = \varepsilon_t + \theta_1\varepsilon_{t-1}$, is zero since the mean of $\varepsilon_t$ (and $\varepsilon_{t-1}$) is zero. The first three autocovariances are
$$\text{Var}(y_t) = \text{E}\,(\varepsilon_t + \theta_1\varepsilon_{t-1})(\varepsilon_t + \theta_1\varepsilon_{t-1}) = \sigma^2\big(1 + \theta_1^2\big),$$
$$\text{Cov}(y_{t-1}, y_t) = \text{E}\,(\varepsilon_{t-1} + \theta_1\varepsilon_{t-2})(\varepsilon_t + \theta_1\varepsilon_{t-1}) = \sigma^2\theta_1,$$
$$\text{Cov}(y_{t-2}, y_t) = \text{E}\,(\varepsilon_{t-2} + \theta_1\varepsilon_{t-3})(\varepsilon_t + \theta_1\varepsilon_{t-1}) = 0, \tag{2.18}$$
and $\text{Cov}(y_{t-s}, y_t) = 0$ for $|s| \ge 2$. Since both the mean and the covariances are finite and constant across $t$, the MA(1) is covariance stationary. Since the absolute values of the covariances sum to a finite number, the MA(1) is also ergodic for the mean. The first autocorrelation of an MA(1) is
$$\text{Corr}(y_{t-1}, y_t) = \frac{\theta_1}{1 + \theta_1^2}.$$

Since the white noise process is covariance stationary, and since an MA($q$) with $q < \infty$ is a finite-order linear function of $\varepsilon_t$, it must be the case that the MA($q$) is covariance stationary. It is ergodic for the mean since $\text{Cov}(y_{t-s}, y_t) = 0$ for $s > q$, so (2.8) is satisfied. As usual, Gaussian innovations are then sufficient for the MA($q$) to be ergodic for all moments.
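The MA(1) moments in Example 2.2 are easy to verify by simulation. The sketch below (with illustrative parameter values) compares the sample autocorrelations of a simulated MA(1) with the theoretical values $\theta_1/(1 + \theta_1^2)$ at lag 1 and zero beyond.

```python
import numpy as np

rng = np.random.default_rng(seed=5)
theta1, T = 0.6, 100_000
eps = rng.normal(size=T + 1)
y = eps[1:] + theta1 * eps[:-1]                    # MA(1): y_t = eps_t + theta1*eps_{t-1}

def autocorr(a, s):
    abar = a.mean()
    return np.sum((a[s:] - abar) * (a[:-s] - abar)) / np.sum((a - abar) ** 2)

print(f"lag 1: sample {autocorr(y, 1):.3f} vs theory {theta1/(1+theta1**2):.3f}")
print(f"lag 2: sample {autocorr(y, 2):.3f} vs theory 0.000")
```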

The effect of $\varepsilon_t$ on $y_t$, $y_{t+1}$, ..., that is, the impulse response function, is the same as the MA coefficients:
$$y_t = \varepsilon_t + \theta_1\varepsilon_{t-1} + \ldots + \theta_q\varepsilon_{t-q},$$
$$y_{t+1} = \varepsilon_{t+1} + \theta_1\varepsilon_t + \ldots + \theta_q\varepsilon_{t-q+1},$$
$$\vdots$$
$$y_{t+q} = \varepsilon_{t+q} + \theta_1\varepsilon_{t+q-1} + \ldots + \theta_q\varepsilon_t,$$
$$y_{t+q+1} = \varepsilon_{t+q+1} + \theta_1\varepsilon_{t+q} + \ldots + \theta_q\varepsilon_{t+1}.$$
The expected value of $y_t$, conditional on $\{\varepsilon_w\}_{w=-\infty}^{t-s}$, is
$$\text{E}_{t-s}\,y_t = \theta_s\varepsilon_{t-s} + \ldots + \theta_q\varepsilon_{t-q},$$
since the innovations dated $t-s+1, \ldots, t$ have zero conditional means.


The forecasts made in $t = 2$ then have the following expressions, with an example using $\theta_1 = 2$, $\varepsilon_1 = 3/4$ and $\varepsilon_2 = 1/2$ in the second column. In general, we have
$$y_t \mid \{\varepsilon_{t-s}, \varepsilon_{t-s-1}, \ldots\} \sim N\big[\text{E}_{t-s}\,y_t,\; \text{Var}(y_t - \text{E}_{t-s}\,y_t)\big] \tag{2.22}$$
$$\sim N\big[\theta_s\varepsilon_{t-s} + \ldots + \theta_q\varepsilon_{t-q},\; \sigma^2\big(1 + \theta_1^2 + \ldots + \theta_{s-1}^2\big)\big]. \tag{2.23}$$
The conditional mean is the point forecast and the variance is the variance of the forecast error. Note that if $s > q$, then the conditional distribution coincides with the unconditional distribution, since $\varepsilon_{t-s}$ for $s > q$ is of no help in forecasting $y_t$.

Example 2.5 (MA(1) and convergence from conditional to unconditional distribution.) From Examples 2.3 and 2.4 we see that the conditional distributions change according to (where $\Omega_2$ indicates the information set in $t = 2$)
$$y_3 \mid \Omega_2 \sim N\big[\text{E}_2 y_3,\; \text{Var}(y_3 - \text{E}_2 y_3)\big] = N(1, 1),$$
$$y_4 \mid \Omega_2 \sim N\big[\text{E}_2 y_4,\; \text{Var}(y_4 - \text{E}_2 y_4)\big] = N(0, 5).$$
Note that the distribution of $y_4 \mid \Omega_2$ coincides with the asymptotic distribution.

Estimation of MA processes is typically done by setting up the likelihood function and then using some numerical method to maximize it.
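To make that last point concrete, here is a hedged sketch of how such a numerical ML estimation of an MA(1) might look: the residuals are built recursively from a presample value of zero (a conditional likelihood), and a general-purpose optimizer does the maximization. This is one simple possibility, not the exact routine used in the notes.

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik_ma1(params, y):
    """Conditional Gaussian negative log likelihood of an MA(1), with eps_0 = 0."""
    theta, log_sigma = params
    sigma2 = np.exp(2 * log_sigma)
    eps = np.zeros_like(y)
    for t in range(len(y)):
        eps[t] = y[t] - theta * (eps[t - 1] if t > 0 else 0.0)   # eps_t = y_t - theta*eps_{t-1}
    return 0.5 * len(y) * np.log(2 * np.pi * sigma2) + 0.5 * np.sum(eps ** 2) / sigma2

# Simulate data with known parameters, then estimate them back.
rng = np.random.default_rng(seed=6)
theta_true, T = 0.5, 2000
e = rng.normal(size=T + 1)
y = e[1:] + theta_true * e[:-1]

res = minimize(neg_loglik_ma1, x0=np.array([0.0, 0.0]), args=(y,), method="Nelder-Mead")
theta_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(f"theta_hat = {theta_hat:.3f} (true {theta_true}), sigma_hat = {sigma_hat:.3f}")
```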

"

"1t

"2t

#:

All stationary AR(p) processes can be written on MA(1) form by repeated tion To do so we rewrite the AR(p) as a first order vector autoregression, VAR(1) Forinstance, an AR(2) xt D a1xt 1C a2xt 2C "t can be written as

substitu-"

xt

xt 1

#D

"

"t0

Trang 30

Iterate backwards on (2.26):
$$y_t = A(Ay_{t-2} + \varepsilon_{t-1}) + \varepsilon_t = A^2y_{t-2} + A\varepsilon_{t-1} + \varepsilon_t = \ldots = A^{K+1}y_{t-K-1} + \sum_{s=0}^{K} A^s\varepsilon_{t-s}.$$
The process is stable if $A^{K+1}y_{t-K-1} \to 0$ as $K \to \infty$, which requires the eigenvalues of $A$ to be inside the unit circle. (Write the eigenvalue decomposition $A = Z\Lambda Z^{-1}$, with $\Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n)$ and $Z = [\,z_1\; z_2\; \cdots\; z_n\,]$ the matrix of eigenvectors; then $A^s = Z\Lambda^s Z^{-1}$, which goes to zero as $s$ increases when all $|\lambda_i| < 1$.) Note that we therefore get
$$y_t = \sum_{s=0}^{\infty} A^s\varepsilon_{t-s}.$$

Figure 2.2: Conditional moments and distributions for different forecast horizons for the AR(1) process $y_t = 0.85y_{t-1} + \varepsilon_t$ with $y_0 = 4$ and $\text{Std}(\varepsilon_t) = 1$.

Example 2.9 (AR(1).) For the univariate AR(1), $y_t = ay_{t-1} + \varepsilon_t$, the characteristic equation is $(a - \lambda)z = 0$, which is only satisfied if the eigenvalue is $\lambda = a$. The AR(1) is therefore stable (and stationary) if $-1 < a < 1$. This can also be seen directly by noting that $a^{K+1}y_{t-K-1}$ declines to zero if $-1 < a < 1$ as $K$ increases.

Similarly, most finite-order MA processes can be written ("inverted") as AR($\infty$) processes. It is therefore common to approximate MA processes with AR processes, especially since the latter are much easier to estimate.

Example 2.10 (Variance of AR(1).) From the MA representation $y_t = \sum_{s=0}^{\infty} a^s\varepsilon_{t-s}$ and the fact that $\varepsilon_t$ is white noise, we get $\text{Var}(y_t) = \sigma^2\sum_{s=0}^{\infty} a^{2s} = \sigma^2/(1 - a^2)$. Note that this is minimized at $a = 0$. The autocorrelations are obviously $a^{|s|}$. The covariance matrix of $\{y_t\}_{t=1}^{T}$ is therefore (standard deviation $\times$ standard deviation $\times$ autocorrelation)
$$\frac{\sigma^2}{1 - a^2}\begin{bmatrix} 1 & a & a^2 & \cdots & a^{T-1} \\ a & 1 & a & \cdots & a^{T-2} \\ a^2 & a & 1 & \cdots & a^{T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a^{T-1} & a^{T-2} & a^{T-3} & \cdots & 1 \end{bmatrix}.$$

Example 2.11 (Covariance stationarity of an AR(1) with $|a| < 1$.) From the MA representation $y_t = \sum_{s=0}^{\infty} a^s\varepsilon_{t-s}$, the expected value of $y_t$ is zero, since $\text{E}\,\varepsilon_{t-s} = 0$. We know that $\text{Cov}(y_t, y_{t-s}) = a^{|s|}\sigma^2/(1 - a^2)$, which is constant and finite.

Example 2.12 (Ergodicity of a stationary AR(1).) We know that $\text{Cov}(y_t, y_{t-s}) = a^{|s|}\sigma^2/(1 - a^2)$, so the absolute value is
$$\big|\text{Cov}(y_t, y_{t-s})\big| = |a|^{|s|}\sigma^2/(1 - a^2).$$
Using this in (2.8) gives
$$\sum_{s=0}^{\infty}\big|\text{Cov}(y_{t-s}, y_t)\big| = \frac{\sigma^2}{1 - a^2}\sum_{s=0}^{\infty}|a|^s = \frac{\sigma^2}{(1 - a^2)(1 - |a|)},$$
which is finite, so the process is ergodic for the mean.

The conditional moments of an AR(1) are $\text{E}_t\,y_{t+s} = a^s y_t$ and $\text{Var}(y_{t+s} - \text{E}_t\,y_{t+s}) = \sigma^2\big(1 + a^2 + \ldots + a^{2(s-1)}\big)$. The distribution of $y_{t+s}$ conditional on $y_t$ is normal with these parameters. See Figure 2.2 for an example.

2.5.1 Estimation of an AR(1) Process

Suppose we have a sample $\{y_t\}_{t=0}^{T}$ of a process which we know is an AR(1), $y_t = ay_{t-1} + \varepsilon_t$, with normally distributed innovations with unknown variance $\sigma^2$.

Recall that the joint and conditional pdfs of some variables $z$ and $x$ are related as
$$\text{pdf}(x, z) = \text{pdf}(x \mid z)\cdot\text{pdf}(z).$$
Applying this recursively, and conditioning on the first observation $y_0$, the likelihood of $\{y_t\}_{t=1}^{T}$ is the product of the conditional densities of $y_t$ given $y_{t-1}$, each of which is $N(ay_{t-1}, \sigma^2)$:
$$L = (2\pi\sigma^2)^{-T/2}\exp\left(-\frac{1}{2\sigma^2}\sum_{t=1}^{T}(y_t - ay_{t-1})^2\right). \tag{2.33}$$
Taking logs, and evaluating the first-order conditions for $\sigma^2$ and $a$, gives the usual OLS estimator. Note that this is MLE conditional on $y_0$. There is a corresponding exact MLE, but the difference is usually small (the asymptotic distributions of the two estimators are the same under stationarity; under non-stationarity OLS still gives consistent estimates). The MLE of $\text{Var}(\varepsilon_t)$ is given by $\sum_{t=1}^{T}\hat{v}_t^2/T$, where $\hat{v}_t$ is the OLS residual.

These results carry over to any finite-order VAR. The MLE, conditional on the initial observations, of the VAR is the same as OLS estimates of each equation. The MLE of the $ij$th element in $\text{Cov}(\varepsilon_t)$ is given by $\sum_{t=1}^{T}\hat{v}_{it}\hat{v}_{jt}/T$, where $\hat{v}_{it}$ and $\hat{v}_{jt}$ are the OLS residuals.

To get the exact MLE, we need to multiply (2.33) by the unconditional pdf of $y_0$ (since we have no information to condition on),
$$\text{pdf}(y_0) = \frac{1}{\sqrt{2\pi\sigma^2/(1 - a^2)}}\exp\left(-\frac{y_0^2}{2\sigma^2/(1 - a^2)}\right),$$
since $y_0 \sim N\big(0, \sigma^2/(1 - a^2)\big)$. The optimization problem is then non-linear and must be solved by a numerical optimization routine.
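A sketch of both estimators on simulated data (illustrative values only): the conditional MLE of $a$ is just the OLS regression of $y_t$ on $y_{t-1}$, while the exact MLE adds the density of $y_0$ to the log likelihood and is maximized numerically. For simplicity the sketch holds $\sigma^2$ fixed at its OLS value when maximizing over $a$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(seed=7)
a_true, sigma, T = 0.85, 1.0, 200
y = np.empty(T + 1)
y[0] = sigma / np.sqrt(1 - a_true ** 2) * rng.normal()   # y_0 from the stationary distribution
for t in range(1, T + 1):
    y[t] = a_true * y[t - 1] + sigma * rng.normal()

# Conditional MLE of a = OLS of y_t on y_{t-1}
a_ols = np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)
sigma2_ols = np.mean((y[1:] - a_ols * y[:-1]) ** 2)

def neg_loglik_exact(a, sigma2=sigma2_ols):
    """Exact (unconditional) negative log likelihood; sigma2 held fixed for simplicity."""
    if abs(a) >= 1:
        return np.inf
    eps = y[1:] - a * y[:-1]
    ll_cond = -0.5 * T * np.log(2 * np.pi * sigma2) - 0.5 * np.sum(eps ** 2) / sigma2
    var0 = sigma2 / (1 - a ** 2)                          # y_0 ~ N(0, sigma^2/(1-a^2))
    ll_y0 = -0.5 * np.log(2 * np.pi * var0) - 0.5 * y[0] ** 2 / var0
    return -(ll_cond + ll_y0)

a_exact = minimize_scalar(neg_loglik_exact, bounds=(-0.999, 0.999), method="bounded").x
print(f"conditional MLE (OLS): a = {a_ols:.3f}; exact MLE: a = {a_exact:.3f}")
```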


2.5.2 Lag Operators

A common and convenient way of dealing with leads and lags is the lag operator, $L$. It is such that
$$L^s y_t = y_{t-s} \quad \text{for all (integer) } s.$$
For instance, the ARMA(2,1) model
$$y_t - a_1y_{t-1} - a_2y_{t-2} = \varepsilon_t + \theta_1\varepsilon_{t-1} \tag{2.35}$$
can be written as
$$\big(1 - a_1L - a_2L^2\big)\,y_t = (1 + \theta_1L)\,\varepsilon_t, \tag{2.36}$$
which is usually denoted
$$a(L)\,y_t = \theta(L)\,\varepsilon_t.$$

Write the AR($p$) as $y_t = x_t'\beta + \varepsilon_t$ with $x_t = [\,y_{t-1}\; y_{t-2}\; \cdots\; y_{t-p}\,]'$. The LS estimator of $\beta$ then satisfies
$$\hat{\beta}_{LS} - \beta = \left(\frac{1}{T}\sum_{t=1}^{T}x_tx_t'\right)^{-1}\frac{1}{T}\sum_{t=1}^{T}x_t\varepsilon_t. \tag{2.38}$$
The first term in (2.38) is the inverse of the sample estimate of the covariance matrix of $x_t$ (since $\text{E}\,y_t = 0$), which converges in probability to $\Sigma_{xx}^{-1}$ ($y_t$ is stationary and ergodic for all moments if $\varepsilon_t$ is Gaussian). The last term, $\frac{1}{T}\sum_{t=1}^{T}x_t\varepsilon_t$, is serially uncorrelated, so we can apply a CLT. Note that $\text{E}\,x_t\varepsilon_t\varepsilon_t'x_t' = \text{E}\,\varepsilon_t\varepsilon_t'\,\text{E}\,x_tx_t' = \sigma^2\Sigma_{xx}$, since $\varepsilon_t$ and $x_t$ are independent. We therefore have
$$\frac{1}{\sqrt{T}}\sum_{t=1}^{T}x_t\varepsilon_t \overset{d}{\to} N\big(0,\; \sigma^2\Sigma_{xx}\big).$$
Combining these facts, we get the asymptotic distribution
$$\sqrt{T}\big(\hat{\beta}_{LS} - \beta\big) \overset{d}{\to} N\big(0,\; \sigma^2\Sigma_{xx}^{-1}\big).$$
Consistency follows from taking plim of (2.38),
$$\text{plim}\,\big(\hat{\beta}_{LS} - \beta\big) = \Sigma_{xx}^{-1}\,\text{plim}\,\frac{1}{T}\sum_{t=1}^{T}x_t\varepsilon_t = 0,$$
since $x_t$ and $\varepsilon_t$ are uncorrelated.

2.5.4 Autoregressions versus Autocorrelations

It is straightforward to see the relation between autocorrelations and the AR model when the AR model is the true process. This relation is given by the Yule-Walker equations. For an AR(1), the autoregression coefficient is simply the first autocorrelation coefficient. For an AR(2), $y_t = a_1y_{t-1} + a_2y_{t-2} + \varepsilon_t$, we have
$$\begin{bmatrix} \text{Cov}(y_t, y_t) \\ \text{Cov}(y_{t-1}, y_t) \\ \text{Cov}(y_{t-2}, y_t) \end{bmatrix} = \begin{bmatrix} \text{Cov}(y_t,\; a_1y_{t-1} + a_2y_{t-2} + \varepsilon_t) \\ \text{Cov}(y_{t-1},\; a_1y_{t-1} + a_2y_{t-2} + \varepsilon_t) \\ \text{Cov}(y_{t-2},\; a_1y_{t-1} + a_2y_{t-2} + \varepsilon_t) \end{bmatrix} = \begin{bmatrix} a_1\gamma_1 + a_2\gamma_2 + \text{Var}(\varepsilon_t) \\ a_1\gamma_0 + a_2\gamma_1 \\ a_1\gamma_1 + a_2\gamma_0 \end{bmatrix},$$
where $\gamma_s$ denotes the $s$th autocovariance. Dividing the last two equations by $\gamma_0 = \text{Var}(y_t)$ gives the system in terms of autocorrelations,
$$\begin{bmatrix} \rho_1 \\ \rho_2 \end{bmatrix} = \begin{bmatrix} a_1 + a_2\rho_1 \\ a_1\rho_1 + a_2 \end{bmatrix} \quad \text{or} \quad \begin{bmatrix} \rho_1 \\ \rho_2 \end{bmatrix} = \begin{bmatrix} 1 & \rho_1 \\ \rho_1 & 1 \end{bmatrix}\begin{bmatrix} a_1 \\ a_2 \end{bmatrix}.$$
If we know the autocorrelations $\rho_1$ and $\rho_2$, we can solve for the autoregression coefficients. This demonstrates that testing that all the autocorrelations are zero is essentially the same as testing if all the autoregressive coefficients are zero. Note, however, that the transformation is non-linear, which may make a difference in small samples.
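A quick numerical check of the Yule-Walker mapping (with illustrative, stationary AR(2) coefficients): solve the two equations above for $\rho_1$ and $\rho_2$ and compare with the sample autocorrelations of a long simulated series.

```python
import numpy as np

a1, a2 = 0.5, 0.3                        # illustrative AR(2) coefficients (stationary)

# Yule-Walker: rho1 = a1 + a2*rho1  =>  rho1 = a1/(1 - a2);  rho2 = a1*rho1 + a2
rho1 = a1 / (1 - a2)
rho2 = a1 * rho1 + a2

# Compare with sample autocorrelations of a simulated AR(2)
rng = np.random.default_rng(seed=8)
T = 100_000
y = np.zeros(T)
for t in range(2, T):
    y[t] = a1 * y[t - 1] + a2 * y[t - 2] + rng.normal()

def autocorr(a, s):
    abar = a.mean()
    return np.sum((a[s:] - abar) * (a[:-s] - abar)) / np.sum((a - abar) ** 2)

print(f"rho1: theory {rho1:.3f}, sample {autocorr(y, 1):.3f}")
print(f"rho2: theory {rho2:.3f}, sample {autocorr(y, 2):.3f}")
```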

2.6 ARMA Models

An ARMA model has both AR and MA components. For instance, an ARMA($p$,$q$) is
$$y_t = a_1y_{t-1} + a_2y_{t-2} + \ldots + a_py_{t-p} + \varepsilon_t + \theta_1\varepsilon_{t-1} + \ldots + \theta_q\varepsilon_{t-q}. \tag{2.43}$$
Estimation of ARMA processes is typically done by setting up the likelihood function and then using some numerical method to maximize it.

Even low-order ARMA models can be fairly flexible. For instance, the ARMA(1,1) model is
$$y_t = ay_{t-1} + \varepsilon_t + \theta\varepsilon_{t-1}, \quad \text{where } \varepsilon_t \text{ is white noise.} \tag{2.44}$$
The model can be written on MA($\infty$) form as
$$y_t = \varepsilon_t + \sum_{s=1}^{\infty} a^{s-1}(a + \theta)\,\varepsilon_{t-s}.$$


where "t is white noise.

A unit root process can be made stationary only by taking a difference The simplestexample is the random walk with drift

where "t is white noise The name “unit root process” comes from the fact that the largest

Trang 38

eigenvalues of the canonical form (the VAR(1) form of the AR(p)) is one Such a process

is said to be integrated of order one (often denoted I(1)) and can be made stationary bytaking first differences

Example 2.14 (Non-stationary AR(2).) The process $y_t = 1.5y_{t-1} - 0.5y_{t-2} + \varepsilon_t$ can be written
$$\begin{bmatrix} y_t \\ y_{t-1} \end{bmatrix} = \begin{bmatrix} 1.5 & -0.5 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} y_{t-1} \\ y_{t-2} \end{bmatrix} + \begin{bmatrix} \varepsilon_t \\ 0 \end{bmatrix},$$
where the matrix has the eigenvalues 1 and 0.5, so the process is non-stationary. Note that subtracting $y_{t-1}$ from both sides gives $y_t - y_{t-1} = 0.5(y_{t-1} - y_{t-2}) + \varepsilon_t$, so the variable $x_t = y_t - y_{t-1}$ is stationary.

The distinguishing feature of unit root processes is that the effect of a shock never vanishes. This is most easily seen for the random walk. Substitute repeatedly in (2.49) to get
$$y_t = \mu + (\mu + y_{t-2} + \varepsilon_{t-1}) + \varepsilon_t = \ldots = t\mu + y_0 + \sum_{s=1}^{t}\varepsilon_s. \tag{2.50}$$
The effect of $\varepsilon_t$ never dies out: a non-zero value of $\varepsilon_t$ gives a permanent shift of the level of $y_t$. This process is clearly non-stationary. A consequence of the permanent effect of a shock is that the variance of the conditional distribution grows without bound as the forecasting horizon is extended. For instance, for the random walk with drift, (2.50), the distribution of $y_t$ conditional on the information in $t = 0$ is $N\big(y_0 + t\mu,\; t\sigma^2\big)$ if the innovations are Gaussian. This means that the expected change is $t\mu$ and that the conditional variance grows linearly with the forecasting horizon. The unconditional variance is therefore infinite and the standard results on inference are not applicable.

In contrast, the conditional distribution from the trend-stationary model, (2.48), is $N\big(t\mu,\; \sigma^2\big)$.

A process could have two unit roots (integrated of order 2: I(2)). In this case, we need to difference twice to make it stationary. Alternatively, a process can also be explosive, that is, have eigenvalues outside the unit circle. In this case, the impulse response function diverges.


Example 2.15 (Two unit roots.) Suppose $y_t$ in Example 2.14 is actually the first difference of some other series, $y_t = z_t - z_{t-1}$. We then have
$$z_t - z_{t-1} = 1.5(z_{t-1} - z_{t-2}) - 0.5(z_{t-2} - z_{t-3}) + \varepsilon_t,$$
$$z_t = 2.5z_{t-1} - 2z_{t-2} + 0.5z_{t-3} + \varepsilon_t,$$
which is an AR(3) with the following canonical form
$$\begin{bmatrix} z_t \\ z_{t-1} \\ z_{t-2} \end{bmatrix} = \begin{bmatrix} 2.5 & -2 & 0.5 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} z_{t-1} \\ z_{t-2} \\ z_{t-3} \end{bmatrix} + \begin{bmatrix} \varepsilon_t \\ 0 \\ 0 \end{bmatrix}.$$
The matrix has the eigenvalues 1, 1, and 0.5, so there are two unit roots ($z_t$ is I(2)).

2.7.2 Spurious Regressions

Strong trends often cause problems in econometric models where $y_t$ is regressed on $x_t$. In essence, if no trend is included in the regression, then $x_t$ will appear to be significant just because it is a proxy for a trend. The same holds for unit root processes, even if they have no deterministic trends. However, the innovations accumulate and the series therefore tend to be trending in small samples. A warning sign of a spurious regression is when $R^2 >$ DW statistic.
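That warning sign is easy to reproduce in a small simulation: regress one random walk on another, independent, random walk and look at the $R^2$ and the Durbin-Watson statistic. The sketch below is purely illustrative; with unrelated unit-root series, large $R^2$ values together with very low DW statistics are common.

```python
import numpy as np

rng = np.random.default_rng(seed=9)
T, n_sim = 200, 1000
r2_list, dw_list = [], []

for _ in range(n_sim):
    y = np.cumsum(rng.normal(size=T))          # two independent random walks
    x = np.cumsum(rng.normal(size=T))
    X = np.column_stack([np.ones(T), x])       # regression with a constant
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    r2 = 1 - e.var() / y.var()
    dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)   # Durbin-Watson statistic
    r2_list.append(r2)
    dw_list.append(dw)

print(f"median R2 = {np.median(r2_list):.2f}, median DW = {np.median(dw_list):.2f}")
print(f"share of simulations with R2 > DW: {np.mean(np.array(r2_list) > np.array(dw_list)):.2f}")
```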

For trend-stationary data, this problem is easily solved by detrending with a linear trend (before estimating, or by just adding a trend to the regression).

However, this is usually a poor method for unit root processes. What is needed is a first difference. For instance, a first difference of the random walk is
$$\Delta y_t = y_t - y_{t-1} = \mu + \varepsilon_t,$$
which is white noise apart from the constant drift (any finite difference, like $y_t - y_{t-s}$, will give a stationary series), so we could proceed by applying standard econometric tools to $\Delta y_t$.

One may then be tempted to try first-differencing all non-stationary series, since it may be hard to tell if they are unit root processes or just trend-stationary. For instance, a first difference of the trend-stationary process, (2.48), gives
$$y_t - y_{t-1} = \mu + \varepsilon_t - \varepsilon_{t-1}.$$
It is unclear if this is an improvement: the trend is gone, but the errors are now of MA(1) type (in fact, non-invertible, and therefore tricky, in particular for estimation).

2.7.3 Testing for a Unit Root I

Suppose we run an OLS regression of
$$y_t = ay_{t-1} + \varepsilon_t, \tag{2.53}$$
where the true value of $|a| < 1$. The asymptotic distribution of the LS estimator is
$$\sqrt{T}(\hat{a} - a) \overset{d}{\to} N\big(0,\; 1 - a^2\big). \tag{2.54}$$
(The variance follows from the standard OLS formula where the variance of the estimator is $\sigma^2(X'X/T)^{-1}$. Here plim $X'X/T = \text{Var}(y_t)$, which we know is $\sigma^2/(1 - a^2)$.)

It is well known (but not easy to show) that when $a = 1$, then $\hat{a}$ is biased towards zero in small samples. In addition, the asymptotic distribution is no longer (2.54). In fact, there is a discontinuity in the limiting distribution as we move from a stationary to a non-stationary variable. This, together with the small sample bias, means that we have to use simulated critical values for testing the null hypothesis of $a = 1$ based on the OLS estimate from (2.53).

The approach is to calculate the test statistic
$$t = \frac{\hat{a} - 1}{\text{Std}(\hat{a})},$$
and reject the null of non-stationarity if $t$ is less than the critical values published by Dickey and Fuller (typically more negative than the standard values to compensate for the small sample bias) or from your own simulations.
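Simulating the critical values is straightforward. The sketch below generates random walks under the null $a = 1$, computes the t-statistic above for each, and reports the lower quantiles of its distribution; these should come out close to the tabulated Dickey-Fuller values for a regression without a constant (around $-1.95$ at the 5% level), which is mentioned here only as a point of comparison.

```python
import numpy as np

rng = np.random.default_rng(seed=10)
T, n_sim = 200, 5000
t_stats = np.empty(n_sim)

for i in range(n_sim):
    eps = rng.normal(size=T + 1)
    y = np.cumsum(eps)                                    # random walk: the null a = 1 is true
    ylag, ynow = y[:-1], y[1:]
    a_hat = np.sum(ynow * ylag) / np.sum(ylag ** 2)       # OLS of y_t on y_{t-1}, no constant
    resid = ynow - a_hat * ylag
    s2 = np.sum(resid ** 2) / (T - 1)                     # residual variance
    se = np.sqrt(s2 / np.sum(ylag ** 2))                  # Std(a_hat)
    t_stats[i] = (a_hat - 1.0) / se

print("simulated 1%, 5%, 10% critical values:",
      np.round(np.percentile(t_stats, [1, 5, 10]), 2))
```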

In principle, distinguishing between a stationary and a non-stationary series is very difficult.
