
ECONOMETRICS: STATISTICAL FOUNDATIONS AND APPLICATIONS

Library of Congress Cataloging in Publication Data

Dhrymes, Phoebus J., 1932-
Econometrics: statistical foundations and applications.
Corrected reprint of the 1970 ed. published by Harper & Row, New York.
1. Econometrics. I. Title.
[HB139.D48 1974] 330'.01'8 74-10898

Second printing: July, 1974.

First published 1970, by Harper & Row, Publishers, Inc.

Design: Peter Klemke, Berlin.

All rights reserved. No part of this book may be translated or reproduced in any form without written permission from Springer-Verlag.

© 1970 by Phoebus J. Dhrymes and 1974 by Springer-Verlag New York Inc.

ISBN-13: 978-0-387-90095-7    e-ISBN-13: 978-1-4613-9383-2    DOI: 10.1007/978-1-4613-9383-2

PREFACE TO SECOND PRINTING

The main difference between this edition by Springer-Verlag and the earlier one by Harper & Row lies in the elimination of the inordinately high number of misprints found in the latter. A few minor errors of exposition have also been eliminated. The material, however, is essentially similar to that found in the earlier version.

I wish to take this opportunity to express my thanks to all those who pointed out misprints to me and especially to H. Tsurumi, Warren Dent, and J. D. Khazzoom.

New York
February, 1974

PHOEBUS J. DHRYMES

PREFACE TO FIRST PRINTING

This book was written, primarily, for the graduate student in econometrics. Its purpose is to provide a reasonably complete and rigorous exposition of the techniques frequently employed in econometric research, beyond what one is likely to encounter in an introductory mathematical statistics course. It does not aim at teaching how one can do successful original empirical research. Unfortunately, no one has yet discovered how to communicate this skill impersonally. Practicing econometricians may also find the integrated presentation of simultaneous equations estimation theory and spectral analysis a convenient reference.

I have tried, as far as possible, to begin the discussion of the various topics from an elementary stage so that little prior knowledge of the subject will be necessitated. It is assumed that the potential reader is familiar with the elementary aspects of calculus and linear algebra. Additional mathematical material is to be found in the Appendix. Statistical competence, approximately at the level of a first-year course in elementary mathematical statistics, is also assumed on the part of the reader.

The discussion, then, develops certain elementary aspects of multivariate analysis, the theory of estimation of simultaneous equations systems, elementary aspects of spectral and cross-spectral analysis, and shows how such techniques may be applied, by a number of examples.

It is often said that econometrics deals with the quantification of economic relationships, perhaps as postulated by an abstract model.

As such, it is a blend of economics and statistics, both presupposing a substantial degree of mathematical sophistication. Thus, to practice econometrics competently, one has to be well-versed in both economic and statistical theory. Pursuant to this, I have attempted in all presentations to point out clearly the assumptions underlying the discussion, their role in establishing the conclusions, and hence the consequence of departures from such assumptions. Indeed, this is a most crucial aspect of the student's training and one that is rather frequently neglected. This is unfortunate since competence in econometrics entails, inter alia, a very clear perception of the limitations of the conclusions one may obtain from empirical analysis.

A number of specialized results from probability theory that are crucial for establishing, rigorously, the properties of simultaneous equations estimators have been collected in Chapter 3. This is included only as a convenient reference, and its detailed study is not essential in understanding the remainder of the book. It is sufficient that the reader be familiar with the salient results presented in Chapter 3, but it is not essential that he master their proof in detail. I have used various parts of the book, in the form of mimeographed notes, as the basis of discussion for graduate courses in econometrics at Harvard University and, more recently, at the University of Pennsylvania.

The material in Chapters 1 through 6 could easily constitute a one-semester course, and the remainder may be used in the second semester. The instructor who may not wish to delve into spectral analysis quite so extensively may include alternative material, e.g., the theory of forecasting.

Generally, I felt that empirical work is easily accessible in journals and similar publications, and for this reason, the number of empirical examples is small. By now, the instructor has at his disposal a number of publications on econometric models and books of readings in empirical econometric research, from which he can easily draw in illustrating the possible application of various techniques.

I have tried to write this book in a uniform style and notation and preserve maximal continuity of presentation. For this reason explicit references to individual contributions are minimized; on the other hand, the great cleavage between the Dutch and Cowles Foundation notation is bridged so that one can follow the discussion of 2SLS, 3SLS, and maximum likelihood estimation in a unified notational framework. Of course, absence of references from the discussions is not meant to ignore individual contributions, but only to insure the continuity and unity of exposition that one commonly finds in scientific, mathematical, or statistical textbooks.

Original work relevant to the subject covered appears in the references at the end of each chapter; in several instances a brief comment on the work is inserted. This is only meant to give the reader an indication of the coverage and does not pretend to be a review of the contents.

Finally, it is a pleasure for me to acknowledge my debt to a number of individuals who have contributed directly or indirectly in making this book what it is.

I wish to express my gratitude to H. Theil for first introducing me to the rigorous study of econometrics, and to I. Olkin from whose lucid lectures I first learned about multivariate analysis. T. Amemiya, L. R. Klein, J. Kmenta, B. M. Mitchell, and A. Zellner read various parts of the manuscript and offered useful suggestions. V. Pandit and A. Basu are chiefly responsible for compiling the bibliography. Margot Keith and Alix Ryckoff have lightened my burden by their expert typing.

PHOEBUS J. DHRYMES
January, 1970

1.4 The Multivariate Normal Distribution 12
1.5 Correlation Coefficients and Related Topics 20

3  PROBABILITY LIMITS, ASYMPTOTIC DISTRIBUTIONS, AND PROPERTIES OF MAXIMUM LIKELIHOOD ESTIMATORS
3.4 Central Limit Theorems and Related Topics 100
3.5 Miscellaneous Useful Convergence Results 110
3.6 Properties of Maximum Likelihood (ML) Estimators 114
3.7 Estimation for Distributions Admitting of Sufficient Statistics 130
3.8 Minimum Variance Estimation and Sufficient Statistics 136

4  ESTIMATION OF SIMULTANEOUS EQUATIONS SYSTEMS
4.1 Review of Classical Methods 145
4.2 Asymptotic Distribution of Aitken Estimators 161
4.3 Two-Stage Least Squares (2SLS) 167
4.4 2SLS as Aitken and as OLS Estimator 183
4.5 Asymptotic Properties of 2SLS Estimators 190
4.6 The General k-Class Estimator 200
4.7 Three-Stage Least Squares (3SLS) 209

5  APPLICATIONS OF CLASSICAL AND SIMULTANEOUS EQUATIONS TECHNIQUES AND RELATED PROBLEMS
5.3 An Example of 2SLS and 3SLS Estimation 236
5.4 Measures of Goodness of Fit in Multiple Equations Systems: Coefficient of (Vector) Alienation and Correlation 240
5.5 Canonical Correlations and Goodness of Fit in Econometric Systems 261
5.6 Applications of Principal Component Theory in Econometric Systems 264
5.7 Alternative Asymptotic Tests of Significance for 2SLS Estimated Parameters 272

6.2 Indirect Least Squares (ILS) 279
6.3 The Identification Problem 289
6.4 Instrumental Variables Estimation 296
6.5 Recursive Systems 303

7  MAXIMUM LIKELIHOOD METHODS 314
7.1 Formulation of the Problem and Assumptions 314
7.2 Reduced Form (RF) and Full Information Maximum Likelihood (FIML) Estimation 316
7.3 Limited Information (LIML) Estimation 328

8  RELATIONS AMONG ESTIMATORS; MONTE CARLO METHODS 358
8.1 Introduction 358
8.2 Relations Among Double k-Class Estimators 359
8.3 I.V., ILS, and Double k-Class Estimators 364
8.4 Limited Information Estimators and Just Identification 365
8.5 Relationships Among Full Information Estimators 367
8.6 Monte Carlo Methods 372

9  SPECTRAL ANALYSIS 382
9.1 Stochastic Processes 382
9.2 Spectral Representation of Covariance Stationary Series
9.3 Estimation of the Spectrum 419
Spectrum, and 444
10.4 An Empirical Application of Cross-Spectral Analysis 479

11  APPROXIMATE SAMPLING DISTRIBUTIONS AND OTHER STATISTICAL ASPECTS OF SPECTRAL ANALYSIS 485
11.1 Aliasing 485
11.2 "Prewhitening," "Recoloring," and Related Issues 488
11.3 Approximate Asymptotic Distributions; Considerations of Design and Analysis 492

12  APPLICATIONS OF SPECTRAL ANALYSIS TO SIMULTANEOUS EQUATIONS SYSTEMS 507
12.1 Generalities 507
12.2 Lag Operators 509
12.3 An Operator Representation of the Final Form 517
12.4 Dynamic Multipliers and the Final Form 521
12.5 Spectral Properties of the Final Form 525
12.6 An Empirical Application 533

A.1 Complex Numbers and Complex-Valued Functions 545
A.2 The Riemann-Stieltjes Integral 551
A.3 Monotonic Functions and Functions of Bounded Variation 552
A.4 Fourier Series 557
A.5 Systems of Difference Equations with Constant Coefficients 567
A.6 Matrix Algebra 570


…statements about the interval (α, β), and so on. Still, in this elementary context the correlation of two variables was introduced and interpreted as a measure of the degree to which the two variables tend to move in the same or opposite direction.

In econometric work, however, it is often necessary to deal with a number of relations simultaneously. Thus we typically have an econometric model containing more than one equation. Such a system may be simultaneously determined in the sense that the interaction of all variables as specified by the model determines simultaneously the behavior of the entire set of (jointly) dependent variables.

Generally, the equations of an econometric model are, except for identities, stochastic ones, and hence the problem arises of how to specify the (joint) stochastic character of a number of random variables simultaneously. This leads us to consider the problem of the distribution of vector random variables, that is, the characteristics

of the joint distribution of a number of random variables simultaneously and not "one at a time."

When the problem is considered in this context, it is apparent that there are more complexities than in the study of the distribution of scalar random variables. Therefore, if we are dealing with the (joint) distribution of m variables, we might wish to say something about the distribution of a subset of k (k < m) variables given the (m − k) remaining ones. This is a problem that cannot arise in the study of univariate distributions.

In the following material we shall study in some detail certain elementary aspects of multivariate distributions, confining ourselves to the special case of the multivariate normal distribution.

Let us now set forth the notational framework and conventions for this topic and obtain some simple but useful results.

Definition 1: Let $\{x_{ij} : i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n\}$ be a set of random variables. Then the matrix $X = (x_{ij})$ is said to be a random matrix; its expectation is defined element by element as
$$E(X) = \big(E(x_{ij})\big). \qquad (1.1.2)$$

Definition 2: Let $z = (z_1, z_2, \ldots, z_m)'$ be a random vector; then the covariance matrix of $z$ is defined by
$$\operatorname{Cov}(z) = E\big[(z - E(z))(z - E(z))'\big], \qquad (1.1.5)$$
and as a matter of notation we write $\operatorname{Cov}(z) = \Sigma$.

This is, of course, not a universal practice; it is, however, used quite widely. Notice, further, that in (1.1.5) $(z - E(z))(z - E(z))'$ is an $m \times m$ random matrix, and thus the meaning of the expectation operator there is given by (1.1.2).

In general, we shall identify vectors and matrices by their dimensions. Thus the statement "x is m × 1" will mean that x is a column vector with m elements, while "A is m × n" will mean that A is a matrix with m rows and n columns.

A simple consequence of the preceding is

Lemma 1: Let $x$ be $m \times 1$ and random; suppose that
$$E(x) = \mu, \qquad \operatorname{Cov}(x) = \Sigma, \qquad (1.1.8)$$
and let $y = Ax$, where $A$ is a conformable nonrandom matrix. Then
$$E(y) = A\mu, \qquad \operatorname{Cov}(y) = A\Sigma A'. \qquad (1.1.9)$$

PROOF: By definition, the $i$th element of $y$ is given by $y_i = \sum_{j} a_{ij}x_j$, so that $E(y) = A\mu$ and
$$\operatorname{Cov}(y) = E[(y - E(y))(y - E(y))'] = E[A(x - \mu)(x - \mu)'A']. \qquad (1.1.10)$$
The $(i, j)$ element of $(x - \mu)(x - \mu)'$ is given by
$$(x_i - \mu_i)(x_j - \mu_j), \qquad (1.1.11)$$
so that, taking expectations element by element, $\operatorname{Cov}(y) = A\Sigma A'$.  Q.E.D.
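Lemma 1 is easy to check numerically. The sketch below assumes nothing beyond NumPy; the particular $\mu$, $\Sigma$, and $A$ are arbitrary illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (hypothetical) parameters: any mean, PSD covariance, and matrix A will do.
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 0.5]])
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])

# Draw a large sample of x with E(x) = mu, Cov(x) = Sigma, and form y = Ax.
x = rng.multivariate_normal(mu, Sigma, size=200_000)   # shape (T, 3)
y = x @ A.T                                            # each row is A x_t

# Lemma 1: E(y) = A mu and Cov(y) = A Sigma A'.
print(np.allclose(y.mean(axis=0), A @ mu, atol=1e-2))
print(np.allclose(np.cov(y, rowvar=False), A @ Sigma @ A.T, atol=1e-2))
```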

Before considering the multivariate normal distribution in some detail, we should point out two useful facts. One, implicit in the preceding, is this: if $X$ is a random matrix and if $A$, $B$, $C$ are conformable nonrandom matrices, then
$$E(Y) = A\,E(X)\,B + C, \qquad \text{where} \qquad Y = AXB + C. \qquad (1.1.20)$$

The second is given by

Lemma 2: Let $x$ be $m \times 1$, random; then
$$\Sigma = \operatorname{Cov}(x)$$
is at least a positive semidefinite, symmetric matrix.

PROOF: The symmetry of $\Sigma$ is obvious; thus its $(i, j)$ element is
$$\sigma_{ij} = E[(x_i - \mu_i)(x_j - \mu_j)], \qquad (1.1.21)$$
while its $(j, i)$ element is
$$\sigma_{ji} = E[(x_j - \mu_j)(x_i - \mu_i)]. \qquad (1.1.22)$$

To establish positive semidefiniteness, let $\alpha$ be an arbitrary nonnull constant vector and put $y = \alpha'x$. From elementary mathematical statistics we know that the variance of $y$ is nonnegative. Thus
$$0 \le \operatorname{Var}(y) = E[\alpha'(x - \mu)(x - \mu)'\alpha] = \alpha'\Sigma\alpha. \qquad (1.1.27) \quad \text{Q.E.D.}$$

Remark 2: Notice that if $\Sigma$ is not strictly positive definite (i.e., if it is a singular matrix), then there will exist a nonnull constant vector, say $\gamma$, such that
$$\gamma'\Sigma\gamma = 0. \qquad (1.1.28)$$
This means that there exists a scalar random variable, say $y$, which is a linear combination of the elements of $x$ and whose variance is zero. The latter fact means that $y$ is a constant. Hence a singular covariance matrix $\Sigma$ in (1.1.27) implies that the elements of $x$ are linearly dependent in the sense that there exists a nonnull set of constants $(\gamma_1, \gamma_2, \gamma_3, \ldots, \gamma_m)$ such that $y = \sum_{i=1}^{m} \gamma_i x_i$ is nonrandom.

If this is so, then the distribution of the random vector $x$ is said to be degenerate.

If this is not so, that is, if $\Sigma$ is strictly positive definite and symmetric, then the distribution of $x$ is said to be proper. In this textbook we shall only deal with proper distributions; thus the term "proper" will be suppressed everywhere except when the context clearly requires it.
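A small numerical sketch (NumPy only; the matrix used to build $\Sigma$ is an arbitrary illustration) makes Lemma 2 and Remark 2 concrete: any covariance matrix has nonnegative eigenvalues, and a singular $\Sigma$ yields a nonnull $\gamma$ with $\operatorname{Var}(\gamma'x) = 0$, i.e., a degenerate distribution.

```python
import numpy as np

rng = np.random.default_rng(1)

# Build a singular covariance matrix: with B of rank 2, Sigma = B B' (3x3) has rank 2.
B = rng.normal(size=(3, 2))
Sigma = B @ B.T

eigvals, eigvecs = np.linalg.eigh(Sigma)
print(np.round(eigvals, 6))          # all >= 0, and the smallest is (numerically) zero

# The eigenvector of the zero eigenvalue gives a nonnull gamma with gamma' Sigma gamma = 0,
# i.e., y = gamma' x is nonrandom: the distribution of x is degenerate.
gamma = eigvecs[:, 0]
print(float(gamma @ Sigma @ gamma))  # ~ 0: Var(gamma' x) vanishes
```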

1.2 JOINT, MARGINAL, AND CONDITIONAL DISTRIBUTIONS

In this section we shall define what we mean by the joint distribution of a number of random variables, derive certain features of it, and study its associated marginal and conditional distributions. The following convention, which is often employed in mathematical statistics, will be adhered to.

Convention: We shall denote random variables by capital Roman letters.¹ We shall denote by lowercase letters the values assumed by the random variables. Thus $\Pr\{X \le x\}$ indicates the probability that the random variable $X$ will assume a value equal to or less than (the real number) $x$.

Definition 3: Let $X$ be $m \times 1$, random; by its joint (cumulative) distribution function we mean a function $F(\cdot, \cdot, \ldots, \cdot)$ such that

i. $0 \le F \le 1$;
ii. $F$ is monotonic nondecreasing in all its arguments;
iii. $F(-\infty, -\infty, \ldots, -\infty) = 0$ and $F(\infty, \infty, \ldots, \infty) = 1$;
iv. $\Pr\{X_1 \le x_1,\ X_2 \le x_2,\ \ldots,\ X_m \le x_m\} = F(x_1, x_2, \ldots, x_m)$.

1. Recall that a random variable is a real-valued function defined on the relevant sample space.

In this textbook we shall always assume that $F(\cdot, \ldots, \cdot)$ is absolutely continuous, so that the derivative
$$\frac{\partial^m F}{\partial x_1\,\partial x_2 \cdots \partial x_m}$$
exists almost everywhere. Therefore we have

Definition 4: Let $F(\cdot, \cdot, \ldots, \cdot)$ be the joint cumulative distribution function of the $m \times 1$ random variable $X$. Suppose that $F$ is absolutely continuous; then
$$f(x_1, x_2, \ldots, x_m) = \frac{\partial^m F(x_1, x_2, \ldots, x_m)}{\partial x_1\,\partial x_2 \cdots \partial x_m} \qquad (1.2.1)$$
is said to be the joint density function of (the elements of) $X$.

In the following material we shall always assume that the density function exists.

Remark 3: It is clear from (1.2.1) and statement iv of Definition 3 that
$$F(x_1, x_2, \ldots, x_m) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_m} f(t_1, t_2, \ldots, t_m)\, dt_m \cdots dt_1 .$$

Remark 4: It should also be clear that the marginal density of any element of $X$, say $X_1$, is given by
$$g_1(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_m)\, dx_2 \cdots dx_m \qquad (1.2.6)$$
and is simply the density function of $X_1$ as studied in elementary mathematical statistics courses.

The marginal density of a subset of the elements of $X$ simply characterizes their probability structure after the effects of all other variables have been allowed for ("averaged out" or "integrated out"). In particular, notice that the marginal density of $X_1, X_2, \ldots, X_k$ does not depend on $x_{k+1}, x_{k+2}, \ldots, x_m$.

In contradistinction to this, we have another associated density, namely, the conditional one.

Recall from elementary probability that if $A$ and $B$ are two events, then the conditional probability of $A$ given $B$, denoted by $P(A \mid B)$, is defined as
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} .$$
Analogously, partition $X$ as $X = \binom{X^1}{X^2}$; the conditional density of $X^1$ given $X^2$ is defined by
$$f(x^1 \mid x^2) = \frac{f(x)}{g(x^2)}, \qquad (1.2.8)$$
where $g(\cdot)$ is the marginal density of $X^2$ and $f(\cdot)$ is the density of $X = \binom{X^1}{X^2}$.

Remark 5: As the notation in (1.2.8) makes clear, the conditional density of $X^1$ given $X^2$ does depend on $x^2$.

Whereas in the case of the marginal density of $X^1$ the effects of the variables in $X^2$ were "averaged" or "integrated" out, in the case of the conditional density of $X^1$ the effects of the variables in $X^2$ are allowed for explicitly by "holding them constant." The meaning of this distinction will become clearer when we study the multivariate normal distribution.

Moments are defined with respect to the densities above in the usual way. Thus let $h(\cdot)$ be a function of the random variable $X$. The expectation of $h(X)$ is defined by
$$E[h(X)] = \int h(x)\, f(x)\, dx,$$
where the integral sign indicates the $m$-fold integral with respect to $x_1, x_2, \ldots, x_m$.

If it is specified that $h(\cdot)$ depends only on $X^1$, then we can define two expectations for $h(X^1)$, one marginal and one conditional.

The marginal expectation of $h(X^1)$ is defined by
$$E[h(X^1)] = \int h(x^1)\, g_1(x^1)\, dx^1, \qquad (1.2.11)$$
where $g_1(\cdot)$ is the marginal density of $X^1$. The conditional expectation of $h(X^1)$ given $X^2$ is defined by
$$E[h(X^1) \mid X^2] = \int h(x^1)\, f(x^1 \mid x^2)\, dx^1. \qquad (1.2.12)$$
Notice that in (1.2.11) we expect with respect to the marginal density of $x^1$, while in (1.2.12) we expect with respect to the conditional density of $x^1$ given $x^2$.

Example 1: Suppose that we wish to obtain the mean of one of the elements of $X$, say $X_1$. In this case, take the function $h(\cdot)$ as $h(X) = X_1$, so that the marginal expectation² of $h(X)$ is simply the mean $\mu_1$ of $X_1$. For conditional means we use a dot notation: the subscript to the left of the dot indicates the random variable being expected, while the numbers appearing to the right of the dot indicate the conditioning variables. Thus $\mu_{2 \cdot 1, 3, 4, \ldots, m}$ indicates the conditional mean of $X_2$ given $X_1, X_3, X_4, \ldots, X_m$. Of course, we can define quantities such as $\mu_{1 \cdot 5, 6, 7, \ldots, m}$, which would indicate the conditional mean of $X_1$ given $X_5, X_6, \ldots, X_m$. What we mean by this is the following: Obtain the marginal density of $X_1, X_5, X_6, \ldots, X_m$; from this density, determine the conditional density of $X_1$ given $X_5, X_6, \ldots, X_m$ as $f(x_1 \mid x_5, \ldots, x_m)$. Finally, expect $X_1$ with respect to $f(x_1 \mid x_5, \ldots, x_m)$.

2. The term "marginal" is usually omitted; one speaks only of the mean of $X_1$.

Example 2: Suppose that $h(\cdot)$ is such that it depends only on two variables. Thus, say,
$$h(X) = (X_1 - \mu_1)(X_2 - \mu_2).$$
The marginal expectation of $h(X)$ is given by
$$E[h(X)] = \int (x_1 - \mu_1)(x_2 - \mu_2)\, f(x)\, dx = \sigma_{12}. \qquad (1.2.18)$$
The expectation here simply yields the covariance between the first and second elements of $X$.

As before, we can again define the conditional covariance between $X_1$ and $X_2$ given $X_3, X_4, \ldots, X_m$, denoted by $\sigma_{12 \cdot 3, 4, \ldots, m}$. We leave it to the reader to compute a number of different conditional variances and covariances.

The preceding discussion should be sufficient to render the meaning of the notation, say $\sigma_{55 \cdot 1, 7, 12, 13, \ldots, m}$ or $\sigma_{77 \cdot 1, 2, 19, 20, 21, \ldots, m}$, quite obvious.

Finally, let us conclude with

Definition 7: Let $X$ be $m \times 1$ and random; then its elements are said to be mutually (statistically) independent if and only if their (joint) density can be expressed as the product of the marginal densities of the individual elements.

Remark 6: Suppose $X$ is partitioned by $X^1$ and $X^2$ as above; then $X^1$ and $X^2$ are said to be mutually independent if and only if the joint density of $X$ can be expressed as the product of the marginal densities of $X^1$ and $X^2$.

We shall now abandon the convention whereby random variables and the values assumed by them are distinguished by the use of capital and lowercase letters respectively. Henceforth no such distinction will be made. The meaning will usually be clear from the context.

Let $y = h(x)$ be a transformation of $E_n$ into itself. Thus
$$y = h(x), \qquad (1.3.1)$$
where the $i$th element of $y$ is the function $y_i = h_i(x_1, x_2, \ldots, x_n)$, and $x$ and $y$ are not necessarily random.

Suppose that the inverse transformation also exists; that is, suppose there exists a function $g(\cdot)$ such that $x = g(y)$. The Jacobian of the inverse transformation is the determinant
$$J = \left|\frac{\partial x_i}{\partial y_j}\right| =
\begin{vmatrix}
\dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} & \cdots & \dfrac{\partial x_1}{\partial y_n} \\
\dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} & \cdots & \dfrac{\partial x_2}{\partial y_n} \\
\vdots & & & \vdots \\
\dfrac{\partial x_n}{\partial y_1} & \dfrac{\partial x_n}{\partial y_2} & \cdots & \dfrac{\partial x_n}{\partial y_n}
\end{vmatrix},$$
and thus it is expressed solely in terms of the $y_i$, $i = 1, 2, \ldots, n$.

Suppose now that $x$ is random, having density $f(\cdot)$, and consider the problem of determining the density of $y$ in terms of $f(\cdot)$ and the transformation in (1.3.1).

To this effect, we prove:

Moreover, suppose that $h(\cdot)$ and $g(\cdot)$ are differentiable. Then the density, $\phi(\cdot)$, of $y$ is given by
$$\phi(y) = f[g(y)]\,|J|,$$
where $|J|$ is the absolute value of the Jacobian of the transformation.⁴

PROOF: The cumulative distribution of $x$ is given by
$$F(x) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_m} f(t)\, dt = \int_A f(t)\, dt. \qquad (1.3.9)$$
Notice that $F(x)$ in (1.3.9) gives the probability assigned by $f(\cdot)$ to the set
$$A = \{t : t_i \le x_i,\ i = 1, 2, \ldots, m\}, \qquad (1.3.10)$$
which is the Cartesian product of the intervals $(-\infty, x_i)$, $i = 1, 2, \ldots, m$. This accounts for the notation employed in the last member of (1.3.9). Now, if in (1.3.9) we make the transformation $t = g(s)$, we obtain⁵
$$F(x) = \int_B f[g(s)]\,|J|\, ds, \qquad (1.3.13)$$
where $B$ is the transform of $A$ under $h$.

The integral in (1.3.13) gives the probability assigned to the set $B$ by the function $f[g(\cdot)]\,|J|$. Moreover, the set $B$ is of the form
$$B = \{s : s_i \le y_i,\ i = 1, 2, \ldots, m\} \qquad (1.3.14)$$
and corresponds to the "joint event"

4. We must add, of course, the restriction that $J$ does not vanish on every (nondegenerate) subset of $E_m$. We should also note that in standard mathematical terminology $J$ is the inverse of the Jacobian of (1.3.6). In the statistical literature, it is referred to as the Jacobian of (1.3.6). We shall adhere to this latter usage because it is more convenient for our purposes.

5. The validity of this representation follows from the theorems dealing with change of variables in multiple integrals. See, for example, R. C. Buck, Advanced Calculus, p. 242, New York, McGraw-Hill, 1956.

$$\{Y_i \le y_i,\ i = 1, 2, \ldots, m\},$$
where $Y_i$ indicates the random variable and $y_i$ the values assumed by it. Since the integrand of (1.3.13) is nonnegative and its integral over the entire space is unity, we conclude that
$$\phi(y) = f[g(y)]\,|J|$$
is the joint density of the elements of $y$.  Q.E.D.
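The change-of-variables result can be checked by Monte Carlo for any smooth invertible transformation. A sketch under simple assumptions (NumPy and SciPy only; the scalar exponential map is just an illustrative choice): for $x \sim N(0,1)$ and $y = e^x$, the density implied by the lemma is $f[\log y]\,(1/y)$, which should match a histogram of transformed draws.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# x ~ N(0, 1), y = h(x) = exp(x); inverse g(y) = log(y), Jacobian dx/dy = 1/y.
x = rng.normal(size=500_000)
y = np.exp(x)

def phi(y_vals):
    """Density of y from the lemma: f(g(y)) * |J| with f the N(0,1) density."""
    return stats.norm.pdf(np.log(y_vals)) / y_vals

# Compare the implied density with an empirical histogram on a grid.
grid = np.linspace(0.2, 4.0, 20)
hist, edges = np.histogram(y, bins=200, range=(0.0, 8.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
empirical = np.interp(grid, centers, hist)
print(np.max(np.abs(empirical - phi(grid))))   # small: the two densities agree
```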

1.4 THE MULTIVARIATE NORMAL DISTRIBUTION

It is assumed that the reader is familiar with the univariate normal distribution. This being the case, perhaps the simplest way of introducing the multivariate normal distribution is as follows.

Let $x_i$, $i = 1, 2, \ldots, m$, be random variables identically and independently distributed as $N(0, 1)$; that is, they are each normal with mean zero and unit variance. Since they are independent, the density of the vector $x = (x_1, x_2, \ldots, x_m)'$ is
$$f(x) = (2\pi)^{-m/2}\exp\left(-\tfrac{1}{2}x'x\right).$$
Consider now the transformation
$$y = Ax + b,$$
where $A$ is an $m \times m$ nonsingular matrix and $b$ an $m \times 1$ vector, both nonrandom. By the result of the preceding section, the density of $y$ is
$$\phi(y) = f[A^{-1}(y - b)]\,|J| = (2\pi)^{-m/2}|A|^{-1}\exp\left[-\tfrac{1}{2}(y - b)'A'^{-1}A^{-1}(y - b)\right]. \qquad (1.4.4)$$
For notational simplicity, we have assumed in (1.4.4) that $|A| > 0$. We know that
$$E(x) = 0, \qquad \operatorname{Cov}(x) = I,$$
and thus, from Lemma 1, we conclude that
$$E(y) = b, \qquad \operatorname{Cov}(y) = AA'.$$
To conform to standard usage, put as a matter of notation
$$\mu = b, \qquad \Sigma = AA',$$

and rewrite (1.4.4) in standard form as
$$\phi(y) = (2\pi)^{-m/2}\,|\Sigma|^{-1/2}\exp\left[-\tfrac{1}{2}(y - \mu)'\Sigma^{-1}(y - \mu)\right]. \qquad (1.4.8)$$
Thus we define the multivariate normal by (1.4.8). More formally,

Definition 9: Let $y$ be $m \times 1$ random; then $y$ is said to have the multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$, denoted by $y \sim N(\mu, \Sigma)$, if and only if the joint density of its elements is given by (1.4.8).
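The construction behind (1.4.4) through (1.4.8) is also how multivariate normal draws are generated in practice: take iid $N(0,1)$ variables and apply an affine map $y = Ax + b$ with $AA' = \Sigma$. A minimal sketch (NumPy assumed; the Cholesky factor is one convenient choice of $A$, and the particular $\mu$, $\Sigma$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

mu = np.array([1.0, 0.0, -1.0])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.3],
                  [0.2, 0.3, 1.5]])

A = np.linalg.cholesky(Sigma)          # A A' = Sigma
T = 300_000
x = rng.normal(size=(T, 3))            # rows of iid N(0,1) variables
y = x @ A.T + mu                       # y_t = A x_t + mu  ~  N(mu, Sigma)

print(np.allclose(y.mean(axis=0), mu, atol=1e-2))
print(np.allclose(np.cov(y, rowvar=False), Sigma, atol=1e-2))
```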

Before establishing certain properties of the normal distribution, let us introduce

Definition 10: The characteristic function of a random vector $x$ is given by
$$\psi_x(t) = E[\exp(it'x)],$$
where $i$ has the property $i^2 = -1$ and $t$ is a vector of arbitrary real constants.

Remark 7: The usefulness of the characteristic function is that it is always defined for random variables possessing densities; moreover, there is a one-to-one correspondence between characteristic and density functions. Thus if $f(\cdot)$ is the density of $x$, then explicitly we have
$$\psi_x(t) = \int \exp(it'x)\, f(x)\, dx, \qquad (1.4.10)$$
which is merely the Fourier transform of $f(\cdot)$.

Hence if we know the characteristic function of a random variable, we can, in principle, determine its density function by inverting (1.4.10). We have inserted "in principle" advisedly, for it is not always simple or possible to determine $f(\cdot)$ from $\psi(\cdot)$ explicitly.

Finally, note that the quantities
$$i^{-k}\left.\frac{\partial^k \psi_x(t)}{\partial t_j^k}\right|_{t=0} \qquad \text{and} \qquad i^{-2}\left.\frac{\partial^2 \psi_x(t)}{\partial t_j\,\partial t_s}\right|_{t=0}$$
denote, respectively, the $k$th moment of $x_j$ and the cross moment of $x_j$ and $x_s$.

Lemma 4: The characteristic function of $x \sim N(\mu, \Sigma)$ is given by
$$\psi_x(t) = \exp\left(it'\mu - \tfrac{1}{2}t'\Sigma t\right).$$

PROOF: Let $y_1, y_2, y_3, \ldots, y_m$ be independently distributed as $N(0, 1)$. It can be shown that the characteristic function of $y_j$ is given by
$$\psi_{y_j}(r_j) = \exp\left(-\tfrac{1}{2}r_j^2\right). \qquad (1.4.14)$$
Since the $y_j$, $j = 1, 2, \ldots, m$, are mutually independent, their joint characteristic function is
$$\psi_y(r) = E[\exp(ir'y)] = \exp\left(-\tfrac{1}{2}r'r\right). \qquad (1.4.16)$$
Now put
$$x = Ay + \mu, \qquad AA' = \Sigma. \qquad (1.4.17)$$
Then $x$ has what we have termed the multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$.

Using (1.4.16) and (1.4.17), we have
$$\exp\left(-\tfrac{1}{2}r'r\right) = E[\exp(ir'y)] = E\{\exp[ir'A^{-1}(x - \mu)]\} = E[\exp(ir'A^{-1}x)]\exp(-ir'A^{-1}\mu). \qquad (1.4.18)$$
Put
$$t = A'^{-1}r. \qquad (1.4.19)$$
Since $r$ is an arbitrary vector of constants, then so is $t$. Substituting (1.4.19) in (1.4.18) and rearranging terms, we obtain, as a result of (1.4.17),
$$\exp\left(it'\mu - \tfrac{1}{2}t'\Sigma t\right) = E[\exp(it'x)] = \psi_x(t). \qquad (1.4.20) \quad \text{Q.E.D.}$$
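The form of the characteristic function in (1.4.20) can be checked by simulation: for any fixed $t$, the sample average of $\exp(it'x)$ over many normal draws should approach $\exp(it'\mu - \tfrac{1}{2}t'\Sigma t)$. A sketch (NumPy assumed; $\mu$, $\Sigma$, and $t$ are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(4)

mu = np.array([1.0, -0.5])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.8]])
t = np.array([0.7, -1.2])

x = rng.multivariate_normal(mu, Sigma, size=1_000_000)

# Monte Carlo estimate of E[exp(i t'x)] versus the closed form exp(i t'mu - 0.5 t'Sigma t).
mc = np.exp(1j * x @ t).mean()
closed_form = np.exp(1j * t @ mu - 0.5 * t @ Sigma @ t)
print(mc, closed_form)    # the two complex numbers should be close
```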

Remark 8: Notice that the characteristic function of the multivariate normal is an exponential containing a linear and a quadratic term in $t$, the parameter defining the characteristic function. The coefficients in the linear terms are simply the elements of the mean vector $\mu$, while the matrix of the quadratic form in (1.4.20) is the covariance matrix of the distribution. We next prove some of the fundamental properties of the normal distribution.

Lemma 5: Let $x \sim N(\mu, \Sigma)$ and write
$$y = Bx + b, \qquad (1.4.21)$$
where $B$ is $m \times m$, $b$ is $m \times 1$, the elements of both being (nonrandom) constants. Then
$$y \sim N(B\mu + b,\ B\Sigma B').$$

PROOF: The characteristic function of $y$ is, by definition,
$$\psi_y(t) = E[\exp(it'y)]. \qquad (1.4.22)$$
Let
$$s = B't \qquad (1.4.23)$$
and, using (1.4.21), note that
$$E[\exp(it'y)] = E[\exp(it'Bx)]\exp(it'b) = E[\exp(is'x)]\exp(it'b) = \exp\left(is'\mu - \tfrac{1}{2}s'\Sigma s + it'b\right) = \exp\left[it'(B\mu + b) - \tfrac{1}{2}t'B\Sigma B't\right], \qquad (1.4.24)$$
which, by Lemma 4, is the characteristic function of $N(B\mu + b,\ B\Sigma B')$.  Q.E.D.

Lemma 6: Let $x \sim N(\mu, \Sigma)$; partition $x = \binom{x^1}{x^2}$ so that $x^1$ has $k$ elements and $x^2$ has $(m - k)$, and partition $\mu$ and $\Sigma$ conformably. Then the marginal distribution of $x^1$ is $N(\mu^1, \Sigma_{11})$.

PROOF: Partition $T$ conformably with $\Sigma$. By Lemma 5, $y \sim N(0, I_m)$, where $I_m$ is the identity matrix of order $m$. Thus the $y_j$, $j = 1, 2, \ldots, m$, are mutually independent and each is distributed as $N(0, 1)$. It follows, therefore, that $y^1 \sim N(0, I_k)$. By (1.4.31) we then have the conclusion of the lemma.  Q.E.D.

Corollary: The marginal density of the $i$th element of $x$ is normal with mean $\mu_i$ and variance $\sigma_{ii}$.

PROOF: In the proof of Lemma 6, take $k = 1$ (1.4.35), by rearranging the elements of $x$, $\mu$, and $\Sigma$ so that the first element of $x^*$ is $x_i$, the first element of $\mu^*$ is $\mu_i$, and the $(1, 1)$ element of $\Sigma^*$ is $\sigma_{ii}$, where $x^*$, $\mu^*$, and $\Sigma^*$ represent the vectors and matrix after rearrangement. The conclusion of the corollary follows immediately, for then $x_i \sim N(\mu_i, \sigma_{ii})$. (1.4.36)

Lemma 7: Let $x \sim N(\mu, \Sigma)$ and partition $x$, $\mu$, and $\Sigma$ as in Lemma 6. Then the conditional distribution of $x^1$ given $x^2$ is
$$x^1 \mid x^2 \sim N\big[\mu^1 + \Sigma_{12}\Sigma_{22}^{-1}(x^2 - \mu^2),\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\big]. \qquad (1.4.37)$$

PROOF: By Lemma 6, $x^2 \sim N(\mu^2, \Sigma_{22})$. By definition, the conditional density of $x^1$ given $x^2$ is the quotient of the joint density of $x^1$ and $x^2$ to the marginal density of $x^2$. Thus
$$h(x^1 \mid x^2) = \frac{(2\pi)^{-m/2}\,|\Sigma|^{-1/2}\exp\left[-\tfrac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)\right]}{(2\pi)^{-(m-k)/2}\,|\Sigma_{22}|^{-1/2}\exp\left[-\tfrac{1}{2}(x^2 - \mu^2)'\Sigma_{22}^{-1}(x^2 - \mu^2)\right]}. \qquad (1.4.38)$$

Writing $V = \Sigma^{-1}$ in partitioned form and using the relation
$$\Sigma_{22}^{-1} = V_{22} - V_{21}V_{11}^{-1}V_{12}, \qquad (1.4.42)$$
the exponent of the quotient in (1.4.38) becomes
$$-\tfrac{1}{2}\big\{(x^1 - \mu^1)'V_{11}(x^1 - \mu^1) + 2(x^1 - \mu^1)'V_{12}(x^2 - \mu^2) + (x^2 - \mu^2)'V_{22}(x^2 - \mu^2) - (x^2 - \mu^2)'(V_{22} - V_{21}V_{11}^{-1}V_{12})(x^2 - \mu^2)\big\}$$
$$= -\tfrac{1}{2}\big\{\big[x^1 - (\mu^1 - V_{11}^{-1}V_{12}(x^2 - \mu^2))\big]'V_{11}\big[x^1 - (\mu^1 - V_{11}^{-1}V_{12}(x^2 - \mu^2))\big]\big\}. \qquad (1.4.43)$$
The determinantal expressions in (1.4.38) yield, in view of (1.4.41),⁶
$$|\Sigma_{22}|^{1/2}\,|\Sigma|^{-1/2} = |\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}|^{-1/2}. \qquad (1.4.44)$$
Moreover,⁷
$$V_{11} = (\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})^{-1} \qquad \text{and} \qquad V_{11}^{-1}V_{12} = -\Sigma_{12}\Sigma_{22}^{-1}. \qquad (1.4.45)$$
Hence (1.4.38) may be rewritten
$$h(x^1 \mid x^2) = (2\pi)^{-k/2}\,|V_{11}|^{1/2}\exp\Big\{-\tfrac{1}{2}\big[x^1 - (\mu^1 + \Sigma_{12}\Sigma_{22}^{-1}(x^2 - \mu^2))\big]'V_{11}\big[x^1 - (\mu^1 + \Sigma_{12}\Sigma_{22}^{-1}(x^2 - \mu^2))\big]\Big\}. \qquad (1.4.46) \quad \text{Q.E.D.}$$

6. This is easily established as follows. Let
$$D = \begin{bmatrix} I & -\Sigma_{12}\Sigma_{22}^{-1} \\ 0 & I \end{bmatrix}. \qquad (*)$$
Then
$$D\Sigma = \begin{bmatrix} \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} & 0 \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}. \qquad (**)$$
Since $|D| = 1$, the result in (1.4.41) follows immediately from (**).

7. The relations in (1.4.42) and (1.4.45) result from
$$\begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}^{-1} =
\begin{bmatrix} (\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})^{-1} & -\Sigma_{11}^{-1}\Sigma_{12}(\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})^{-1} \\ -\Sigma_{22}^{-1}\Sigma_{21}(\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21})^{-1} & (\Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})^{-1} \end{bmatrix}, \qquad (*)$$
which can be verified directly and their verification is thus left as an exercise for the reader.

Remark 9: One important aspect of the preceding result is that the conditional mean of $x^1$ given $x^2$ is a linear function of $x^2$, while the covariance matrix of its conditional distribution is independent of $x^2$.
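The conditional mean and covariance of Lemma 7 are simple matrix expressions in the partitioned $\Sigma$ and translate directly into code. The helper below is a sketch (NumPy assumed; the partition size and the example $\mu$, $\Sigma$, and conditioning value are arbitrary illustrations):

```python
import numpy as np

def conditional_normal(mu, Sigma, k, x2):
    """Parameters of x^1 | x^2 = x2 when x ~ N(mu, Sigma) and x^1 has k elements."""
    mu1, mu2 = mu[:k], mu[k:]
    S11, S12 = Sigma[:k, :k], Sigma[:k, k:]
    S21, S22 = Sigma[k:, :k], Sigma[k:, k:]
    S22_inv = np.linalg.inv(S22)
    cond_mean = mu1 + S12 @ S22_inv @ (x2 - mu2)   # linear in x2
    cond_cov = S11 - S12 @ S22_inv @ S21           # does not depend on x2
    return cond_mean, cond_cov

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.8, 0.4],
                  [0.8, 1.5, 0.3],
                  [0.4, 0.3, 1.0]])
print(conditional_normal(mu, Sigma, k=1, x2=np.array([2.0, 0.0])))
```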

Finally, it would be desirable to have a simple test as to whether two sets of normal variables are mutually independent. Thus

Lemma 8: Let $x \sim N(\mu, \Sigma)$ and let $x$, $\mu$, and $\Sigma$ be partitioned as in Lemma 6. Then $x^1$ and $x^2$ are mutually independent if and only if
$$\Sigma_{12} = 0.$$

PROOF: Suppose that $x^1$ and $x^2$ are mutually independent and distributed, respectively, as $x^1 \sim N(\mu^1, \Sigma_{11})$, $x^2 \sim N(\mu^2, \Sigma_{22})$. Let $x_i$ be an element of $x^1$ and $x_j$ an element of $x^2$.

It follows by (1.4.53) that
$$\sigma_{ij} = 0, \qquad (1.4.55)$$
so that $\Sigma_{12} = 0$. These fundamental properties of the normal distribution are summarized in

Theorem 1: Let $x \sim N(\mu, \Sigma)$ and partition $x$ by $x = \binom{x^1}{x^2}$ such that $x^1$ has $k$ elements and $x^2$ $(m - k)$ elements. Partition $\mu$ and $\Sigma$ conformally. Then the following statements are true:

i. The characteristic function of $x$ is given by
$$\psi_x(t) = \exp\left(it'\mu - \tfrac{1}{2}t'\Sigma t\right).$$
ii. If $B$ and $b$ are conformable nonrandom matrix and vector, then $Bx + b \sim N(B\mu + b,\ B\Sigma B')$.
iii. The marginal distribution of $x^1$ is $N(\mu^1, \Sigma_{11})$.
iv. The conditional distribution of $x^1$ given $x^2$ is
$$x^1 \mid x^2 \sim N\big[\mu^1 + \Sigma_{12}\Sigma_{22}^{-1}(x^2 - \mu^2),\ \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\big].$$
v. The subvectors $x^1$ and $x^2$ are mutually independent if and only if $\Sigma_{12} = 0$.

PROOF: See Lemmas 4, 5, 6, 7, and 8.

Before we leave this topic, let us establish a remarkable fact about the normal distribution.

Proposition 1: Let $x$ be a random vector such that
$$E(x) = \mu, \qquad \operatorname{Cov}(x) = \Sigma.$$
If every (nontrivial) linear combination of the elements of $x$ is normally distributed, then
$$x \sim N(\mu, \Sigma). \qquad (1.4.57)$$

PROOF: Let $\alpha$ be an arbitrary vector of constants and define
$$y = \alpha'x. \qquad (1.4.58)$$
Then $y$ is a scalar random variable and, by the hypothesis of the proposition, is normally distributed, say with mean $\nu$ and variance $\sigma^2$. Its characteristic function is thus
$$E[\exp(isy)] = \exp\left(is\nu - \tfrac{1}{2}s^2\sigma^2\right), \qquad (1.4.59)$$
where $s$ is an arbitrary (real) scalar. But
$$E(y) = \alpha'\mu, \qquad \operatorname{Var}(y) = \alpha'\Sigma\alpha. \qquad (1.4.60)$$
Putting
$$t = s\alpha \qquad (1.4.61)$$
and using (1.4.58) and (1.4.59), we conclude
$$E[\exp(it'x)] = \exp\left(it'\mu - \tfrac{1}{2}t'\Sigma t\right), \qquad (1.4.62)$$
which shows that $x \sim N(\mu, \Sigma)$.  Q.E.D.

1.5 CORRELATION COEFFICIENTS AND RELATED TOPICS

In this section we shall give the definition of several types of correlation coefficients and show the similarity between some aspects of a conditional normal density and the general linear model
$$y = X\beta + u,$$
where $y$ is $T \times 1$, $X$ is $T \times n$, and they refer, respectively, to the observations on the dependent and explanatory variables. The vector $\beta$ consists of the unknown parameters to be estimated and $u$ is the $T \times 1$ vector of disturbances, which is typically assumed to have mean zero and covariance matrix $\sigma^2 I$.

Definition 11: Let $x \sim N(\mu, \Sigma)$; then the (simple) correlation coefficient between two elements of $x$, say $x_i$ and $x_j$, is defined by
$$\rho_{ij} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\,\sigma_{jj}}}. \qquad (1.5.2)$$
Thus it is the correlation coefficient in the (marginal) joint density of $x_i$ and $x_j$.

Definition 12: Let $x \sim N(\mu, \Sigma)$ and partition $x$ by $x^1$ and $x^2$ so that $x^1$ has $k$ and $x^2$ $(m - k)$ elements. If $x_i$, $x_j$ are any two elements of $x^1$, then their partial correlation coefficient (for fixed $x_{k+1}, x_{k+2}, \ldots, x_m$) is defined by
$$\rho_{ij \cdot k+1, k+2, \ldots, m} = \frac{\sigma_{ij \cdot k+1, k+2, \ldots, m}}{\sqrt{\sigma_{ii \cdot k+1, \ldots, m}\;\sigma_{jj \cdot k+1, \ldots, m}}}, \qquad (1.5.3)$$
where $\sigma_{ij \cdot k+1, k+2, \ldots, m}$ is the $(i, j)$ element of the covariance matrix in the conditional density of $x^1$ given $x^2$. In the present case (normal distribution), this matrix is $\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$.

Hence partial correlation coefficients are simply correlation coefficients computed with respect to the (conditional) joint density of the variables in question given another group of variables. The variables "held constant" are enumerated after the dot, in the suggestive notation of (1.5.3).

Remark 10: The difference between a simple and a partial correlation coefficient is this: a simple correlation coefficient between $x_i$ and $x_j$ expresses the degree of relation between these variables, when the effects of all other variables have been averaged out. A partial correlation, however, expresses the degree of relation between $x_i$ and $x_j$ for given values of all other (relevant) variables.

In the normal case, this aspect is obscured because, by iv of Theorem 1, the covariance matrix of the conditional distribution does not depend on the conditioning variables. Notice, however, that in (1.5.3) $\rho_{ij \cdot k+1, k+2, \ldots, m}$ depends on $\Sigma_{12}$ and $\Sigma_{22}$, which contain the covariance parameters of the conditioning variables.
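Definition 12 translates directly into code: form the conditional covariance matrix $\Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ and normalize it as in (1.5.2). A sketch (NumPy assumed; the example $\Sigma$ is hypothetical):

```python
import numpy as np

def partial_correlation(Sigma, i, j, k):
    """Partial correlation of x_i and x_j (elements of x^1, the first k coordinates),
    holding the remaining variables fixed."""
    S11, S12 = Sigma[:k, :k], Sigma[:k, k:]
    S21, S22 = Sigma[k:, :k], Sigma[k:, k:]
    C = S11 - S12 @ np.linalg.inv(S22) @ S21     # conditional covariance matrix
    return C[i, j] / np.sqrt(C[i, i] * C[j, j])

Sigma = np.array([[1.0, 0.6, 0.5],
                  [0.6, 1.0, 0.5],
                  [0.5, 0.5, 1.0]])

# Simple correlation of x1, x2 versus their partial correlation given x3.
print(Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1]))   # 0.6
print(partial_correlation(Sigma, 0, 1, k=2))              # (0.6 - 0.25) / 0.75 ~= 0.467
```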

Finally, it should be pointed out that if $x^1$ and $x^2$ are mutually independent (in the normal case, if $\Sigma_{12} = 0$), then
$$\rho_{ij \cdot k+1, k+2, \ldots, m} = \rho_{ij}.$$

Definition 13: Let $x \sim N(\mu, \Sigma)$ and partition as in Definition 12. Let $\alpha'x^2$ be a linear combination of the elements of $x^2$ and let $x_i$ be an element of $x^1$. Then the maximum correlation⁹ between $x_i$ and the linear combination $\alpha'x^2$ is called the multiple correlation between $x_i$ and the vector $x^2$ and is denoted by $R_{i \cdot k+1, k+2, \ldots, m}$.

9. The term maximum is needed here, for the correlation between the two scalar random variables $x_i$ and $\alpha'x^2$ will depend on the arbitrary vector of constants $\alpha$.

Lemma 9: Let $x \sim N(\mu, \Sigma)$ and partition $x$, $\mu$, and $\Sigma$ as in Definition 12; let $x_i$ be an element of $x^1$ and let $\sigma_{i\cdot}$ be the corresponding row of $\Sigma_{12}$. Then, over all constant vectors $\gamma$, the variance of $x_i - \gamma'x^2$ is minimized for
$$\gamma = \alpha, \qquad \alpha' = \sigma_{i\cdot}\Sigma_{22}^{-1}. \qquad (1.5.5)$$

PROOF: Notice that the conditional expectation of $x_i$ given $x^2$ is¹⁰
$$E(x_i \mid x^2) = \mu_i + \sigma_{i\cdot}\Sigma_{22}^{-1}(x^2 - \mu^2) = \mu_i + \alpha'(x^2 - \mu^2). \qquad (1.5.6)$$
We first show that $x^2$ and $x_i - \alpha'x^2$ are mutually independent. Because $x_i - \alpha'x^2$ and $x^2$ are (jointly) normally distributed, to accomplish this we need only show that their covariance vanishes. Thus
$$E\big[(x_i - \mu_i - \alpha'(x^2 - \mu^2))(x^2 - \mu^2)'\big] = E[(x_i - \mu_i)(x^2 - \mu^2)'] - \alpha'E[(x^2 - \mu^2)(x^2 - \mu^2)'] = \sigma_{i\cdot} - \alpha'\Sigma_{22}. \qquad (1.5.7)$$
As a result of (1.5.5), we conclude
$$\sigma_{i\cdot} - \alpha'\Sigma_{22} = 0, \qquad (1.5.8)$$
which establishes mutual independence.

Now, let $\gamma$ be any vector of constants. Then
$$\operatorname{Var}(x_i - \gamma'x^2) = \operatorname{Var}\big[(x_i - \alpha'x^2) + (\alpha - \gamma)'x^2\big] = \operatorname{Var}(x_i - \alpha'x^2) + (\alpha - \gamma)'\Sigma_{22}(\alpha - \gamma). \qquad (1.5.9)$$
Because $\Sigma_{22}$ is positive definite, it follows that the left-hand side of (1.5.9) is minimized when the second term in the right-hand side is zero. But this occurs only for
$$\gamma = \alpha. \qquad (1.5.10) \quad \text{Q.E.D.}$$

To complete our task, we prove

Lemma 10: Let $x$, $\mu$, and $\Sigma$ be as in Lemma 9 and partition them similarly; consider linear combinations $\gamma'x^2$, with $\gamma$ nonrandom. Then the correlation between $x_i$ and $\gamma'x^2$ is maximized for
$$\gamma = \alpha. \qquad (1.5.11)$$

PROOF: By Lemma 9, for any scalar $c$ and vector $\gamma$ we have
$$\operatorname{Var}(x_i - \alpha'x^2) \le \operatorname{Var}(x_i - c\gamma'x^2). \qquad (1.5.12)$$
Developing both sides, we have
$$\sigma_{ii} - 2\sigma_{i\cdot}\alpha + \alpha'\Sigma_{22}\alpha \le \sigma_{ii} - 2c\sigma_{i\cdot}\gamma + c^2\gamma'\Sigma_{22}\gamma. \qquad (1.5.13)$$

10. $E(x^1 \mid x^2) = \mu^1 + \Sigma_{12}\Sigma_{22}^{-1}(x^2 - \mu^2)$. The matrix $\Sigma_{12}\Sigma_{22}^{-1}$ is called the matrix of regression coefficients of $x^1$ on $x^2$. Thus in (1.5.6) $\sigma_{i\cdot}\Sigma_{22}^{-1}$ is the vector of regression coefficients of $x_i$ on $x^2$.

Since $c$ is arbitrary, (1.5.13) holds in particular for
$$c = \frac{\sigma_{i\cdot}\gamma}{\gamma'\Sigma_{22}\gamma};$$
substituting this value and rearranging shows that the squared correlation between $x_i$ and $\gamma'x^2$ cannot exceed that between $x_i$ and $\alpha'x^2$.  Q.E.D.

Remark 11: It follows from Lemma 10 that the multiple correlation coefficient between $x_i$ and the vector $x^2$ is given by
$$R_{i \cdot k+1, k+2, \ldots, m} = \frac{\sigma_{i\cdot}\alpha}{(\sigma_{ii}\,\alpha'\Sigma_{22}\alpha)^{1/2}} = \left(\frac{\sigma_{i\cdot}\Sigma_{22}^{-1}\sigma_{i\cdot}'}{\sigma_{ii}}\right)^{1/2}. \qquad (1.5.16)$$

Here it would be interesting to point out the similarity between the properties of the conditional density derived above and the classical general linear model. In the context of the latter, we are dealing with the sample
$$y = Z\beta + u,$$
where $y$ is $T \times 1$, $Z$ is $T \times n$, and they represent, respectively, the ($T$) observations on the dependent and explanatory variables; $\beta$ is the vector of parameters to be estimated, and $u$ is the vector of disturbances having the specification
$$E(u) = 0, \qquad \operatorname{Cov}(u) = \sigma^2 I.$$
In what follows, let us assume for convenience that the dependent and explanatory variables are measured as deviations from their respective sample means.

If $\beta$ is estimated by least squares, then its estimator is given by
$$\hat{\beta} = (Z'Z)^{-1}Z'y, \qquad (1.5.19)$$
and it is such that it minimizes
$$(y - Z\beta)'(y - Z\beta). \qquad (1.5.20)$$
Notice the similarity of this to the conclusion of Lemma 9. Note, also, the similarity between the expressions for $\hat{\beta}$ in (1.5.19) and $\alpha$ in (1.5.5).

The right-hand side in (1.5.19) can be written as
$$\left(\frac{Z'Z}{T}\right)^{-1}\frac{Z'y}{T}.$$

But $(Z'Z/T)^{-1}$ and $Z'y/T$ are, respectively, the sample analogs of $\Sigma_{22}^{-1}$ and $\sigma_{i\cdot}'$ appearing in the right-hand side of (1.5.5).

Moreover,
$$(y - Z\hat{\beta})'Z = y'Z - \hat{\beta}'Z'Z = y'Z - y'Z = 0, \qquad (1.5.21)$$
which is again the sample analog of the result established in (1.5.8) of Lemma 9. Finally, the (unadjusted) coefficient of determination in multiple regression, defined by
$$R^2 = \frac{y'y - (y - Z\hat{\beta})'(y - Z\hat{\beta})}{y'y} = \frac{\hat{\beta}'Z'y}{y'y} = \frac{y'Z(Z'Z)^{-1}Z'y}{y'y},$$
is the exact sample analog of $R^2_{i \cdot k+1, k+2, \ldots, m}$ established by (1.5.16), where $x_i$ corresponds to the dependent and $x^2$ corresponds to the independent or explanatory variables.

The purpose of this brief digression was to pinpoint these similarities and thus enhance the student's understanding of the general linear model in its various aspects. The discussion also indicates how the study of the multivariate normal distribution contributes to understanding of the distributional problem involved in the general linear model.
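The sample analogy can be checked directly: regress $y$ on $Z$ (both in deviations from sample means) and compare the unadjusted $R^2$ with the population multiple-correlation formula computed from the covariance matrix used to generate the data. A sketch (NumPy assumed; the data-generating parameters are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(5)

# Population covariance of (x_i, x^2), with x_i the "dependent" variable.
Sigma = np.array([[1.0, 0.6, 0.4],
                  [0.6, 1.0, 0.2],
                  [0.4, 0.2, 1.0]])
T = 200_000
data = rng.multivariate_normal(np.zeros(3), Sigma, size=T)
y, Z = data[:, 0], data[:, 1:]

# Measure in deviations from sample means, as in the text.
y = y - y.mean()
Z = Z - Z.mean(axis=0)

beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)        # (Z'Z)^{-1} Z'y
R2_sample = (beta_hat @ Z.T @ y) / (y @ y)          # unadjusted coefficient of determination

sigma_i = Sigma[0, 1:]                              # i-th row of Sigma_12
R2_population = sigma_i @ np.linalg.inv(Sigma[1:, 1:]) @ sigma_i / Sigma[0, 0]
print(R2_sample, R2_population)                     # close for large T
```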

Otherwise this digression is completely extraneous to the development of this section and may be omitted without loss of continuity.

The essential conclusions of this section are summarized in

Theorem 2: Let $x \sim N(\mu, \Sigma)$; partition $x$ by $x^1$, $x^2$, where $x^1$ has $k$ and $x^2$ has $(m - k)$ elements. Partition $\mu$ and $\Sigma$ conformally and let $x_i$ be an element of $x^1$. Then the following statements are true:

i. The variance of $x_i - \alpha'x^2$ is minimized for
$$\alpha' = \sigma_{i\cdot}\Sigma_{22}^{-1}, \qquad (1.5.23)$$
where $\sigma_{i\cdot}$ is the $i$th row of $\Sigma_{12}$; thus $\alpha$ is the vector of regression coefficients of $x_i$ on the elements of the vector $x^2$.

ii. $x_i - \alpha'x^2$ is independent of $x^2$, and hence of any linear combination of the elements of $x^2$.

iii. Let $\gamma$ be an arbitrary (nonrandom) vector; then the correlation between (the two scalar random variables) $x_i$ and $\gamma'x^2$ is maximized for $\gamma = \alpha$.

1.6 ESTIMATORS OF THE MEAN VECTOR AND COVARIANCE MATRIX AND THEIR DISTRIBUTION

Let $x \sim N(\mu, \Sigma)$ and consider the random sample $\{x_{\cdot t} : t = 1, 2, \ldots, T\}$, where $x_{\cdot t}$ indicates the column vector whose elements are the $t$th observation on the random vector $x$; thus $x_{\cdot t}$ is $m \times 1$.

The problem to be examined is that of obtaining estimators for $\mu$ and $\Sigma$, the parameters of the density of $x$. To solve the problem, we shall employ the principle of maximum likelihood.

Now the likelihood of the $T$ observations is given by
$$L^*(x_{\cdot 1}, x_{\cdot 2}, \ldots, x_{\cdot T};\ \mu, \Sigma) = (2\pi)^{-mT/2}\,|\Sigma|^{-T/2}\exp\left[-\tfrac{1}{2}\sum_{t=1}^{T}(x_{\cdot t} - \mu)'\Sigma^{-1}(x_{\cdot t} - \mu)\right]. \qquad (1.6.1)$$
Put
$$y_{\cdot t} = x_{\cdot t} - \mu, \qquad t = 1, 2, \ldots, T, \qquad (1.6.2)$$
and note that (1.6.1) can be written in the more convenient form
$$L^*(x_{\cdot 1}, x_{\cdot 2}, \ldots, x_{\cdot T};\ \mu, \Sigma) = (2\pi)^{-mT/2}\,|\Sigma|^{-T/2}\exp\left[-\tfrac{1}{2}\operatorname{tr}(Y'\Sigma^{-1}Y)\right], \qquad (1.6.3)$$
where $Y$ is an $m \times T$ matrix whose $t$th column is $y_{\cdot t}$. Since for any two matrices $A$, $B$, such that $AB$ and $BA$ are both defined, $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, the exponential in (1.6.3) can be written, apart from the factor $-\tfrac{1}{2}$, as $\operatorname{tr}(\Sigma^{-1}YY')$.

Let $u$ be the $T \times 1$ vector of units defined by
$$u = (1, 1, \ldots, 1)'.$$

Adding and subtracting $T\bar{x}\bar{x}'$ (where $\bar{x} = T^{-1}\sum_{t=1}^{T} x_{\cdot t}$ is the sample mean) in the right-hand side of (1.6.6), we obtain
$$YY' = \sum_{t=1}^{T}(x_{\cdot t} - \bar{x})(x_{\cdot t} - \bar{x})' + T(\bar{x} - \mu)(\bar{x} - \mu)'.$$
To obtain maximum likelihood estimators, we maximize (1.6.10) with respect to $\mu$ and $\Sigma$. Actually, in this case it is simpler (and equivalent) to maximize with respect to $\mu$ and $V = \Sigma^{-1}$.

Notice, however, that $\mu$ enters the likelihood function only through the exponential and, moreover, since
$$\operatorname{tr}\{V(\bar{x} - \mu)(\bar{x} - \mu)'\} = \operatorname{tr}\{(\bar{x} - \mu)'V(\bar{x} - \mu)\} = (\bar{x} - \mu)'V(\bar{x} - \mu), \qquad (1.6.12)$$
we conclude, by the positive definiteness of $V$, that the likelihood function is maximized with respect to $\mu$ if and only if $(\bar{x} - \mu)'V(\bar{x} - \mu)$ is minimized. But the smallest value this quadratic form can assume is zero, and this occurs only for
$$\hat{\mu} = \bar{x}. \qquad (1.6.13)$$
Thus our earlier manipulations have spared us the need of differentiating (1.6.10) with respect to $\mu$. To complete the estimation problem, we need only differentiate with respect to $V$ the expression resulting in (1.6.10) when we substitute therein the maximum likelihood estimator of $\mu$ given in (1.6.13). Making the substitution, we obtain
$$L(x_{\cdot 1}, x_{\cdot 2}, \ldots, x_{\cdot T};\ \Sigma) = -\frac{mT}{2}\ln(2\pi) + \frac{T}{2}\ln|V| - \frac{1}{2}\operatorname{tr}(VA), \qquad (1.6.14)$$
where
$$A = \sum_{t=1}^{T}(x_{\cdot t} - \bar{x})(x_{\cdot t} - \bar{x})'. \qquad (1.6.15)$$

Differentiating (1.6.14) with respect to the elements $v_{ij}$ of $V$ and setting the derivatives equal to zero yields
$$\frac{T}{2}\,\frac{V_{ij}}{|V|} - \frac{1}{2}a_{ij} = 0, \qquad i, j = 1, 2, \ldots, m, \qquad (1.6.17)$$
where $V_{ij}$ is the cofactor of $v_{ij}$ and $|V|$ is the determinant of $V$. In matrix form, the equations (1.6.17) read
$$T\,V^{-1} = A,$$
so that the maximum likelihood estimator of $\Sigma = V^{-1}$ is $A/T$. We therefore have

Lemma 11: Let $x \sim N(\mu, \Sigma)$ and let $\{x_{\cdot t} : t = 1, 2, \ldots, T\}$ be a random sample on the vector $x$. Then the maximum likelihood estimators of $\mu$ and $\Sigma$ are given, respectively, by
$$\hat{\mu} = \bar{x}, \qquad (1.6.20)$$
$$\hat{\Sigma} = \frac{XX' - T\bar{x}\bar{x}'}{T}, \qquad (1.6.21)$$
where $X$ is the $m \times T$ matrix whose $t$th column is $x_{\cdot t}$.
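Lemma 11 says the maximum likelihood estimators are the sample mean and the sample covariance matrix with divisor $T$ (not $T - 1$). A sketch (NumPy assumed; the true $\mu$ and $\Sigma$ are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(6)

mu = np.array([2.0, -1.0])
Sigma = np.array([[1.0, 0.4],
                  [0.4, 0.5]])
T = 100_000
X = rng.multivariate_normal(mu, Sigma, size=T).T    # m x T, columns are observations x_.t

mu_hat = X.mean(axis=1)                                     # (1.6.20): sample mean
Sigma_hat = (X @ X.T - T * np.outer(mu_hat, mu_hat)) / T    # (1.6.21): divisor T, not T-1

print(mu_hat)
print(Sigma_hat)
print(np.allclose(Sigma_hat, np.cov(X, bias=True), atol=1e-8))  # same as the ML (biased) covariance
```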

Before turning to the problem of determining the distribution of the maximum likelihood estimators just obtained, we cite, without proof, the following useful

Theorem 3: Let $s$ be a (column) vector such that
$$s's = 1, \qquad (1.6.22)$$
but is otherwise arbitrary. Then there exists an orthogonal matrix having $s$ as its last column.

The distribution of $\hat{\mu}$ and $\hat{\Sigma}$ is established by

Lemma 12: Let $x \sim N(\mu, \Sigma)$ and consider the sample $\{x_{\cdot t} : t = 1, 2, \ldots, T\}$. Let $X$ be the matrix having $x_{\cdot t}$ as its $t$th column. Let $B$ be a $T \times T$ orthogonal matrix having for its $T$th column the vector $u/\sqrt{T}$, where $u$ is the vector of units defined earlier.
