ECONOMETRICS
STATISTICAL FOUNDATIONS AND APPLICATIONS
Library of Congress Cataloging in Publication Data
Dhrymes, Phoebus J., 1932-
Econometrics: statistical foundations and applications.
Corrected reprint of the 1970 ed. published by Harper & Row, New York.
1. Econometrics. I. Title.
[HB139.D48 1974] 330'.01'8 74-10898
Second printing: July, 1974
First published 1970, by Harper & Row, Publishers, Inc.
Design: Peter Klemke, Berlin
All rights reserved.
No part of this book may be translated or reproduced in any form without written permission from Springer-Verlag.
© 1970 by Phoebus J. Dhrymes and 1974 by Springer-Verlag New York Inc.
ISBN-13: 978-0-387-90095-7    e-ISBN-13: 978-1-4613-9383-2    DOI: 10.1007/978-1-4613-9383-2
PREFACE TO SECOND PRINTING
The main difference between this edition by Springer-Verlag and the earlier one by Harper & Row lies in the elimination of the inordinately high number of misprints found in the latter. A few minor errors of exposition have also been eliminated. The material, however, is essentially similar to that found in the earlier version.
I wish to take this opportunity to express my thanks to all those who pointed out misprints to me and especially to H. Tsurumi, Warren Dent, and J. D. Khazzoom.
New York
February, 1974
PHOEBUS J. DHRYMES
PREFACE TO FIRST PRINTING
This book was written, primarily, for the graduate student in econometrics. Its purpose is to provide a reasonably complete and rigorous exposition of the techniques frequently employed in econometric research, beyond what one is likely to encounter in an introductory mathematical statistics course. It does not aim at teaching how one can do successful original empirical research. Unfortunately, no one has yet discovered how to communicate this skill impersonally. Practicing econometricians may also find the integrated presentation of simultaneous equations estimation theory and spectral analysis a convenient reference.
I have tried, as far as possible, to begin the discussion of the various topics from an elementary stage so that little prior knowledge of the subject will be necessitated. It is assumed that the potential reader is familiar with the elementary aspects of calculus and linear algebra. Additional mathematical material is to be found in the Appendix. Statistical competence, approximately at the level of a first-year course in elementary mathematical statistics, is also assumed on the part of the reader.
The discussion, then, develops certain elementary aspects of multivariate analysis, the theory of estimation of simultaneous equations systems, elementary aspects of spectral and cross-spectral analysis, and shows how such techniques may be applied, by a number of examples.
It is often said that econometrics deals with the quantification of economic relationships, perhaps as postulated by an abstract model. As such, it is a blend of economics and statistics, both presupposing a substantial degree of mathematical sophistication. Thus, to practice econometrics competently, one has to be well-versed in both economic and statistical theory. Pursuant to this, I have attempted in all presentations to point out clearly the assumptions underlying the discussion, their role in establishing the conclusions, and hence the consequence of departures from such assumptions. Indeed, this is a most crucial aspect of the student's training and one that is rather frequently neglected. This is unfortunate since competence in econometrics entails, inter alia, a very clear perception of the limitations of the conclusions one may obtain from empirical analysis.
A number of specialized results from probability theory that are crucial for establishing, rigorously, the properties of simultaneous equations estimators have been collected in Chapter 3. This is included only as a convenient reference, and its detailed study is not essential in understanding the remainder of the book. It is sufficient that the reader be familiar with the salient results presented in Chapter 3, but it is not essential that he master their proofs in detail.
I have used various parts of the book, in the form of mimeographed notes, as the basis of discussion for graduate courses in econometrics at Harvard University and, more recently, at the University of Pennsylvania.
The material in Chapters 1 through 6 could easily constitute a one-semester course, and the remainder may be used in the second semester. The instructor who may not wish to delve into spectral analysis quite so extensively may include alternative material, e.g., the theory of forecasting.
Generally, I felt that empirical work is easily accessible in journals and similar publications, and for this reason, the number of empirical examples is small. By now, the instructor has at his disposal a number of publications on econometric models and books of readings in empirical econometric research, from which he can easily draw in illustrating the possible application of various techniques.
I have tried to write this book in a uniform style and notation and preserve maximal continuity of presentation. For this reason explicit references to individual contributions are minimized; on the other hand, the great cleavage between the Dutch and Cowles Foundation notation is bridged so that one can follow the discussion of 2SLS, 3SLS, and maximum likelihood estimation in a unified notational framework. Of course, absence of references from the discussions is not meant to ignore individual contributions, but only to insure the continuity and unity of exposition that one commonly finds in scientific, mathematical, or statistical textbooks.
Original work relevant to the subject covered appears in the references at the end of each chapter; in several instances a brief comment on the work is inserted. This is only meant to give the reader an indication of the coverage and does not pretend to be a review of the contents.
Finally, it is a pleasure for me to acknowledge my debt to a number of individuals who have contributed directly or indirectly in making this book what it is.
I wish to express my gratitude to H. Theil for first introducing me to the rigorous study of econometrics, and to I. Olkin, from whose lucid lectures I first learned about multivariate analysis. T. Amemiya, L. R. Klein, J. Kmenta, B. M. Mitchell, and A. Zellner read various parts of the manuscript and offered useful suggestions. V. Pandit and A. Basu are chiefly responsible for compiling the bibliography. Margot Keith and Alix Ryckoff have lightened my burden by their expert typing.
PHOEBUS J. DHRYMES
January, 1970
1.4 The Multivariate Normal Distribution
1.5 Correlation Coefficients and Related Topics
3 PROBABILITY LIMITS, ASYMPTOTIC DISTRIBUTIONS, AND PROPERTIES OF MAXIMUM LIKELIHOOD ESTIMATORS
3.4 Central Limit Theorems and Related Topics
3.5 Miscellaneous Useful Convergence Results
3.6 Properties of Maximum Likelihood (ML) Estimators
3.7 Estimation for Distributions Admitting of Sufficient Statistics
3.8 Minimum Variance Estimation and Sufficient Statistics
4 ESTIMATION OF SIMULTANEOUS EQUATIONS SYSTEMS
4.1 Review of Classical Methods
4.2 Asymptotic Distribution of Aitken Estimators
4.3 Two-Stage Least Squares (2SLS)
4.4 2SLS as Aitken and as OLS Estimator
4.5 Asymptotic Properties of 2SLS Estimators
4.6 The General k-Class Estimator
4.7 Three-Stage Least Squares (3SLS)
5 APPLICATIONS OF CLASSICAL AND SIMULTANEOUS EQUATIONS TECHNIQUES AND RELATED PROBLEMS
5.3 An Example of 2SLS and 3SLS Estimation
5.4 Measures of Goodness of Fit in Multiple Equations Systems: Coefficient of (Vector) Alienation and Correlation
5.5 Canonical Correlations and Goodness of Fit in Econometric Systems
5.6 Applications of Principal Component Theory in Econometric Systems
5.7 Alternative Asymptotic Tests of Significance for 2SLS Estimated Parameters
6.2 Indirect Least Squares (ILS)
6.3 The Identification Problem
6.4 Instrumental Variables Estimation
6.5 Recursive Systems
7 MAXIMUM LIKELIHOOD METHODS
7.1 Formulation of the Problem and Assumptions
7.2 Reduced Form (RF) and Full Information Maximum Likelihood (FIML) Estimation
7.3 Limited Information (LIML) Estimation
8 RELATIONS AMONG ESTIMATORS; MONTE CARLO METHODS
8.1 Introduction
8.2 Relations Among Double k-Class Estimators
8.3 I.V., ILS, and Double k-Class Estimators
8.4 Limited Information Estimators and Just Identification
8.5 Relationships Among Full Information Estimators
8.6 Monte Carlo Methods
9 SPECTRAL ANALYSIS
9.1 Stochastic Processes
9.2 Spectral Representation of Covariance Stationary Series
9.3 Estimation of the Spectrum
10.4 An Empirical Application of Cross-Spectral Analysis
11 APPROXIMATE SAMPLING DISTRIBUTIONS AND OTHER STATISTICAL ASPECTS OF SPECTRAL ANALYSIS
11.1 Aliasing
11.2 "Prewhitening," "Recoloring," and Related Issues
11.3 Approximate Asymptotic Distributions; Considerations of Design and Analysis
12 APPLICATIONS OF SPECTRAL ANALYSIS TO SIMULTANEOUS EQUATIONS SYSTEMS
12.1 Generalities
12.2 Lag Operators
12.3 An Operator Representation of the Final Form
12.4 Dynamic Multipliers and the Final Form
12.5 Spectral Properties of the Final Form
12.6 An Empirical Application
A.1 Complex Numbers and Complex-Valued Functions
A.2 The Riemann-Stieltjes Integral
A.3 Monotonic Functions and Functions of Bounded Variation
A.4 Fourier Series
A.5 Systems of Difference Equations with Constant Coefficients
A.6 Matrix Algebra
in the interval (α, β), and so on.
Still, in this elementary context the correlation of two variables was introduced and interpreted as a measure of the degree to which the two variables tend to move in the same or opposite direction.
In econometric work, however, it is often necessary to deal with a number of relations simultaneously. Thus we typically have an econometric model containing more than one equation. Such a system may be simultaneously determined in the sense that the interaction of all variables as specified by the model determines simultaneously the behavior of the entire set of (jointly) dependent variables.
Generally, the equations of an econometric model are, except for identities, stochastic ones and hence the problem arises of how to specify the (joint) stochastic character of a number of random variables simultaneously. This leads us to consider the problem of the distribution of vector random variables, that is, the characteristics of the joint distribution of a number of random variables simultaneously and not "one at a time."
When the problem is considered in this context, it is apparent that there are more complexities than in the study of the distribution of scalar random variables. Therefore, if we are dealing with the (joint) distribution of m variables, we might wish to say something about the distribution of a subset of k (k < m) variables given the (m − k) remaining ones.
This is a problem that cannot arise in the study of univariate distributions.
In the following material we shall study in some detail certain elementary aspects of multivariate distributions, confining ourselves to the special case of the multivariate normal distribution.
Let us now set forth the notational framework and conventions for this topic and obtain some simple but useful results.
Definition 1: Let {x_ij : i = 1, 2, ..., m; j = 1, 2, ..., n} be a set of random variables. Then the matrix
X = (x_ij)
is said to be a random matrix, and its expectation is defined element by element as
E(X) = [E(x_ij)].    (1.1.2)
Definition 2: Let z = (z₁, z₂, ..., z_m)' be a random vector; then the covariance matrix of z is defined by
Cov(z) = E[(z − E(z))(z − E(z))'],    (1.1.5)
and as a matter of notation we write
Σ = Cov(z).
This is, of course, not a universal practice; it is, however, used quite widely. Notice, further, that in (1.1.5) (z − E(z))(z − E(z))' is an m × m random matrix, and thus the meaning of the expectation operator there is given by (1.1.2) of Definition 1.
In general, we shall identify vectors and matrices by their dimensions. Thus the statement "x is m × 1" will mean that x is a column vector with m elements, while "A is m × n" will mean that A is a matrix with m rows and n columns.
A simple consequence of the preceding is
Lemma 1: Let x be m × 1 and random; suppose that
E(x) = μ,   Cov(x) = Σ,
and let
y = Ax + b,
where A is an n × m matrix and b an n × 1 vector, both nonrandom. Then
E(y) = Aμ + b,   Cov(y) = AΣA'.
PROOF: By definition, the ith element of y is given by the linear combination Σ_j a_ij x_j + b_i, so that E(y_i) = Σ_j a_ij μ_j + b_i; hence E(y) = Aμ + b. Moreover,
Cov(y) = E[(y − E(y))(y − E(y))'] = E[A(x − μ)(x − μ)'A'].
The (i, j) element of (x − μ)(x − μ)' is given by
(x_i − μ_i)(x_j − μ_j),
and, taking expectations element by element, we conclude Cov(y) = A E[(x − μ)(x − μ)'] A' = AΣA'. Q.E.D.
Before considering the multivariate normal distribution in some detail, we should point out two useful facts. One, implicit in the preceding, is this: if X is a random matrix and if A, B, C are conformable nonrandom matrices, then
E(Y) = AE(X)B + C,
where
Y = AXB + C.
The second is given by
Lemma 2: Let x be m × 1, random; then
Σ = Cov(x)
is at least a positive semidefinite, symmetric matrix.
PROOF: The symmetry of Σ is obvious; thus its (i, j) element is
σ_ij = E[(x_i − μ_i)(x_j − μ_j)],
while its (j, i) element is
σ_ji = E[(x_j − μ_j)(x_i − μ_i)],
and the two are identical. Now let α be an arbitrary (nonrandom) m × 1 vector and put y = α'x.
From elementary mathematical statistics we know that the variance of y is nonnegative. Thus
0 ≤ Var(y) = E[α'(x − μ)(x − μ)'α] = α'Σα.    (1.1.27)
Q.E.D.
Remark 2: Notice that if Σ is not strictly positive definite (i.e., if it is a singular matrix), then there will exist a nonnull constant vector, say γ, such that
γ'Σγ = 0.    (1.1.28)
This means that there exists a scalar random variable, say y, which is a linear combination of the elements of x and whose variance is zero. The latter fact means that y is a constant. Hence a singular covariance matrix Σ in (1.1.27) implies that the elements of x are linearly dependent in the sense that there exists a nonnull set of constants (γ₁, γ₂, γ₃, ..., γ_m) such that y = Σ_{i=1}^m γ_i x_i is nonrandom.
If this is so, then the distribution of the random vector x is said to be degenerate.
If this is not so, that is, if Σ is strictly positive definite, symmetric, then the distribution of x is said to be proper. In this textbook we shall only deal with proper distributions; thus the term "proper" will be suppressed everywhere except when the context clearly requires it.
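The two results above are easy to check numerically. The following sketch is only an illustration added here: the matrices A, b, Σ, μ and the sample size are arbitrary choices, and NumPy is used merely as a convenient tool; it compares the sample moments of y = Ax + b with Aμ + b and AΣA' (Lemma 1) and verifies that Σ is positive semidefinite (Lemma 2).

import numpy as np

rng = np.random.default_rng(0)

# Illustrative positive definite covariance matrix Sigma and mean mu.
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
mu = np.array([1.0, -1.0, 0.5])

# Draw T observations of x and form y = A x + b as in Lemma 1.
T = 200_000
x = rng.multivariate_normal(mu, Sigma, size=T)           # T x 3
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])                          # 2 x 3, nonrandom
b = np.array([0.3, -0.2])
y = x @ A.T + b                                           # T x 2

# Sample moments of y should approximate A mu + b and A Sigma A'.
print(np.allclose(y.mean(axis=0), A @ mu + b, atol=0.02))
print(np.allclose(np.cov(y, rowvar=False), A @ Sigma @ A.T, atol=0.02))

# Lemma 2: any covariance matrix is symmetric and positive semidefinite.
print(np.all(np.linalg.eigvalsh(Sigma) >= 0))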
1.2 JOINT, MARGINAL,
AND CONDITIONAL DISTRIBUTIONS
In this section we shall define what we mean by the joint distribution of a number of random variables, derive certain features of it, and study its associated marginal and conditional distributions. The following convention, which is often employed in mathematical statistics, will be adhered to.
Convention: We shall denote random variables by capital Roman letters.1 We shall denote by lowercase letters the values assumed by the random variables.
Thus Pr{X ≤ x} indicates the probability that the random variable X will assume a value equal to or less than (the real number) x.
Definition 3: Let X be m × 1, random; by its joint (cumulative) distribution function we mean a function F(·, ·, ..., ·) such that
i. 0 ≤ F ≤ 1
ii. F is monotonic nondecreasing in all its arguments
iii. F(−∞, −∞, ..., −∞) = 0,   F(∞, ∞, ..., ∞) = 1
iv. Pr{X₁ ≤ x₁, X₂ ≤ x₂, ..., X_m ≤ x_m} = F(x₁, x₂, ..., x_m)
1 Recall that a random variable is a real-valued function defined on the relevant sample space.
In this textbook we shall always assume that F(·, ·, ..., ·) is absolutely continuous, so that the derivative
∂^m F(x₁, x₂, ..., x_m) / (∂x₁ ∂x₂ ⋯ ∂x_m)
exists almost everywhere.
Therefore we have
Definition 4: Let F(·, ·, ..., ·) be the joint cumulative distribution function of the m × 1 random variable X. Suppose that F is absolutely continuous; then
f(x₁, x₂, ..., x_m) = ∂^m F(x₁, x₂, ..., x_m) / (∂x₁ ∂x₂ ⋯ ∂x_m)    (1.2.1)
is said to be the joint density function of (the elements of) X.
In the following material we shall always assume that the density function exists.
Remark 2: It is clear from (1.2.1) and statement iv of Definition 3 that
F(x₁, x₂, ..., x_m) = ∫_{−∞}^{x₁} ∫_{−∞}^{x₂} ⋯ ∫_{−∞}^{x_m} f(ξ₁, ξ₂, ..., ξ_m) dξ_m ⋯ dξ₂ dξ₁.
Remark 4: It should also be clear that the marginal density of any element of X, say X₁, is given by
g(x₁) = ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f(x₁, x₂, ..., x_m) dx₂ dx₃ ⋯ dx_m    (1.2.6)
and is simply the density function of X₁ as studied in elementary mathematical statistics courses.
The marginal density of a subset of the elements of X simply characterizes their probability structure after the effects of all other variables have been allowed for ("averaged out" or "integrated out"). In particular, notice that the marginal density of X₁, X₂, ..., X_k does not depend on x_{k+1}, x_{k+2}, ..., x_m.
In contradistinction to this, we have another associated density, namely, the conditional one.
Recall from elementary probability that if A and B are two events, then the conditional probability of A given B, denoted by P(A | B), is defined as
P(A | B) = P(A ∩ B) / P(B),   P(B) ≠ 0.
In a similar fashion, partitioning X = (X¹', X²')' so that X¹ contains the first k elements, the conditional density of X¹ given X² is defined by
f(x¹ | x²) = f(x¹, x²) / g(x²),    (1.2.8)
where g(·) is the marginal density of X² and f(·) is the density of X = (X¹', X²')'.
Remark 5: As the notation in (1.2.8) makes clear, the conditional density of X¹ given X² does depend on x².
Whereas in the case of the marginal density of X¹ the effects of the variables in X² were "averaged" or "integrated" out, in the case of the conditional density of X¹ the effects of the variables in X² are allowed for explicitly by "holding them constant." The meaning of this distinction will become clearer when we study the multivariate normal distribution.
Moments are defined with respect to the densities above in the usual way. Thus let h(·) be a function of the random variable X. The expectation of h(X) is defined by
E[h(X)] = ∫ h(x) f(x) dx,
where the integral sign indicates the m-fold integral with respect to x₁, x₂, ..., x_m.
If it is specified that h(·) depends only on X¹, then we can define two expectations for h(X¹), one marginal and one conditional.
The marginal expectation of h(X¹) is defined by
E[h(X¹)] = ∫ h(x¹) g(x¹) dx¹.    (1.2.11)
The conditional expectation of h(X¹) given X² is defined by
E[h(X¹) | x²] = ∫ h(x¹) f(x¹ | x²) dx¹.    (1.2.12)
Notice that in (1.2.11) we expect with respect to the marginal density of X¹, while in (1.2.12) we expect with respect to the conditional density of X¹ given X².
Example 1: Suppose that we wish to obtain the mean of one of the elements of X, say X₁. In this case, take the function h(·) as
h(X) = X₁.
Its marginal expectation2 is the (marginal) mean of X₁, denoted by μ₁, while a conditional expectation is denoted by a subscript notation in which the index appearing to the left of the dot indicates the random variable being expected, while the numbers appearing to the right of the dot indicate the conditioning variables. Thus μ_{2·1,3,4,...,m} indicates the conditional mean of X₂ given X₁, X₃, X₄, ..., X_m. Of course, we can define quantities such as μ_{1·5,6,7,...,m}, which would indicate the conditional mean of X₁ given X₅, X₆, ..., X_m. What we mean by this is the following: Obtain the marginal density of X₁, X₅, X₆, ..., X_m; then, using that marginal density, determine the conditional density of X₁ given X₅, X₆, ..., X_m as the ratio of the former to the marginal density of X₅, X₆, ..., X_m. Finally, expect X₁ with respect to f(x₁ | x₅, ..., x_m).
2 The term "marginal" is usually omitted; one speaks only of the mean of X₁.
Example 2: Suppose that h(·) is such that it depends only on two variables. Thus, say,
h(X) = (X₁ − μ₁)(X₂ − μ₂).
The marginal3 expectation of h(X) is given by
E[h(X)] = ∫ (x₁ − μ₁)(x₂ − μ₂) f(x) dx = σ₁₂.    (1.2.18)
The expectation here simply yields the covariance between the first and second elements of X.
As before, we can again define the conditional covariance between X₁ and X₂ given X₃, X₄, ..., X_m, this time expecting with respect to the corresponding conditional density. We leave it to the reader to compute a number of different conditional variances and covariances.
The preceding discussion should be sufficient to render the meaning of the notation, say σ_{55·1,7,12,13,...,m} or σ_{77·1,2,19,20,21,...,m}, quite obvious.
Finally, let us conclude with
Definition 7: Let X be m × 1 and random; then its elements are said to be mutually (statistically) independent if and only if their (joint) density can be expressed as the product of the marginal densities of the individual elements.
Remark 6: Suppose X is partitioned by X¹ and X² as above; then X¹ and X² are said to be mutually independent if and only if the joint density of X can be expressed as the product of the marginal densities of X¹ and X².
We shall now abandon the convention whereby random variables and the values assumed by them are distinguished by the use of capital and lowercase letters respectively. Henceforth no such distinction will be made. The meaning will usually be clear from the context.
be a transformation of E_n into itself. Thus
y = h(x),    (1.3.1)
where the ith element of y is the function
y_i = h_i(x₁, x₂, ..., x_n),   i = 1, 2, ..., n,
and x and y are not necessarily random.
Suppose that the inverse transformation also exists; that is, suppose there exists a function g(·) such that
x = g(y).
The Jacobian of this inverse transformation is
J = |∂x_i/∂y_j| =
| ∂x₁/∂y₁  ∂x₁/∂y₂  ⋯  ∂x₁/∂y_n |
| ∂x₂/∂y₁  ∂x₂/∂y₂  ⋯  ∂x₂/∂y_n |
|    ⋮         ⋮            ⋮    |
| ∂x_n/∂y₁  ∂x_n/∂y₂  ⋯  ∂x_n/∂y_n |,
and thus it is expressed solely in terms of the y_i, i = 1, 2, ..., n.
Suppose now that x is random, having density f(·), and consider the problem of determining the density of y in terms of f(·) and the transformation in (1.3.1).
To this effect, we prove:
Lemma 3: Let x be an n × 1 random vector with density f(·), and let y = h(x) be as in (1.3.1), with inverse transformation x = g(y). Moreover, suppose that h(·) and g(·) are differentiable. Then the density, Φ(·), of y is given by
Φ(y) = f[g(y)] |J|,
where |J| is the absolute value of the Jacobian of the transformation.4
PROOF: The cumulative distribution of x is given by
F(x) = ∫_A f(ξ) dξ = Pr{x ∈ A}.    (1.3.9)
Notice that F(x) in (1.3.9) gives the probability assigned by f(·) to the set
A = {ξ : ξ_i ≤ x_i, i = 1, 2, ..., m},    (1.3.10)
which is the Cartesian product of the intervals (−∞, x_i), i = 1, 2, ..., m. This accounts for the notation employed in the last member of (1.3.9). Now, if in (1.3.9) we make the transformation
ξ = g(ζ),
we obtain5
F(x) = ∫_B f[g(ζ)] |J| dζ,    (1.3.13)
where B is the transform of A under h.
The integral in (1.3.13) gives the probability assigned to the set B by the function f[g(·)] |J|. Moreover, the set B is of the form
B = {ζ : ζ_i ≤ y_i, i = 1, 2, ..., m}    (1.3.14)
and corresponds to the "joint event"
{Y_i ≤ y_i, i = 1, 2, ..., m},
4 We must add, of course, the restriction that J does not vanish on every (nondegenerate) subset of E_m. We should also note that in standard mathematical terminology J is the inverse of the Jacobian of (1.3.6). In the statistical literature, it is referred to as the Jacobian of (1.3.6). We shall adhere to this latter usage because it is more convenient for our purposes.
5 The validity of this representation follows from the theorems dealing with change of variables in multiple integrals. See, for example, R. C. Buck, Advanced Calculus, p. 242, New York, McGraw-Hill, 1956.
where Y_i indicates the random variable and y_i the values assumed by it. Since the integrand of (1.3.13) is nonnegative and its integral over the entire space is unity, we conclude that
Φ(y) = f[g(y)] |J|
is the joint density of the elements of y. Q.E.D.
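As a brief worked illustration of Lemma 3 (only a sketch of the simplest special case, anticipating the construction of the next section), consider a nonsingular linear transformation:

\[ y = Ax + b, \qquad |A| \neq 0, \qquad x = g(y) = A^{-1}(y - b), \]
\[ J = \det\!\left[\frac{\partial x_i}{\partial y_j}\right] = \det\!\left(A^{-1}\right) = |A|^{-1}, \]
\[ \Phi(y) = f\!\left[A^{-1}(y - b)\right]\,\bigl|\,|A|^{-1}\bigr|. \]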
1.4 THE MULTIVARIATE NORMAL DISTRIBUTION
It is assumed that the reader is familiar with the univariate normal distribution. This being the case, perhaps the simplest way of introducing the multivariate normal distribution is as follows.
Let x_i : i = 1, 2, ..., m be random variables identically and independently distributed as N(0, 1); that is, they are each normal with mean zero and unit variance. Since they are independent, the density of the vector x = (x₁, x₂, ..., x_m)' is
f(x) = (2π)^{−m/2} exp(−½ x'x).
Now let A be an m × m nonsingular matrix and b an m × 1 vector, both nonrandom, and consider
y = Ax + b.
By Lemma 3, the density of y is
φ(y) = f[A^{−1}(y − b)] |J|
     = (2π)^{−m/2} |A|^{−1} exp[−½ (y − b)'A'^{−1}A^{−1}(y − b)].    (1.4.4)
For notational simplicity, we have assumed in (1.4.4) that |A| > 0. We know that
E(x) = 0,   Cov(x) = I,
and thus, from Lemma 1, we conclude that
E(y) = b,   Cov(y) = AA'.
To conform to standard usage, put as a matter of notation
μ = b,   Σ = AA',
and rewrite (1.4.4) in standard form as
φ(y) = (2π)^{−m/2} |Σ|^{−1/2} exp[−½ (y − μ)'Σ^{−1}(y − μ)].    (1.4.8)
Thus we define the multivariate normal by (1.4.8). More formally,
Definition 9: Let y be m × 1, random; then y is said to have the multivariate normal distribution with mean μ and covariance matrix Σ, denoted by
y ∼ N(μ, Σ),
if and only if the joint density of its elements is given by (1.4.8).
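The constructive definition above translates directly into a simulation recipe. The sketch below is an added illustration (the particular μ, Σ, and sample size are arbitrary, and the Cholesky factor is just one convenient choice of a matrix A with AA' = Σ): it builds y = Ax + b from iid N(0, 1) draws and checks the sample moments.

import numpy as np

rng = np.random.default_rng(1)

# Target parameters (illustrative choices).
mu = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])

# Any A with A A' = Sigma works; the Cholesky factor is one such matrix.
A = np.linalg.cholesky(Sigma)

# x has iid N(0, 1) elements; y = A x + mu is then N(mu, Sigma).
T = 100_000
x = rng.standard_normal((T, 2))
y = x @ A.T + mu

print(y.mean(axis=0))              # approximately mu
print(np.cov(y, rowvar=False))     # approximately Sigma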
Before establishing certain properties of the normal distribution, let us introduce
Definition 10: The characteristic function of a random vector x is given by
ψ_x(t) = E[exp(it'x)],
where i has the property i² = −1 and t is a vector of arbitrary real constants.
Remark 7: The usefulness of the characteristic function is that it is always defined for random variables possessing densities; moreover, there is a one-to-one correspondence between characteristic and density functions. Thus if f(·) is the density of x, then explicitly we have
ψ_x(t) = ∫ exp(it'x) f(x) dx,    (1.4.10)
which is merely the Fourier transform of f(·).
Hence if we know the characteristic function of a random variable, we can, in principle, determine its density function by inverting (1.4.10). We have inserted "in principle" advisedly, for it is not always simple or possible to determine f(·) from ψ(·) explicitly.
Finally, note that the derivatives of ψ_x(·) evaluated at t = 0, namely
(1/i^k) ∂^k ψ_x(t)/∂t_j^k |_{t=0}   and   (1/i²) ∂² ψ_x(t)/∂t_j ∂t_s |_{t=0},
denote, respectively, the kth moment of x_j and the cross moment of x_j and x_s.
Lemma 4: The characteristic function of x ∼ N(μ, Σ) is given by
ψ_x(t) = exp(it'μ − ½ t'Σt).
PROOF: Let y₁, y₂, y₃, ..., y_m be independently distributed as N(0, 1). It can be shown that the characteristic function of y_j is given by
ψ_{y_j}(t_j) = exp(−½ t_j²).    (1.4.14)
Since the y_j, j = 1, 2, ..., m, are mutually independent, their joint characteristic function is
E[exp(ir'y)] = exp(−½ r'r).    (1.4.16)
Now let A be a nonsingular matrix such that AA' = Σ and define
x = Ay + μ.    (1.4.17)
Then x has what we have termed the multivariate normal distribution with mean μ and covariance matrix Σ.
Using (1.4.16) and (1.4.17), we have
exp(−½ r'r) = E[exp(ir'y)] = E{exp[ir'A^{−1}(x − μ)]}
            = E[exp(ir'A^{−1}x)] exp(−ir'A^{−1}μ).    (1.4.18)
Put
t = A'^{−1}r.    (1.4.19)
Since r is an arbitrary vector of constants, then so is t. Substituting (1.4.19) in (1.4.18) and rearranging terms, we obtain, as a result of (1.4.17),
exp(it'μ − ½ t'Σt) = E[exp(it'x)] = ψ_x(t).    (1.4.20)
Q.E.D.
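A quick numerical way to see Lemma 4 at work is sketched below; it is only an added illustration with arbitrary parameter values, estimating E[exp(it'x)] by simulation and comparing it with exp(it'μ − ½ t'Σt).

import numpy as np

rng = np.random.default_rng(2)

mu = np.array([0.5, -1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
t = np.array([0.7, -0.4])          # an arbitrary real vector

# Monte Carlo estimate of the characteristic function E[exp(i t'x)].
x = rng.multivariate_normal(mu, Sigma, size=500_000)
cf_mc = np.exp(1j * x @ t).mean()

# Closed form from Lemma 4.
cf_exact = np.exp(1j * t @ mu - 0.5 * t @ Sigma @ t)

print(cf_mc, cf_exact)             # the two should agree to a few decimals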
Remark 8: Notice that the characteristic function of the multivariate normal is an exponential containing a linear and a quadratic term in t, the parameter defining the characteristic function. The coefficients in the linear terms are simply the elements of the mean vector μ, while the matrix of the quadratic form in (1.4.20) is the covariance matrix of the distribution. We next prove some of the fundamental properties of the normal distribution.
Lemma 5: Let x ∼ N(μ, Σ) and write
y = Bx + b,    (1.4.21)
where B is m × m, b is m × 1, the elements of both being (nonrandom) constants. Then
y ∼ N(Bμ + b, BΣB').
PROOF: The characteristic function of y is, by definition,
ψ_y(t) = E[exp(it'y)].    (1.4.22)
Let
s = B't    (1.4.23)
and, using (1.4.21), note that
E[exp(it'y)] = E[exp(it'Bx)] exp(it'b) = E[exp(is'x)] exp(it'b)    (1.4.24)
             = exp(is'μ − ½ s'Σs + it'b) = exp[it'(Bμ + b) − ½ t'BΣB't].
Since the last member of (1.4.24) is the characteristic function of an N(Bμ + b, BΣB') vector, the conclusion follows. Q.E.D.
Lemma 6: Let x ∼ N(μ, Σ) and partition x by x = (x¹', x²')' so that x¹ has k elements and x² has (m − k) elements; partition μ and Σ conformably. Then the marginal distribution of x¹ is N(μ¹, Σ₁₁).
PROOF: Let T be a nonsingular lower triangular matrix such that TT' = Σ, and define y = T^{−1}(x − μ). Partition T conformably with Σ so that
T = | T₁₁   0  |
    | T₂₁  T₂₂ |,
T₁₁ being k × k. By Lemma 5, y ∼ N(0, I_m), where I_m is the identity matrix of order m. Thus the y_j : j = 1, 2, ..., m are mutually independent and each is distributed as N(0, 1). It follows, therefore, that y¹ ∼ N(0, I_k). Since x = Ty + μ and T is block triangular,
x¹ = T₁₁y¹ + μ¹.    (1.4.31)
By (1.4.31) we have, in view of Lemma 5, x¹ ∼ N(μ¹, T₁₁T₁₁') = N(μ¹, Σ₁₁). Q.E.D.
Corollary: The marginal density of the ith element of x is normal with mean μ_i and variance σ_ii.
PROOF: In the proof of Lemma 6, take
k = 1    (1.4.35)
by rearranging the elements of x, μ, and Σ so that the first element of x* is x_i, the first element of μ* is μ_i, and the (1, 1) element of Σ* is σ_ii, where x*, μ*, and Σ* represent the vectors and matrix after rearrangement. The conclusion of the corollary follows immediately, for
x_i ∼ N(μ_i, σ_ii).    (1.4.36)
Lemma 7: Let x ∼ N(μ, Σ) and partition x, μ, and Σ as in Lemma 6. Then the conditional distribution of x¹ given x² is
x¹ | x² ∼ N[μ¹ + Σ₁₂Σ₂₂^{−1}(x² − μ²), Σ₁₁ − Σ₁₂Σ₂₂^{−1}Σ₂₁].    (1.4.37)
PROOF: By Lemma 6, x² ∼ N(μ², Σ₂₂). By definition, the conditional density of x¹ given x² is the quotient of the joint density of x¹ and x² to the marginal density of x². Thus
h(x¹ | x²) = {(2π)^{−m/2} |Σ|^{−1/2} exp[−½(x − μ)'Σ^{−1}(x − μ)]} / {(2π)^{−(m−k)/2} |Σ₂₂|^{−1/2} exp[−½(x² − μ²)'Σ₂₂^{−1}(x² − μ²)]}.    (1.4.38)
Let V = Σ^{−1} and partition V conformably with Σ. Note, further, that6
|Σ| = |Σ₁₁ − Σ₁₂Σ₂₂^{−1}Σ₂₁| |Σ₂₂|    (1.4.41)
and that
Σ₂₂^{−1} = V₂₂ − V₂₁V₁₁^{−1}V₁₂.    (1.4.42)
Using (1.4.42), the exponent in (1.4.38) may be written
−½{(x¹ − μ¹)'V₁₁(x¹ − μ¹) + 2(x¹ − μ¹)'V₁₂(x² − μ²) + (x² − μ²)'V₂₂(x² − μ²) − (x² − μ²)'(V₂₂ − V₂₁V₁₁^{−1}V₁₂)(x² − μ²)}
= −½{[x¹ − (μ¹ − V₁₁^{−1}V₁₂(x² − μ²))]'V₁₁[x¹ − (μ¹ − V₁₁^{−1}V₁₂(x² − μ²))]}.    (1.4.43)
The determinantal expressions in (1.4.38) yield, in view of (1.4.41),
|Σ₂₂|^{1/2} |Σ|^{−1/2} = |Σ₁₁ − Σ₁₂Σ₂₂^{−1}Σ₂₁|^{−1/2}.    (1.4.44)
Moreover,7
V₁₁ = (Σ₁₁ − Σ₁₂Σ₂₂^{−1}Σ₂₁)^{−1}.    (1.4.45)
Hence (1.4.38) may be rewritten
h(x¹ | x²) = (2π)^{−k/2} |V₁₁|^{1/2} exp{−½[x¹ − (μ¹ + Σ₁₂Σ₂₂^{−1}(x² − μ²))]' V₁₁ [x¹ − (μ¹ + Σ₁₂Σ₂₂^{−1}(x² − μ²))]}.    (1.4.46)
Q.E.D.
6 This is easily established as follows. Let
(*) D = | I   −Σ₁₂Σ₂₂^{−1} |
        | 0        I       |.
Then
(**) DΣ = | Σ₁₁ − Σ₁₂Σ₂₂^{−1}Σ₂₁    0  |
          |        Σ₂₁             Σ₂₂ |.
Since |D| = 1, the result in (1.4.41) follows immediately from (**).
7 The relations in (1.4.42) and (1.4.45) result from
(*) | Σ₁₁  Σ₁₂ |^{−1} = |  (Σ₁₁ − Σ₁₂Σ₂₂^{−1}Σ₂₁)^{−1}          −Σ₁₁^{−1}Σ₁₂(Σ₂₂ − Σ₂₁Σ₁₁^{−1}Σ₁₂)^{−1} |
    | Σ₂₁  Σ₂₂ |        | −Σ₂₂^{−1}Σ₂₁(Σ₁₁ − Σ₁₂Σ₂₂^{−1}Σ₂₁)^{−1}   (Σ₂₂ − Σ₂₁Σ₁₁^{−1}Σ₁₂)^{−1}        |,
which can be verified directly; their verification is thus left as an exercise for the reader.
Remark 9: One important aspect of the preceding result is that the conditional mean of x¹ given x² is a linear function of x², while the covariance matrix of its conditional distribution is independent of x².
Finally, it would be desirable to have a simple test as to whether two sets of normal variables are mutually independent. Thus
Lemma 8: Let x ∼ N(μ, Σ) and let x, μ, and Σ be partitioned as in Lemma 6. Then x¹ and x² are mutually independent if and only if
Σ₁₂ = 0.
PROOF: Suppose that x¹ and x² are mutually independent and distributed, respectively, as x¹ ∼ N(μ¹, Σ₁₁), x² ∼ N(μ², Σ₂₂). Let x_i be an element of x¹ and x_j an element of x².
It follows by (1.4.53) that x¹ and x² are mutually independent when Σ₁₂ = 0, which completes the proof. Q.E.D.
These fundamental properties of the normal distribution are summarized in
Theorem 1: Let x ∼ N(μ, Σ) and partition x by x = (x¹', x²')' such that x¹ has k elements and x² (m − k) elements. Partition μ and Σ conformally. Then the following statements are true.
i. The characteristic function of x is given by
ψ_x(t) = exp(it'μ − ½ t'Σt).
ii. If B and b are, respectively, a nonrandom matrix and vector of appropriate dimensions, then Bx + b ∼ N(Bμ + b, BΣB').
iii. The marginal distribution of x¹ is N(μ¹, Σ₁₁).
iv. The conditional distribution of x¹ given x² is
x¹ | x² ∼ N[μ¹ + Σ₁₂Σ₂₂^{−1}(x² − μ²), Σ₁₁ − Σ₁₂Σ₂₂^{−1}Σ₂₁].
v. The subvectors x¹ and x² are mutually independent if and only if
Σ₁₂ = 0.
PROOF: See Lemmas 4, 5, 6, 7, and 8.
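Statement iv is the workhorse in later chapters. The sketch below is only an added illustration, with arbitrary numbers: it computes the conditional mean and covariance of x¹ given x² from a partitioned Σ.

import numpy as np

# Partitioned parameters: x1 has k = 2 elements, x2 has m - k = 2 elements (illustrative values).
mu1 = np.array([0.0, 1.0])
mu2 = np.array([2.0, -1.0])
Sigma11 = np.array([[2.0, 0.3], [0.3, 1.0]])
Sigma12 = np.array([[0.5, 0.1], [0.2, 0.4]])
Sigma22 = np.array([[1.5, 0.2], [0.2, 1.0]])

x2 = np.array([2.5, -0.5])         # the conditioning value

# Theorem 1, iv: x1 | x2 ~ N(mu1 + S12 S22^{-1}(x2 - mu2), S11 - S12 S22^{-1} S21).
S22_inv = np.linalg.inv(Sigma22)
cond_mean = mu1 + Sigma12 @ S22_inv @ (x2 - mu2)
cond_cov = Sigma11 - Sigma12 @ S22_inv @ Sigma12.T

print(cond_mean)
print(cond_cov)   # does not depend on the value of x2 (Remark 9)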
Before we leave this topic, let us establish a remarkable fact about the normal distribution.
Proposition 1: Let x be a random vector such that
E(x) = μ,   Cov(x) = Σ.
If every (nontrivial) linear combination of the elements of x is normally distributed, then
x ∼ N(μ, Σ).    (1.4.57)
PROOF: Let α be an arbitrary vector of constants and define
y = α'x.    (1.4.58)
Then y is a scalar random variable and, by the hypothesis of the proposition, is normally distributed, say with mean ν and variance σ². Its characteristic function is thus
E[exp(isy)] = exp(isν − ½ s²σ²),    (1.4.59)
where s is an arbitrary (real) scalar. But
E(y) = α'μ,   Var(y) = α'Σα.    (1.4.60)
Putting
t = sα    (1.4.61)
and using (1.4.58) and (1.4.59), we conclude
E[exp(it'x)] = exp(it'μ − ½ t'Σt),    (1.4.62)
which shows that x ∼ N(μ, Σ). Q.E.D.
1.5 CORRELATION COEFFICIENTS
AND RELATED TOPICS
In this section we shall give the definition of several types of correlation coefficients and show the similarity between some aspects of a conditional normal density and the general linear model
y = Xβ + u,
where y is T × 1, X is T × n, and they refer, respectively, to the observations on the dependent and explanatory variables. The vector β consists of the unknown parameters to be estimated, and u is the T × 1 vector of disturbances, which is typically assumed to have mean zero and covariance matrix σ²I.
Definition 11: Let x ∼ N(μ, Σ); then the (simple) correlation coefficient between two elements of x, say x_i and x_j, is defined by
ρ_ij = σ_ij / (σ_ii σ_jj)^{1/2}.    (1.5.2)
Thus it is the correlation coefficient in the (marginal) joint density of x_i and x_j.8
Definition 12: Let x ∼ N(μ, Σ) and partition x by x¹ and x² so that x¹ has k and x² (m − k) elements. If x_i, x_j are any two elements of x¹, then their partial correlation coefficient (for fixed x_{k+1}, x_{k+2}, ..., x_m) is defined by
ρ_{ij·k+1,k+2,...,m} = σ_{ij·k+1,...,m} / (σ_{ii·k+1,...,m} σ_{jj·k+1,...,m})^{1/2},    (1.5.3)
where σ_{ij·k+1,k+2,...,m} is the (i, j) element of the covariance matrix in the conditional density of x¹ given x². In the present case (normal distribution), this matrix is Σ₁₁ − Σ₁₂Σ₂₂^{−1}Σ₂₁.
Hence partial correlation coefficients are simply correlation coefficients computed with respect to the (conditional) joint density of the variables in question given another group of variables. The variables "held constant" are enumerated after the dot, in the suggestive notation of (1.5.3).
Remark 10: The difference between a simple and a partial correlation coefficient is this: a simple correlation coefficient between x_i and x_j expresses the degree of relation between these variables, when the effects of all other variables have been averaged out. A partial correlation, however, expresses the degree of relation between x_i and x_j for given values of all other (relevant) variables.
In the normal case, this aspect is obscured because, by iv of Theorem 1, the covariance matrix of the conditional distribution does not depend on the conditioning variables. Notice, however, that in (1.5.3) ρ_{ij·k+1,k+2,...,m} depends on Σ₁₂ and Σ₂₂, which contain the covariance parameters of the conditioning variables.
Finally, it should be pointed out that if x¹ and x² are mutually independent (in the normal case if Σ₁₂ = 0), then the conditional covariance matrix reduces to Σ₁₁ and the partial correlation coefficients coincide with the corresponding simple ones.
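As an added numerical illustration (the covariance matrix below is an arbitrary example, not taken from the text), the following sketch computes a simple and a partial correlation coefficient, the latter from the conditional covariance Σ₁₁ − Σ₁₂Σ₂₂^{−1}Σ₂₁.

import numpy as np

# A 3 x 3 covariance matrix: x1 = (x_1, x_2), x2 = (x_3,) (illustrative values).
Sigma = np.array([[1.0, 0.6, 0.5],
                  [0.6, 1.0, 0.7],
                  [0.5, 0.7, 1.0]])
S11, S12, S22 = Sigma[:2, :2], Sigma[:2, 2:], Sigma[2:, 2:]

# Simple correlation between x_1 and x_2 (Definition 11).
rho_12 = Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1])

# Conditional covariance of x1 given x2, and the partial correlation rho_{12.3} (Definition 12).
C = S11 - S12 @ np.linalg.inv(S22) @ S12.T
rho_12_3 = C[0, 1] / np.sqrt(C[0, 0] * C[1, 1])

print(rho_12, rho_12_3)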
Definition 13: Let x ∼ N(μ, Σ) and partition as in Definition 12. Let α'x² be a linear combination of the elements of x² and let x_i be an element of x¹. Then the maximum correlation9 between x_i and the linear combination α'x² is called the multiple correlation between x_i and the vector x², and is denoted by R_{i·k+1,k+2,...,m}.
9 The term maximum is needed here, for the correlation between the two scalar random variables x_i and α'x² will depend on the arbitrary vector of constants α.
Lemma 9: Let x ∼ N(μ, Σ) and partition x, μ, and Σ as in Definition 12; let x_i be an element of x¹ and let σ_{i·} denote the ith row of Σ₁₂. Put
α = Σ₂₂^{−1} σ_{i·}'.    (1.5.5)
Then x_i − α'x² is independent of x², and Var(x_i − γ'x²) is minimized, over all constant vectors γ, by γ = α.
PROOF: Notice that the conditional expectation of x_i given x² is10
E(x_i | x²) = μ_i + σ_{i·}Σ₂₂^{−1}(x² − μ²) = μ_i + α'(x² − μ²).    (1.5.6)
We first show that x² and x_i − α'x² are mutually independent. Because x_i − α'x² and x² are (jointly) normally distributed, to accomplish this we need only show that their covariance vanishes. Thus
E[(x_i − μ_i − α'(x² − μ²))(x² − μ²)'] = E[(x_i − μ_i)(x² − μ²)'] − α'E[(x² − μ²)(x² − μ²)'] = σ_{i·} − α'Σ₂₂.    (1.5.7)
As a result of (1.5.5), we conclude
σ_{i·} − α'Σ₂₂ = 0,    (1.5.8)
which establishes mutual independence.
Now, let γ be any vector of constants. Then
Var(x_i − γ'x²) = Var[(x_i − α'x²) + (α − γ)'x²]
              = Var(x_i − α'x²) + (α − γ)'Σ₂₂(α − γ).    (1.5.9)
Because Σ₂₂ is positive definite, it follows that the left-hand side of (1.5.9) is minimized when the second term in the right-hand side is zero. But this occurs only for
γ = α.    (1.5.10)
Q.E.D.
To complete our task, we prove
Lemma 10: Let x, μ, and Σ be as in Lemma 9 and partition them similarly; consider linear combinations γ'x², with γ nonrandom. Then the correlation between x_i and γ'x² is maximized for
γ = α.    (1.5.11)
PROOF: By Lemma 9, for any scalar c and vector γ we have
Var(x_i − α'x²) ≤ Var(x_i − cγ'x²).    (1.5.12)
Developing both sides, we have
σ_ii − 2σ_{i·}α + α'Σ₂₂α ≤ σ_ii − 2cσ_{i·}γ + c²γ'Σ₂₂γ.    (1.5.13)
10 Notice that
(*) E(x¹ | x²) = μ¹ + Σ₁₂Σ₂₂^{−1}(x² − μ²).
The matrix Σ₁₂Σ₂₂^{−1} is called the matrix of regression coefficients of x¹ on x². Thus in (1.5.6), σ_{i·}Σ₂₂^{−1} is the vector of regression coefficients of x_i on x².
Since c is arbitrary, (1.5.13) holds in particular for
c = σ_{i·}γ / (γ'Σ₂₂γ).
Substituting this value of c in (1.5.13) and noting that σ_{i·}α = α'Σ₂₂α = σ_{i·}Σ₂₂^{−1}σ_{i·}', we find
(σ_{i·}γ)² / (γ'Σ₂₂γ) ≤ σ_{i·}Σ₂₂^{−1}σ_{i·}',
so that the squared correlation between x_i and γ'x², namely (σ_{i·}γ)² / (σ_ii γ'Σ₂₂γ), attains its maximum when γ = α. Q.E.D.
Remark 11: It follows from Lemma 10 that the multiple correlation coefficient between x_i and the vector x² is given by
R_{i·k+1,k+2,...,m} = σ_{i·}α / (σ_ii α'Σ₂₂α)^{1/2} = (σ_{i·}Σ₂₂^{−1}σ_{i·}' / σ_ii)^{1/2}.    (1.5.16)
Here it would be interesting to point out the similarity between the properties of the conditional density derived above and the classical general linear model.
In the context of the latter, we are dealing with the sample
y = Zβ + u,
where y is T × 1, Z is T × n, and they represent, respectively, the (T) observations on the dependent and explanatory variables; β is the vector of parameters to be estimated, and u is the vector of disturbances having the specification
E(u) = 0,   Cov(u) = σ²I.
In what follows, let us assume for convenience that the dependent and explanatory variables are measured as deviations from their respective sample means.
If β is estimated by least squares, then its estimator is given by
β̂ = (Z'Z)^{−1}Z'y    (1.5.19)
and it is such that it minimizes
(y − Zβ)'(y − Zβ).    (1.5.20)
Notice the similarity of this to the conclusion of Lemma 9. Note, also, the similarity between the expressions for β̂ in (1.5.19) and α in (1.5.5).
The right-hand side in (1.5.19) can be written as
(Z'Z/T)^{−1}(Z'y/T).
But (Z'Z/T)^{−1} and Z'y/T are, respectively, the sample analogs of Σ₂₂^{−1} and σ_{i·}' appearing in the right-hand side of (1.5.5).
Moreover,
(y − Zβ̂)'Z = y'Z − β̂'Z'Z = y'Z − y'Z = 0,    (1.5.21)
which is again the sample analog of the result established in (1.5.8) of Lemma 9. Finally, the (unadjusted) coefficient of determination in multiple regression, defined by
R̄² = [y'y − (y − Zβ̂)'(y − Zβ̂)] / y'y = β̂'Z'y / y'y = y'Z(Z'Z)^{−1}Z'y / y'y,    (1.5.22)
is the exact sample analog of R²_{i·k+1,k+2,...,m} as established by (1.5.16), where x_i corresponds to the dependent and x² corresponds to the independent or explanatory variables.
The purpose of this brief digression was to pinpoint these similarities and thus enhance the student's understanding of the general linear model in its various aspects. The discussion also indicates how the study of the multivariate normal distribution contributes to understanding of the distributional problem involved in the general linear model.
Otherwise this digression is completely extraneous to the development of this section and may be omitted without loss of continuity.
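For readers who want to see the sample analogs side by side, the sketch below is an added illustration on simulated data (the design matrix, coefficients, and sample size are arbitrary): it computes the least squares coefficients of (1.5.19), checks the orthogonality in (1.5.21), and forms the unadjusted coefficient of determination of (1.5.22).

import numpy as np

rng = np.random.default_rng(3)

# Simulated sample: T observations on n explanatory variables (arbitrary design).
T, n = 500, 3
Z = rng.standard_normal((T, n))
beta_true = np.array([1.0, -0.5, 2.0])
y = Z @ beta_true + rng.standard_normal(T)

# Measure variables as deviations from sample means, as assumed in the text.
Z = Z - Z.mean(axis=0)
y = y - y.mean()

# Least squares estimator (1.5.19).
beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)

# Orthogonality (1.5.21): the residuals are orthogonal to the regressors.
resid = y - Z @ beta_hat
print(np.allclose(resid @ Z, 0.0, atol=1e-8))

# Unadjusted coefficient of determination (1.5.22).
R2 = (y @ Z @ np.linalg.inv(Z.T @ Z) @ Z.T @ y) / (y @ y)
print(beta_hat, R2)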
The essential conclusions of this section are summarized in
Theorem 2: Let x ∼ N(μ, Σ); partition x by x¹, x², where x¹ has k and x² has (m − k) elements. Partition μ and Σ conformally and let x_i be an element of x¹. Then the following statements are true.
i. The variance of x_i − a'x² is minimized for
a = Σ₂₂^{−1}σ_{i·}',    (1.5.23)
where σ_{i·} is the ith row of Σ₁₂; thus a is the vector of regression coefficients of x_i on the elements of the vector x².
ii. x_i − a'x² is independent of x², and hence of any linear combination of the elements of x².
iii. Let γ be an arbitrary (nonrandom) vector; then the correlation between (the two scalar random variables) x_i and γ'x² is maximized for
γ = a.
1.6 ESTIMATORS OF THE MEAN VECTOR AND
COVARIANCE MATRIX AND THEIR DISTRIBUTION
Let x ∼ N(μ, Σ) and consider the random sample {x_{·t} : t = 1, 2, ..., T}, where x_{·t} indicates the column vector whose elements are the tth observation on the random vector x; thus x_{·t} is m × 1.
The problem to be examined is that of obtaining estimators for μ and Σ, the parameters of the density of x. To solve the problem, we shall employ the principle of maximum likelihood.
Now the likelihoodll of the T observations is given by
L* (x l' X 2 , , x T ; 11, 1:)
= (21t)- mTI2 11:1- TI2 exp [ - ~ ,t (x'/ - 11) '1:- 1 (x./ - 11)] (1.6.1) Put
and note that (1.6.1) can be written in the more convenient form
L*(x ,1, X 2, , X T ; 11,1:) = (21t)- mTI2 11:1- TI2 exp [ -t tr (Y'1:- 1 Y)] (1.6.3)
where Y is a (m x T) matrix whose tth column is y / Since for any two matrices
A, B, such that AB and BA are both defined, tr (A B) = tr (BA), the exponential
in (1.6.3) can be written-apart from the factor, t-as tr (1:-1 YY')
Let u be the T × 1 vector of units defined by
u = (1, 1, ..., 1)',
and let x̄ = (1/T)Xu denote the vector of sample means, X being the m × T matrix whose tth column is x_{·t}. Adding and subtracting Tx̄x̄' in the right-hand side of (1.6.6), we obtain
YY' = (XX' − Tx̄x̄') + T(x̄ − μ)(x̄ − μ)' = A + T(x̄ − μ)(x̄ − μ)',
where A = XX' − Tx̄x̄'. Writing V = Σ^{−1}, the logarithm of the likelihood function thus becomes
L(x_{·1}, x_{·2}, ..., x_{·T}; μ, Σ) = −(mT/2) ln(2π) + (T/2) ln|V| − ½ tr(VA) − (T/2) tr{V(x̄ − μ)(x̄ − μ)'}.    (1.6.10)
To obtain maximum likelihood estimators, we maximize (1.6.10) with respect to μ and Σ. Actually, in this case it is simpler (and equivalent) to maximize with respect to μ and V.
Notice, however, that μ enters the likelihood function only through the exponential and, moreover, since
tr{V(x̄ − μ)(x̄ − μ)'} = tr{(x̄ − μ)'V(x̄ − μ)} = (x̄ − μ)'V(x̄ − μ),¹²    (1.6.12)
we conclude, by the positive definiteness of V, that the likelihood function is maximized with respect to μ if and only if (x̄ − μ)'V(x̄ − μ) is minimized. But the smallest value this quadratic form can assume is zero, and this occurs only for
μ = x̄.    (1.6.13)
Thus our earlier manipulations have spared us the need of differentiating (1.6.10) with respect to μ. To complete the estimation problem, we need only differentiate with respect to V the expression resulting in (1.6.10) when we substitute therein the maximum likelihood estimator of μ given in (1.6.13). Making the substitution, we obtain
L(x_{·1}, x_{·2}, ..., x_{·T}; Σ) = −(mT/2) ln(2π) + (T/2) ln|V| − ½ tr(VA),    (1.6.14)
where
A = XX' − Tx̄x̄'.
Differentiating (1.6.14) with respect to the elements v_ij of V and setting the derivatives equal to zero, we obtain
(T/2)(V_ij / |V|) − ½ a_ij = 0,   i, j = 1, 2, ..., m,    (1.6.17)
where V_ij is the cofactor of v_ij, |V| is the determinant of V, and a_ij is the (i, j) element of A. In matrix form,
the equations (1.6.17) read
T V^{−1} = A,
so that the maximum likelihood estimator of Σ = V^{−1} is A/T. We thus have
Lemma 11: Let x ∼ N(μ, Σ) and let {x_{·t} : t = 1, 2, ..., T} be a random sample on the vector x. Then the maximum likelihood estimators of μ and Σ are given, respectively, by
μ̂ = x̄,    (1.6.20)
Σ̂ = (XX' − Tx̄x̄') / T.    (1.6.21)
-:T,, (1.6.20) (1.6.21 ) Before turning to the problem of determining the distribution of the maximum likelihood estimators just obtained, we cite, without proof, the following useful
Theorem 3: Let s be a (column) vector such that
s's = 1,    (1.6.22)
but is otherwise arbitrary. Then there exists an orthogonal matrix13 having s as its last column.
The distribution of μ̂ and Σ̂ is established by
Lemma 12: Let x ∼ N(μ, Σ) and consider the sample {x_{·t} : t = 1, 2, ..., T}. Let X be the matrix having x_{·t} as its tth column. Let B be a T × T orthogonal matrix having for its Tth column14