Chapter 9

BAYESIAN ANALYSIS OF SIMULTANEOUS EQUATION SYSTEMS
JACQUES H. DRÈZE and JEAN-FRANÇOIS RICHARD*

Université Catholique de Louvain
Contents
5.3 Posterior conditional densities and moments 550
*The authors thank David F. Hendry, Teun Kloek, Hans Tompa, Herman van Dijk, and Arnold Zellner for helpful comments on a preliminary version. They are particularly grateful to Luc Bauwens for his assistance with computations and his detailed comments on the manuscript.
Handbook of Econometrics, Volume I, Edited by Z. Griliches and M.D. Intriligator
© North-Holland Publishing Company, 1983
1 Introduction and summary
1.1 The simultaneous equation model
The standard specification of the static simultaneous equation model (SEM) in econometrics is [see, for example, Goldberger (1964) or Theil (1971)]
YB + ZΓ = U,   (1.1)

where

Y is a T × m matrix of observed endogenous variables,
Z is a T × n matrix of observed exogenous¹ variables,
B is an m × m non-singular matrix of unknown coefficients,
Γ is an n × m matrix of unknown coefficients, and
U is a T × m matrix of unobserved disturbances.

U and Z are uncorrelated and the T rows of U are assumed identically independently normally distributed, each with zero expectation and positive definite symmetric (PDS) covariance matrix Σ; hence the matrix U has the matricvariate normal density (as defined in Appendix A):

p(U|Σ) ∝ |Σ|^{-(1/2)T} exp{-(1/2) tr Σ⁻¹U′U}.   (1.2)
The m equations (1.1) are called structural equations. Solving these explicitly for Y yields the reduced form:

Y = ZΠ + V,   (1.3)

where

Π = -ΓB⁻¹,   V = UB⁻¹,   Ω = (B′)⁻¹ΣB⁻¹.   (1.4)

The data density p(Y|Π, Ω) (see footnote 1) is

p(Y|Π, Ω) ∝ |Ω|^{-(1/2)T} exp{-(1/2) tr Ω⁻¹(Y - ZΠ)′(Y - ZΠ)}.   (1.5)
¹Strictly speaking, the variables in Z should be "weakly exogenous" in the terminology of Engle et al. (1983). The analysis applies to dynamic simultaneous equation models where Z includes lagged variables, whether exogenous or endogenous. In that case, however, the derivation of a joint predictive density for several successive periods of time raises problems, some of which are discussed in Richard (1979). Finally, even though our analysis is conditional on Z, we shall systematically omit Z from the list of conditioning variables, for notational convenience.
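As a numerical companion (not part of the original text), the sketch below simulates the structural form (1.1) and checks the reduced-form relations (1.3)-(1.4); all dimensions and parameter values are arbitrary illustrative choices.

```python
# Illustration: simulate Y B + Z Gamma = U and verify numerically that
# Y = Z Pi + V with Pi = -Gamma B^{-1} and V = U B^{-1}.
import numpy as np

rng = np.random.default_rng(0)
T, m, n = 200, 2, 3

B = np.array([[1.0, 1.0], [-0.5, 0.8]])                  # m x m, non-singular
Gamma = np.array([[0.3, 0.0], [0.0, -0.4], [1.0, 2.0]])  # n x m
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])               # PDS disturbance covariance

Z = rng.normal(size=(T, n))
U = rng.multivariate_normal(np.zeros(m), Sigma, size=T)

# Structural form Y B + Z Gamma = U  =>  Y = (U - Z Gamma) B^{-1}
Binv = np.linalg.inv(B)
Y = (U - Z @ Gamma) @ Binv

# Reduced form (1.3)-(1.4)
Pi = -Gamma @ Binv
V = U @ Binv
max_err = np.abs(Y - (Z @ Pi + V)).max()
print(max_err < 1e-10)
```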
520 J. H. Drèze and J.-F. Richard
Given Y, (1.5) yields two alternative expressions for the likelihood function, namely

L(Π, Ω|Y) ∝ |Ω|^{-(1/2)T} exp{-(1/2) tr Ω⁻¹(Y - ZΠ)′(Y - ZΠ)},   (1.6)

with (Π, Ω) ∈ R^{nm} × 𝒞^m, where 𝒞^m denotes the space of all real PDS matrices of size m; and

L(B, Γ, Σ|Y) ∝ ||B||^T |Σ|^{-(1/2)T} exp{-(1/2) tr Σ⁻¹(YB + ZΓ)′(YB + ZΓ)},   (1.7)

with (B, Γ, Σ) ∈ ℬ^m × R^{nm} × 𝒞^m, where ℬ^m denotes the space of all real non-singular matrices of size m.
In this model, the reduced-form parameters (Π, Ω) are identified, but the structural parameters (B, Γ, Σ) are not. The likelihood function L(B, Γ, Σ|Y) is constant over every m²-dimensional subspace defined by given values of Π and Ω in (1.5). (Every such subspace consists of "observationally equivalent" parameter points and defines an "equivalence class" in the parameter space.) Identification of (B, Γ, Σ) is achieved by imposing on the structural parameters a set of exact a priori conditions (including the normalization rule):

Φ_k(B, Γ, Σ) = 0,   k = 1,…,K.   (1.8)

A necessary (order) condition for identification is that (1.8) comprise at least m² equations and predetermine m or more coefficients in each row of (B′ Γ′).
Maximum likelihood estimation of the structural parameters calls for maximizing the likelihood function in (1.7) subject to the constraints (1.8). This estimation method, known as full information maximum likelihood (FIML), is conceptually straightforward, but relies on numerical procedures and may be computationally demanding. Also, it may be sensitive to specification errors in the overidentified case. In particular, estimators of (Π, Ω) are then subject to exact a priori restrictions, some of which may be in conflict with the data.
An alternative, which is less efficient but also less demanding computationally and less sensitive to some classes of specification errors, is limited information maximum likelihood (LIML). It consists in estimating each structural equation
separately by maximizing the likelihood function subject only to those constraints in (1.8) which involve solely the coefficients of the equation being estimated. In the standard case where the constraints (1.8) are separable across equations, the LIML and FIML estimators of the coefficients of a structural equation coincide when the other equations in the system are not overidentified.

Exact finite sample distributions of maximum likelihood estimators have been obtained only for LIML analysis of an equation containing two endogenous variables [see, for example, Mariano (1980)]. LIML and FIML estimators can be numerically approximated respectively by two-stage (2SLS) and three-stage (3SLS) least squares estimators with no loss of asymptotic efficiency [see, for example, Theil (1971) and Hendry (1976)]. In the special case of just-identified models, all these methods are equivalent.
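The equivalence of the methods in the just-identified case can be checked numerically. The following sketch (not from the chapter; all data simulated) verifies the algebraic identity that 2SLS collapses to the simple instrumental-variable estimator (Z′X)⁻¹Z′y when instruments and regressors are equal in number.

```python
# Sketch: in a just-identified equation, 2SLS equals the simple IV estimator.
import numpy as np

rng = np.random.default_rng(1)
T = 500
z = rng.normal(size=(T, 2))                   # instruments (one doubles as exogenous regressor)
x_endog = 0.8 * z[:, 0] + rng.normal(size=T)  # regressor correlated with the first instrument
X = np.column_stack([x_endog, z[:, 1]])       # regressors: endogenous + exogenous
y = X @ np.array([1.5, -0.7]) + rng.normal(size=T)

Z = z                                         # instrument matrix, same column count as X
# 2SLS: project X on Z, then regress y on the fitted values
PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
Xhat = PZ @ X
b_2sls = np.linalg.solve(Xhat.T @ Xhat, Xhat.T @ y)
# Simple IV (just-identified case)
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
print(np.allclose(b_2sls, b_iv))
```

The identity follows because X′P_Z X and X′P_Z y factor through the square, invertible matrix Z′X.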
1.2 Bayesian inference and identification
A Bayesian analysis of the SEM proceeds along the same lines as any other Bayesian analysis. Thus, if the analyst has chosen to work in a given parameter space, he defines a prior density² on that space and applies Bayes' theorem to revise this prior density in the light of available data. The resulting posterior density is then used to solve problems of decision and inference. Predictive densities for future observations can also be derived.

Thus, let p(y|θ) and p(θ) denote respectively a data density with parameter θ and a prior density. The predictive density p(y) and the posterior density p(θ|y) are then given by

p(y) = ∫ p(y|θ)p(θ) dθ,   (1.9)

p(θ|y) ∝ L(θ|y)p(θ),   (1.10)

where L(θ|y) is the likelihood function.³ The product L(θ|y)p(θ) defines a kernel of the posterior density [see, for example, Raiffa and Schlaifer (1961, section 2.1.2)].
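A minimal numerical version of (1.9)-(1.10) (an illustration added here, not in the original) uses a grid: the posterior is likelihood times prior, with the predictive density p(y) as the normalizing constant. The model choice (normal data, normal prior on the mean) is arbitrary.

```python
# Grid-based Bayes update: posterior ∝ likelihood × prior, normalized by p(y).
import numpy as np

theta = np.linspace(-6.0, 6.0, 1201)       # grid over the parameter
dx = theta[1] - theta[0]
prior = np.exp(-0.5 * theta ** 2)          # N(0,1) prior kernel
prior /= prior.sum() * dx                  # normalize on the grid

y = 1.2                                    # one observation, y | theta ~ N(theta, 1)
like = np.exp(-0.5 * (y - theta) ** 2)     # likelihood L(theta | y)

p_y = (like * prior).sum() * dx            # (1.9): predictive density at y
post = like * prior / p_y                  # (1.10): posterior density

post_mean = (theta * post).sum() * dx
print(abs(post_mean - 0.6) < 1e-3)         # exact posterior here is N(0.6, 1/2)
```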
The operation (1.10) is well-defined, whether θ is identified or not; hence the remark by Lindley (1971, p. 46) that "unidentifiability causes no real difficulty in

²It is assumed, mainly for convenience, that all the relevant probability distributions are continuous with respect to an appropriate measure, typically the Lebesgue measure; they are therefore represented by density functions. By abuse of language, we still use the word "density" rather than "measure" when the function is not integrable.

³In (1.10) the proportionality factor may depend on y, since the posterior density is conditional on y.
the Bayesian approach".⁴ In particular, we may analyze an underidentified model, with a prior density which substitutes stochastic prior information for exact a priori restrictions. And we may substitute stochastic prior information for overidentifying a priori restrictions, whenever the underlying economic theory is less than fully compelling.

Note, however, an obvious implication of (1.10). If θ₁ and θ₂ are two observationally equivalent points in the parameter space, so that p(y|θ₁) = p(y|θ₂), then

p(θ₁|y)/p(θ₂|y) = p(θ₁)/p(θ₂).   (1.11)

Over equivalence classes of the parameter space, the prior density is not revised through observations. If, in particular, θ can be partitioned into θ = (μ, λ) in such a way that the data density depends on λ alone,

p(y|θ) = p(y|λ),   (1.12)

the conditional prior density p(μ|λ) is not revised through observations; however, the marginal prior density p(μ) will be revised unless μ and λ are a priori independent. See Section 3 for details.
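The mechanism in (1.12) can be made concrete with a small conjugate-normal illustration (added here; all prior settings are arbitrary): the data bear on λ alone, p(μ|λ) is untouched, yet the marginal of μ moves because μ and λ are a priori correlated.

```python
# theta = (mu, lam); data depend on lam only; prior correlation transmits the update.
import numpy as np

rho = 0.8                        # prior correlation between mu and lam
n, ybar = 25, 1.0                # n observations, y_i | lam ~ N(lam, 1)

# Prior: (mu, lam) ~ N(0, [[1, rho], [rho, 1]])
# Posterior of lam (conjugate normal): precision = prior (1) + data (n)
post_var = 1.0 / (1.0 + n)
post_mean = post_var * n * ybar

# p(mu | lam) = N(rho * lam, 1 - rho^2) is never revised by the data, but
# marginalizing it against p(lam | y) shifts the marginal of mu:
mu_post_mean = rho * post_mean   # E[mu | y] = rho * E[lam | y]
print(mu_post_mean > 0.5)        # prior mean of mu was 0; the data moved it
```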
1.3 Bayesian treatment of exact restrictions
Incorporating in a Bayesian analysis exact prior restrictions such as (1.8) raises conditionalization paradoxes [see, for example, Kolmogorov (1950)] which we briefly illustrate by means of a simple bivariate example. Let p(θ₁, θ₂) be a bivariate uniform density on the open unit square and let D = {(θ₁, θ₂) | θ₁ = θ₂} be a diagonal of that square. There is no unique way to derive from p(θ₁, θ₂) a density for θ₁ conditionally on (θ₁, θ₂) ∈ D. Let, for example, λ = θ₁ - θ₂ and ρ = θ₁/θ₂; then

p(θ₁ | λ = 0) = 1,   θ₁ ∈ (0,1),   (1.13)

p(θ₁ | ρ = 1) = 2θ₁,   θ₁ ∈ (0,1).   (1.14)

In (1.13) D is implicitly considered as the limit of the infinite sequence {D₁ₙ}, where D₁ₙ = {(θ₁, θ₂) | -(1/n) < θ₁ - θ₂ < 1/n}; in (1.14) it is considered instead as the limit of the infinite sequence {D₂ₙ}, where D₂ₙ = {(θ₁, θ₂) | 1 - (1/n) < θ₁/θ₂ < 1 + (1/n)}.

In order to avoid such paradoxes we shall write explicitly all the exact prior restrictions⁵ and assign probabilities only for parameter vectors defined over a space of positive Lebesgue measure. In the above example, this approach calls for selecting among (1.13) and (1.14), or among other similar expressions, the density which seems best suited for inference on θ₁, without attempting to derive it from an underlying joint density. If needed, inferences on θ₂ can always be drawn through the integrand transformation θ₂ = θ₁.

⁴The reader may usefully be reminded at once that: "identification is a property of the likelihood function, and is the same whether considered classically or from the Bayesian approach" [Kadane (1975, p. 175)].
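The two limiting sequences can be contrasted by Monte Carlo (an added illustration): conditioning the uniform square on the difference bands D₁ₙ leaves θ₁ uniform (mean 1/2), while the ratio bands D₂ₙ tilt the density toward 2θ₁ (mean 2/3).

```python
# Monte Carlo sketch of the conditionalization paradox, (1.13) vs (1.14).
import numpy as np

rng = np.random.default_rng(2)
t1, t2 = rng.uniform(size=(2, 2_000_000))
eps = 0.01                               # band half-width, playing the role of 1/n

in_diff = np.abs(t1 - t2) < eps          # D_1n: -1/n < t1 - t2 < 1/n
in_ratio = np.abs(t1 / t2 - 1.0) < eps   # D_2n: 1 - 1/n < t1/t2 < 1 + 1/n

mean_diff = t1[in_diff].mean()           # -> 1/2, the uniform limit (1.13)
mean_ratio = t1[in_ratio].mean()         # -> 2/3, the 2*theta_1 limit (1.14)
print(abs(mean_diff - 0.5) < 0.02, abs(mean_ratio - 2/3) < 0.02)
```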
1.4 Bayesian analysis of the reduced form
We shall discuss in Section 7 numerical methods for evaluating key characteristics (integrating constant, moments, fractiles, etc.) of posterior and predictive density functions. However, for models with many parameters, like most simultaneous equation models, analytical methods remain indispensable to evaluate these densities, either fully, or conditionally on a few parameters amenable to numerical treatment, or approximately to construct importance functions for Monte Carlo integration. The classes of prior densities permitting analytical evaluation of the posterior density are limited. In most Bayesian analyses they consist essentially of the so-called non-informative and natural-conjugate families. Loosely speaking, a natural-conjugate prior density conveys the same information as a hypothetical previous sample, whereas a non-informative prior density conveys as little information as possible. (Typically, a non-informative prior is a limiting member of the natural-conjugate family.)

In the simultaneous equation model, the unrestricted reduced form is a traditional multivariate regression model, which has been studied extensively [see, for example, Zellner (1971, ch. VIII)]. The natural-conjugate density for that model has the Normal-Wishart form. It follows that the mn × mn marginal covariance matrix of the column expansion of Π is restricted to a Kronecker product, say W ⊗ M⁻¹, where W ∈ 𝒞^m and M ∈ 𝒞^n are matrices of prior parameters.⁶ This restriction is harmless if we wish the prior density to be non-informative about Π (M = 0), or to be informative about the parameters of a single reduced-form equation (W is zero except for a single diagonal element). It is a severe restriction

⁵It is only when such explicitations are not critical for the sake of the argument that we shall use notations such as p(B, Γ, Σ), even though (B, Γ, Σ) is subject to the restrictions (1.8).

⁶In the natural-conjugate framework, this property reflects the fact that the covariance matrix of the column expansion of Π̂, the unrestricted ordinary least squares estimator of Π, is given by Ω ⊗ (Z′Z)⁻¹.
in other cases since it implies that all columns of Π should have the same covariance matrix, up to a proportionality factor; see Section 4 for details. As discussed in Section 6.4, there exist generalizations of the Normal-Wishart density which are more flexible in that respect and are also natural-conjugate for the seemingly unrelated regressions model (SUR) or for reduced forms subject only to exclusion restrictions. However, the evaluation of these densities requires some application of numerical methods.
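The Kronecker restriction W ⊗ M⁻¹ can be displayed directly (an added sketch with arbitrary W and M): the covariance block of column j of Π is w_jj·M⁻¹, so all columns share the same n × n covariance matrix up to a scalar.

```python
# The W ⊗ M^{-1} structure forces proportional column covariances.
import numpy as np

n, m = 3, 2
M = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.5]])  # prior parameter, PDS
W = np.array([[1.0, 0.4], [0.4, 2.0]])                             # prior parameter, PDS

V = np.kron(W, np.linalg.inv(M))     # mn x mn covariance of the column expansion of Pi
# diagonal covariance block of column j of Pi:
blocks = [V[j * n:(j + 1) * n, j * n:(j + 1) * n] for j in range(m)]
ratio0 = blocks[0] / W[0, 0]
ratio1 = blocks[1] / W[1, 1]
print(np.allclose(ratio0, ratio1))   # identical up to the proportionality factor w_jj
```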
1.5 Bayesian analysis of the structural form
A "natural-conjugate" approach to inference about the structural parameters is fraught with even more difficulties. First, if the model is under-identified, a natural-conjugate prior density is necessarily improper. Second, the restrictive covariance structure obtained for the reduced-form prior applies also to the conditional prior density p(Γ, Σ|B). Third, a full natural-conjugate prior density for the structural parameters does not have the Normal-Wishart form, due to the presence of the additional factor ||B||^T in the likelihood (1.7).
Two alternative approaches have been developed, corresponding respectively to limited information analysis and to full information analysis. The limited information analysis relies on prior information (both exact and stochastic) about the parameters of a single structural equation; call them θ. A suitable non-informative prior density is then defined on the reduced-form parameters (Π, Ω), conditionally on θ. Bayes' theorem yields a posterior density in the form p(θ|Y)p(Π, Ω|θ, Y). For a class of prior densities on θ (including the Normal-gamma and Student-gamma families), the posterior marginal density for the regression coefficients in the equation of interest is a poly-t density, i.e. a density whose kernel is a product, or ratio of products, of Student-t kernels⁷ [see Drèze (1977)]. Properties of poly-t densities are discussed in Section 7. In the cases reviewed here, evaluation of posterior and predictive densities requires at most unidimensional numerical integration, and an efficient software package is available. A Bayesian limited information analysis is thus fully operational. It is presented in Section 5, together with an application. In order to provide a simple preview of the main results with a minimum of technicalities, we treat first a special case in Section 2. That section is self-contained, except for proofs.

⁷"Poly-t densities are defined by the simple property that their kernel is a product, or ratio of products, of Student-t kernels. They are obtained as posterior marginal densities for regression coefficients, under a variety of specifications for the prior density and the data generating process. No analytical expressions are available for the integrating constant or moments of poly-t densities, and the family is not suited for tabulations. Yet, it may (by contemporary standards) be regarded as 'tractable', because it lends itself to numerical analysis by integration in a number of dimensions that does not depend upon the number of variables but rather upon the number of Student-t kernels in the product, or in the numerator (minus one)" [Drèze (1977, p. 330)].
Ch. 9: Bayesian Analysis of Simultaneous Equation Systems 525
The full information analysis is more complex. One approach uses the extended natural-conjugate prior density corresponding to the seemingly unrelated regressions model. That prior density includes as a special case the possibility of specifying m independent Student densities on the coefficients of the m structural equations. The posterior density is obtained analytically, either for (B, Γ) marginally with respect to Σ, or for (B, Σ) marginally with respect to Γ. Neither of these expressions has yet been found amenable to further analytical treatment, except in the special case of two-equation models, reviewed in Section 6.5. The only general procedure is to integrate the posterior density numerically by a Monte Carlo technique. This approach gives additional flexibility in the specification of prior densities. It has been developed by Kloek and van Dijk (1978); see Sections 6.6.2 for an application and 7.3.2 for computational aspects.

The posterior density for the coefficients of a single equation, defined marginally with respect to Σ but conditionally on all other parameters of the model, belongs to the class of poly-t densities whenever the corresponding conditional prior density belongs to that class (e.g. is a multivariate Student density). When attention is focused on the coefficients of a single equation, this conditional full information approach and the limited information approach define two extreme ways of handling prior information on the other coefficients of the model. Full information methods are discussed in Section 6, together with applications.
A table at the end of the chapter contains references to the main formulae for prior and posterior densities on the coefficients of one equation Finally, the formulae pertaining to the Normal-Wishart density, which play a central role in these analyses, are collected in Appendix A
1.6 Summary
In summary, Bayesian limited information analysis of general models and Bayesian full information analysis of two-equation models have now been developed to a point where:

(i) they allow a flexible specification of the prior density, including well-defined non-informative prior measures;

(ii) they yield exact finite sample posterior and predictive densities with known properties; and

(iii) they can be evaluated through numerical methods (either exact or involving integration in a few dimensions), using an efficient integrated software package.
The treatment of stochastic prior information is illustrated in Sections 2.4, 5.5, and 6.6.1. The use of posterior densities for policy analysis is illustrated in Sections 6.6.2 and 6.6.3.
Remarks (i) and (ii) apply also to full information analysis of general models. But the numerical evaluation (by Monte Carlo) is more demanding, and no integrated software package is available yet. Remark (iii) also applies to the analysis of one structural equation at a time, conditionally on point estimates for the other parameters. But avenues for further developments are open.
These advances must be weighed against some lasting drawbacks, in particular:

(i) the computations remain demanding, as could be expected of exact finite sample results in a complex model;

(ii) the specification of the prior densities requires careful thought to avoid misrepresentation of the prior information; and

(iii) the treatment of identities, non-linear restrictions, reparameterizations, etc. is more complicated than under a maximum likelihood approach, to the extent that integrand transformations are more complicated than functional transformations.
1.7 Bibliographical note
The Bayesian approach to simultaneous equations analysis was reviewed earlier by Rothenberg (1975). The intellectual origins of the subject go back to unpublished papers by Drèze (1962), Rothenberg (1963), and Zellner (1965). These three papers contain ideas developed more fully in ensuing work by the same authors; see the list of references. These ideas also influenced other researchers at Louvain, Rotterdam (where Rothenberg had worked in 1962-1964), Madison, and Chicago. Much of the more definitive work reviewed here was carried out at these places, notably by Morales, Richard and Tompa, Harkema, Kloek and van Dijk, and Chetty, Kmenta, Tiao and Vandaele; see the list of references.

Rothenberg's 1963 paper contained an application to Haavelmo's model, studied in greater detail by Chetty (1968). The next applications came from Harkema (1971), Morales (1971), and Richard (1973). See also Drèze (1976), Kaufman (1975), and Maddala (1976) for the analysis of Tintner's model of supply and demand for meat; and Zellner, Kmenta and Drèze (1966) as well as Zellner and Richard (1973) for an application to Cobb-Douglas production functions.
2 A special case
2.1 Limited information maximum likelihood estimation
In order to give a simple preview of the more general analysis to follow, we shall restrict attention in this section to the analysis of a single identified structural equation. As usual, maximum likelihood analysis provides a natural reference point for the Bayesian approach. Accordingly, we retrace some steps of LIML analysis and bring out the analogies.
To single out one equation (labelled equation 1) we partition the system (1.1) as

Yβ + Zγ = u,   (2.1)

where β and γ are the first columns of B and Γ respectively, and u is normally distributed with expectation 0 and covariance matrix σ²I_T. Compatibility with (1.4) requires

Πβ + γ = 0,   (2.2)

σ² = β′Ωβ.   (2.3)

Restricting attention to exclusion restrictions, we consider the partitioning

β′ = (β′_Δ  0′),   γ′ = (γ′_*  0′),   (2.4)

where β_Δ groups the m₁ coefficients of the endogenous variables Y_Δ included in equation 1 and γ_* the n₁ coefficients of the included exogenous variables Z_*; Π is partitioned conformably as

Π = ( Π_*Δ   Π_*ΔΔ ; Π_**Δ  Π_**ΔΔ ).   (2.5)

Under the rank condition

rank Π_**Δ = m₁ - 1,   (2.6)

(β_Δ, γ_*, σ) is identified up to a scale factor. LIML analysis recognizes this feature and estimates (β_Δ, γ_*, σ) up to an arbitrary scale factor, to be selected freely. In particular, if the scale normalization consists in setting one element of β equal to unity, the statistical analysis is invariant with respect to the choice of that particular element. (This is in contrast with 2SLS analysis, where the choice of normalization affects the estimators.)
In order to proceed in "limited information" spirit, i.e. without explicit consideration of the restrictions pertaining to the remaining structural equations, it is convenient to partition the reduced form (1.3) as

Y = (Y₁  Y₂),   Π = (Π₁  Π₂),   V = (V₁  V₂),   (2.7)

where Y₁ is T × 1, Π₁ is n × 1, and V₁ is T × 1. Here Y₁ is any column of Y (the labelling inside Y is arbitrary).

One may then exhibit the joint density of the disturbances (u  V₂), i.e. of the T realizations of one structural disturbance (u) and m - 1 reduced-form disturbances (V₂). Since (u  V₂) = VL, where L is the triangular matrix

L = ( β₁  0′ ; β₍₂₎  I_{m-1} ),   with β = (β₁  β′₍₂₎)′,   (2.8)

we have

p(u, V₂) ∝ |Ω_*|^{-(1/2)T} exp{-(1/2) tr Ω_*⁻¹(u  V₂)′(u  V₂)},   (2.9)

with Ω_* = L′ΩL. It follows that the likelihood function (1.7) may be rewritten as

L(β_Δ, γ_*, Π₂, Ω_*|Y) ∝ |β₁|^T |Ω_*|^{-(1/2)T} exp{-(1/2) tr Ω_*⁻¹W′W},   W = (Y_Δβ_Δ + Z_*γ_*   Y₂ - ZΠ₂).   (2.10)

Maximizing (2.10) stepwise with respect to Π₂ and Ω_* yields the concentrated likelihood function

L(β_Δ, γ_*|Y) ∝ [ β′_Δ W_ΔΔ β_Δ / (Y_Δβ_Δ + Z_*γ_*)′(Y_Δβ_Δ + Z_*γ_*) ]^{(1/2)T},   (2.11)

where W_ΔΔ = Y′_Δ[I - Z(Z′Z)⁻¹Z′]Y_Δ. This expression is homogeneous of degree 0 in the parameters (β_Δ, γ_*). It can therefore be maximized with respect to any (m₁ + n₁ - 1) elements of (β_Δ, γ_*) conditionally on a normalization rule, in order to obtain the LIML estimators. Since γ_* is unrestricted, (2.11) reaches its maximum at γ_* = -(Z′_*Z_*)⁻¹Z′_*Y_Δβ_Δ and the resulting concentrated likelihood function is

L(β_Δ|Y) ∝ [ β′_Δ W_ΔΔ β_Δ / β′_Δ W̄_ΔΔ β_Δ ]^{(1/2)T},   (2.12)

where W̄_ΔΔ = Y′_Δ[I - Z_*(Z′_*Z_*)⁻¹Z′_*]Y_Δ; (2.12) is homogeneous of degree 0 in β_Δ.
Let Π̂ = (Z′Z)⁻¹Z′Y be partitioned conformably with Π in (2.5). We can then verify that rank Π̂_**Δ = m₁ - 1, the sampling analogue of (2.6), is equivalent to rank(W̄_ΔΔ - W_ΔΔ) = m₁ - 1. The properties of LIML estimators are discussed for example in Anderson and Rubin (1949) or Theil (1971).
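The least-variance-ratio computation behind (2.11)-(2.12) can be sketched numerically (an added illustration on simulated data): the minimizer of (b′W̄b)/(b′Wb) is the eigenvector attached to the smallest eigenvalue of the pencil (W̄, W), and that eigenvalue is at least one because W̄ - W is positive semidefinite.

```python
# LIML's least variance ratio as a generalized eigenvalue problem.
import numpy as np

rng = np.random.default_rng(4)
T = 400
Z = np.column_stack([np.ones(T), rng.normal(size=(T, 3))])  # all exogenous variables
Zs = Z[:, :2]                                               # included block Z_*
# Two included endogenous variables driven by the full Z:
Yd = Z @ rng.normal(size=(4, 2)) + rng.normal(size=(T, 2))  # Y_Delta, T x 2

def resid_moment(Y, X):
    # Y'[I - X(X'X)^{-1}X']Y
    P = X @ np.linalg.solve(X.T @ X, X.T)
    R = Y - P @ Y
    return R.T @ R

W = resid_moment(Yd, Z)      # W_DeltaDelta: residuals from all of Z
Wbar = resid_moment(Yd, Zs)  # Wbar_DeltaDelta: residuals from Z_* only

lam, vec = np.linalg.eig(np.linalg.solve(W, Wbar))
k = np.argmin(lam.real)
b = vec[:, k].real           # least-variance-ratio direction (LIML beta up to scale)
ratio = (b @ Wbar @ b) / (b @ W @ b)
print(abs(ratio - lam.real.min()) < 1e-8, lam.real.min() >= 1.0 - 1e-8)
```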
2.2 A Bayesian analogue
To develop a Bayesian analogue of LIML, we may start from either (1.6) or (2.10) and combine the likelihood with a prior density which is non-informative with respect to all the parameters except (β_Δ, γ_*, σ²). The expression (2.10) provides a natural starting point to the extent that (β_Δ, γ_*, σ²) appear explicitly as arguments of that likelihood function. Also, (Π₂, Ω_*) are not subject to exact prior restrictions. On the other hand, the parameterization of (2.10) includes an element of arbitrariness, to the extent that the matrix L defined in (2.8) is itself arbitrary. In particular, the selection of m - 1 columns from Π (from Y) destroys the symmetry among all the endogenous variables included in the equation of interest. Thus, if Q is any m × (m - 1) matrix such that P = (β  Q) is non-singular, then the parameterization (Φ₂, Λ), where

Φ₂ = ΠQ,   Λ = P′ΩP,   (2.13)

is just as meaningful as the parameterization (Π₂, Ω_*).

This element of arbitrariness is not a difficulty in LIML analysis. Indeed, (2.11) is invariant with respect to Q in the sense that reformulating (2.10) in terms of (Φ₂, Λ) and maximizing with respect to (Φ₂, Λ) still yields (2.11) independently of Q.⁸ Similarly, in a Bayesian analysis resting on a proper prior density p(β_Δ, γ_*, Π₂, Ω_*), the prior information so defined in (β_Δ, γ_*, Π₂, Ω_*)-space could be translated into the same prior information in (β_Δ, γ_*, Φ₂, Λ)-space, by an integrand transformation with a well-defined Jacobian. But this element of arbitrariness raises a difficulty in defining a non-informative prior density on (Π₂, Ω_*) because such a density would generally fail to be equally non-informative about (Φ₂, Λ). Fortunately, it is possible to select a particular prior measure (improper density) which can be claimed "non-informative" in a sense that is both natural and invariant with respect to Q. It is shown in Section 5 that the density

p(β_Δ, γ_*, Π₂, Ω_*) ∝ p(β_Δ, γ_*) |Ω_*22|^{-(1/2)n} |Ω_*|^{-(1/2)(m+1)},   (2.14)

where Ω_*22 denotes the lower-right (m - 1) × (m - 1) block of Ω_*, has the desired property. This density can be interpreted as the product of three

⁸The LIML estimators of (Π, Ω) are also invariant with respect to Q. Indeed, under general conditions, the ML estimator of a function of the parameters is that function of the ML estimators of the parameters.
terms:

(i) p(β_Δ, γ_*), a prior density which is left unspecified at this stage;

(ii) p(Π₂|Ω_*) ∝ |Ω_*22|^{-(1/2)n}, the limiting form of an (m - 1) × n multivariate normal density for Π₂ with covariance matrix Ω_*22 ⊗ M⁻¹, where M = 0; and

(iii) p(Ω_*) ∝ |Ω_*|^{-(1/2)(m+1)}, the limiting form of a Wishart density with ν degrees of freedom, where ν = 0.
Combining (2.10) and (2.14) yields (see footnote 1)

p(β_Δ, γ_*, Π₂, Ω_*|Y) ∝ p(β_Δ, γ_*) |β₁|^T |Ω_*22|^{-(1/2)n} |Ω_*|^{-(1/2)(T+m+1)} exp{-(1/2) tr Ω_*⁻¹W′W};   (2.15)

this expression can be integrated analytically with respect to (Π₂, Ω_*), yielding

p(β_Δ, γ_*|Y) ∝ p(β_Δ, γ_*) [ β′_Δ W_ΔΔ β_Δ / (Y_Δβ_Δ + Z_*γ_*)′(Y_Δβ_Δ + Z_*γ_*) ]^{(1/2)(T-m+1)}.   (2.16)

The posterior density (2.16) is invariant with respect to Q in the sense that if the analysis is conducted in terms of (Φ₂, Λ) instead of (Π₂, Ω_*) the same posterior density is obtained for (β_Δ, γ_*): both the prior density (2.14) and the likelihood function are invariant with respect to Q.

Expression (2.16) differs from (2.11) in three respects: (i) it is a density and not a likelihood function; (ii) it contains as an additional term the prior density p(β_Δ, γ_*); and (iii) the ratio of quadratic forms in (β_Δ, γ_*) is raised to the power (T - m + 1)/2 instead of T/2. The first two differences are self-explanatory. The last one reflects the difference in exponents (degrees of freedom) between conditionalization, as in (2.11), and marginalization, as in (2.16), of a Wishart density.⁹
It is also shown in Section 5 that a prior density of the form

p(β_Δ, γ_*, Π₂, Ω_*) ∝ (σ²)^{-(1/2)(m₁+n₁)} |Ω_*22|^{-(1/2)n} |Ω_*|^{-(1/2)(m+1)},   (2.17)

where σ² = β′Ωβ is the (1,1) element of Ω_* and the first factor is the limiting form of a normal density for the m₁ + n₁ variables (β_Δ, γ_*) with covariance matrix σ²N⁻¹, where N = 0, leads to the posterior densities

p(β_Δ, γ_*|Y) ∝ (β′_Δ W_ΔΔ β_Δ)^{(1/2)(T-m+1)} [(Y_Δβ_Δ + Z_*γ_*)′(Y_Δβ_Δ + Z_*γ_*)]^{-(1/2)(T-m+m₁+n₁+1)},   (2.18)

p(β_Δ|Y) ∝ (β′_Δ W_ΔΔ β_Δ)^{(1/2)(T-m+1)} (β′_Δ W̄_ΔΔ β_Δ)^{-(1/2)(T-m+m₁+1)}.   (2.19)

⁹Indeed, the stepwise maximization leading to (2.11) is algebraically equivalent to conditionalization of a joint density.

We now turn to a discussion of the normalization issue (Section 2.3.1) and show that (2.16) with p(β_Δ, γ_*) suitably chosen, (2.18), and (2.19) define poly-t densities (Section 2.3.2).
2.3 The normalization issue
2.3.1 Invariance
We noted above that the concentrated likelihood functions (2.11)-(2.12) are homogeneous of degree 0 in (β_Δ, γ_*). Thus, if β₁ is any element of β_Δ such that β₁ ≠ 0 with probability one, define:

α_Δ = (1/β₁)β_Δ,   α_* = (1/β₁)γ_*,   (2.20)

and also, for ease of notation:

α = (α′₁  α′₂)′,   (2.21)

where α₁ collects the m₁ - 1 unconstrained elements of α_Δ (the normalized element being unity) and α₂ = α_*. It is obvious that, in the parameterization (β₁, α, Π₂, Ω_*), β₁ is not identified. In particular, (2.11) may be rewritten as

L(α|Y) ∝ [ α′_Δ W_ΔΔ α_Δ / (Y_Δα_Δ + Z_*α_*)′(Y_Δα_Δ + Z_*α_*) ]^{(1/2)T},   (2.22)

and (2.12) as

L(α₁|Y) ∝ [ α′_Δ W_ΔΔ α_Δ / α′_Δ W̄_ΔΔ α_Δ ]^{(1/2)T}.   (2.23)
Taking into account the Jacobian |β₁|^{m₁+n₁-1} of the integrand transformation from (β_Δ, γ_*) to (β₁, α), the posterior marginal densities (2.16), (2.18), and (2.19) become, respectively:

p(α|Y) ∝ p(α) (α′_Δ W_ΔΔ α_Δ)^{(1/2)(T-m+1)} [(Y_Δα_Δ + Z_*α_*)′(Y_Δα_Δ + Z_*α_*)]^{-(1/2)(T-m+m₁+n₁+1)},   (2.24)

p(α|Y) ∝ (α′_Δ W_ΔΔ α_Δ)^{(1/2)(T-m+1)} [(Y_Δα_Δ + Z_*α_*)′(Y_Δα_Δ + Z_*α_*)]^{-(1/2)(T-m+m₁+n₁+1)},   (2.25)

p(α₁|Y) ∝ (α′_Δ W_ΔΔ α_Δ)^{(1/2)(T-m+1)} (α′_Δ W̄_ΔΔ α_Δ)^{-(1/2)(T-m+m₁+1)}.   (2.26)
It is shown in Section 5.6 that the (functional forms of the) densities (2.25) and (2.26) are invariant with respect to the normalization rule, i.e. with respect to the labelling of variables under β_Δ. The density (2.24) is invariant with respect to the normalization rule if and only if the prior density p(α) has that property.

In conclusion, our approach to normalization consists in writing a prior density p(β_Δ, γ_*) as p(β₁|α)p(α) and then working with the parameters α alone. However, for convenience we shall often write our formulae in terms of full parameter sets such as B, β, or β_Δ, and refer to "β normalized by β₁ = 1" as a substitute for α. The more explicit notation would be heavier, especially when we discuss full information approaches.
2.3.2 Poly-t densities
Using (2.21), it is easily recognized that (2.25) and (2.26) define 1-1 poly-t densities. This is done in Section 5.4 (see in particular formulae (5.37) and (5.38)), where it is shown that α′_Δ W_ΔΔ α_Δ = s₁² + (α₁ - a₁)′H₁(α₁ - a₁). The statistics W_ΔΔ and (s₁², a₁, H₁) are in one-to-one correspondence. Thus, α′_Δ W_ΔΔ α_Δ is proportional to the quadratic form appearing in the kernel of a Student density for α₁, centered at a₁ with covariance matrix proportional to s₁²H₁⁻¹. A similar argument applies to the other factors on the right-hand side of (2.25) and (2.26) which, therefore, define 1-1 poly-t densities respectively in (m₁ + n₁ - 1) and (m₁ - 1) variables, the elements of (β_Δ, γ_*) and β_Δ after normalization. These densities are well defined and integrable; they possess finite fractiles but no moments of order 1 or higher.¹⁰
When p(α) is a Student density, (2.24) defines a 2-1 poly-t density on the (m₁ + n₁ - 1) normalized elements of (β_Δ, γ_*). Invariance with respect to the normalization requires p(α) to be a Cauchy density (Student density with one degree of freedom). In that case, (2.24) is integrable, but possesses no moments of order 1 or higher.

¹⁰LIML estimators do not possess finite sampling moments.

Table 2.1 (surviving rows; standard deviations in parentheses)

β₂₂: -0.6484 (0.1518)   γ₂₃: 0.6860E-03 (0.0853)   γ₂₅: 0.3361E+02 (0.0655)
β₂₂: -0.6005 (0.1116)   γ₂₃: 0.6655E-03 (0.0657)   γ₂₅: 0.3150E+02 (0.0482)

Note: In order to ensure the existence of moments, the marginal posterior densities of β₂₂, γ₂₃, and γ₂₅ have been truncated over the intervals (-1.5, 0.0), (0.30E-03, 1.10E-03), and (0.0, 65.0), respectively. The probability of falling outside these intervals is in all cases less than 0.01.
2.4 An application
As an illustration, we consider the two-equation supply-demand model of the retail beef market in Belgium discussed in Morales (1971) and Richard (1973). The model is just identified and the structural coefficients are labelled as follows:
B = ( 1  1 ; β₁₂  β₂₂ ),   Γ = ( 0  γ₂₃ ; γ₁₄  0 ; γ₁₅  γ₂₅ ).   (2.27)
The corresponding variables in (Y  Z) are successively: quantity (kg) consumed per capita, a meat price index, national income (Belgian francs) per capita, the cattle stock (number of heads at the beginning of each period) per capita, and a constant term. Price and income have been deflated by an index of consumer prices. Sixteen annual observations (1950-1965) are used. We only consider here the second (demand) equation. The LIML¹¹ estimators of its coefficients are reported in the first row of Table 2.1, together with their asymptotic sampling standard deviations.
We first analyze the demand equation under the non-informative prior density (2.17). The corresponding posterior density of β₂₂ is given by (2.26). It is integrable and its graph is given in Figure 2.1 (curve 1). However, the difference of exponents between the denominator and the numerator being equal to one, the
Figure 2.1. Posterior densities of β₂₂.
posterior moments do not exist. The posterior means and standard deviations reported in Table 2.1 are truncated moments.
Prior information on the income coefficient γ₂₃ is available from a subsample of a budget study undertaken by the Belgian National Institute of Statistics and is summarized in the following mean and variance:

E(γ₂₃) = 0.415 × 10⁻³,   V(γ₂₃) = 0.1136 × 10⁻⁷.

This leads to the following Student prior density:¹²

p(γ₂₃) ∝ [νs² + (γ₂₃ - 0.415 × 10⁻³)²]^{-(1/2)(ν+1)},   (2.28)

with s² and the degree-of-freedom parameter ν chosen to reproduce the above mean and variance,
together with (2.14). The corresponding posterior means and standard deviations, as derived from (2.24), are reported in the third row of Table 2.1. The graph of the posterior density of β₂₂ is given in Figure 2.1 (curve 2). The standard errors in

¹²The choice of the degree-of-freedom parameter will be discussed in Section 5.
rows 2 and 3 reveal that the asymptotic LIML variances underestimate the exact Bayesian results by 30-60 percent.
However, since we are non-informative on all coefficients but one, we might wish to follow the principle underlying the specification of (2.17), replacing therefore the prior density (2.14) by

p(β_Δ, γ_*, Π₂, Ω_*) ∝ p(γ₂₃)(σ²)^{-(1/2)(m₁+n₁-1)} |Ω_*22|^{-(1/2)n} |Ω_*|^{-(1/2)(m+1)},   (2.30)

which leads to the posterior density

p(α|Y) ∝ p(γ₂₃)(α′_Δ W_ΔΔ α_Δ)^{(1/2)(T-m+1)} {(Y_Δα_Δ + Z_*α₂)′(Y_Δα_Δ + Z_*α₂)}^{-(1/2)(T-m+m₁+n₁)}.   (2.31)

The corresponding posterior means and standard deviations are reported in the fourth row of Table 2.1. The graph of the posterior density of β₂₂ is given in Figure 2.1 (curve 3).
Note that the larger the exponent of the quadratic form (Y_Δα_Δ + Z_*α₂)′(Y_Δα_Δ + Z_*α₂) in the posterior p(α|Y), the more weight is given to the OLS values, say α̂, which are respectively -0.3864, 0.5405E-03, and 0.2304E+02. The posterior variances also go down as the exponent increases; those in row 4 seem artificially low in comparison with rows 2 and 3.
3 Identification
3.1 Classical concepts
The parameters of the model (l.l)-( 1~2) are (B, r, 2) in am X R”” X em Let (1.8) denote the set of exact a priori restrictions imposed on these parameters, including the normalization rule The parameter space is then
$={(B,~,~)E~~xR"~x~~~~~(B,~,~)=O, k=l K} (3.1)
Consider the transformation from (B, Γ, Σ) to (B, Π, Ω) as given by (1.4). The restrictions (3.1) may be expressed equivalently as

φₖ(B, −ΠB, B'ΩB) =def χₖ(B, Π, Ω) = 0,  k = 1, ..., K,   (3.2)

so that the image space of S by the transformation (1.4) is

S̄ = {(B, Π, Ω) ∈ ℬᵐ × R^{nm} × 𝒞ᵐ | χₖ(B, Π, Ω) = 0, k = 1, ..., K}.   (3.3)
J. H. Drèze and J.-F. Richard
Figure 3.1
The transformation (1.4) is a mapping of S onto S̄. Conditionally on B, it is also linear and one-to-one on the appropriate subspaces. The projection of S̄ on the space of (Π, Ω) is

𝒫 = {(Π, Ω) | (B, Π, Ω) ∈ S̄ for some B}.   (3.4)

The section of S̄ at given (Π, Ω) is

ℬ_{Π,Ω} = {B | (B, Π, Ω) ∈ S̄}.   (3.5)
These concepts are illustrated in Figure 3.1.

The model is identified at (Π, Ω) in 𝒫 if and only if ℬ_{Π,Ω} is a singleton; it is identified if and only if ℬ_{Π,Ω} is a singleton for almost all (Π, Ω) in 𝒫; otherwise, it is underidentified, and the set ℬ_{Π,Ω} defines, for each (Π, Ω), an equivalence class of the parameter space. The model is overidentified if and only if 𝒫 is a proper subset of R^{nm} × 𝒞ᵐ. (Thus, Figure 3.1 corresponds to a model which is both overidentified and underidentified.)
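The equivalence classes ℬ_{Π,Ω} can be made concrete numerically. In the sketch below (our own illustration; the helper name is invented), an unrestricted structure (B, Γ, Σ) is postmultiplied by an arbitrary non-singular matrix F; both structures map to the same reduced form (Π, Ω) under (1.4) and are therefore observationally equivalent.

```python
import numpy as np

def reduced_form(B, Gamma, Sigma):
    """Map a structure (B, Gamma, Sigma) to its reduced form via (1.4):
    Pi = -Gamma B^{-1}, Omega = B'^{-1} Sigma B^{-1}."""
    Binv = np.linalg.inv(B)
    return -Gamma @ Binv, Binv.T @ Sigma @ Binv

rng = np.random.default_rng(0)
m, n = 2, 3
B = 2 * np.eye(m) + 0.3 * rng.normal(size=(m, m))   # non-singular by construction
Gamma = rng.normal(size=(n, m))
A = rng.normal(size=(m, m))
Sigma = A @ A.T + np.eye(m)                         # PDS covariance matrix

F = 2 * np.eye(m) + 0.3 * rng.normal(size=(m, m))   # arbitrary non-singular F
Pi1, Om1 = reduced_form(B, Gamma, Sigma)
Pi2, Om2 = reduced_form(B @ F, Gamma @ F, F.T @ Sigma @ F)
# Without identifying restrictions, (B, Gamma, Sigma) and (BF, Gamma F, F'Sigma F)
# are indistinguishable from the data: same (Pi, Omega).
```

The restrictions (1.8) identify the model precisely when they rule out every such F other than the identity.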
3.2 Posterior densities and identification
In a formal Bayesian analysis (see footnote 5) one defines a prior density on S, or on S̄, or on any parameter space which is related to S and S̄ by one-to-one mappings. Because B is non-singular, the corresponding integrand transformations have non-vanishing Jacobians. In particular, the prior density p(B, Π, Ω) can be factorized as

p(B, Π, Ω) = p(Π, Ω)·p(B | Π, Ω),

where the support of p(Π, Ω) is 𝒫 and the support of p(B | Π, Ω) is ℬ_{Π,Ω}. If the model is identified at (Π, Ω), then p(B | Π, Ω) has all its mass concentrated at a single point. Otherwise, p(B | Π, Ω) is a density on ℬ_{Π,Ω}.
The posterior density is obtained by application of Bayes' theorem:

p(B, Π, Ω | Y) ∝ p(Π, Ω) p(B | Π, Ω) p(Y | Π, Ω),   (3.10)

so that the conditional density of B, given (Π, Ω), is not revised by the observations: p(B | Π, Ω, Y) = p(B | Π, Ω). Nevertheless, unless p(B | Π, Ω) = p(B), the marginal density p(B) is revised, because

p(B | Y) = ∫ p(B | Π, Ω) p(Π, Ω | Y) dΠ dΩ.   (3.11)

Note that (3.10) and (3.11) remain valid under any parameterization, say θ = (θ₁, θ₂), where θ₁ and θ₂ are related through one-to-one mappings respectively to (Π, Ω) and B.
3.3 Prior densities and identification
The Bayesian analysis of the observations can be conducted in the reduced-form parameter space 𝒫 only.¹³

¹³This proposition appeared in Drèze (1962, 1975); it is discussed in Zellner (1971, p. 257) and generalized in Kadane (1975, theorem 5).

To the extent that prior information based on economic theories is subject to revision through observations, it might sometimes seem desirable to avoid overidentification, and to embody such prior information in a marginal prior density p(Π, Ω). However, when the prior information is provided as p(B, Γ, Σ), it may not be convenient to perform the integrand transformation from (B, Γ, Σ) to (B, Π, Ω) conditionally on (1.8), and/or to separate p(B, Π, Ω) into the marginal density p(Π, Ω) and the conditional density p(B | Π, Ω). As we shall see below, these transformations are not necessary to obtain the posterior
density p(B, Γ, Σ | Y, Z).¹⁴ But it is advisable to check, before engaging in detailed computations, that enough prior information has been introduced, so that the joint posterior density will at least be proper. When the model is identified by the exact restrictions (1.8), then the posterior density will be proper under almost any prior density of interest. Whether the model is so identified can be verified by means of operational conditions extensively discussed in the econometric literature [see, for example, Goldberger (1964)].
When the model is not identified by the exact restrictions (1.8), then the posterior density will not be proper unless the prior density entails a proper conditional prior density on the equivalence classes of the parameter space, for instance in the form p(B | Π, Ω). Whether this property holds can be verified through natural generalizations of the operational conditions for identifiability just mentioned.
Let, for example, the conditions (1.8) assign predetermined values to a subset of (B, Γ), and let the prior density consist of a product of independent proper densities, each defined on some coefficients from a given structural equation, times a non-informative density on the remaining parameters. Then, in order for p(B | Π, Ω) to be a proper density, it is necessary that at least m coefficients in each structural equation be either predetermined or arguments of a proper prior density (generalized order condition); it is sufficient that the corresponding submatrix from Π have full rank [generalized rank condition (2.6)].
More general cases are discussed in Drèze (1975, Section 2.3).
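These counting and rank checks are mechanical. The sketch below is our own (helper names invented) and implements the classical single-equation versions to which the generalized conditions reduce when all restrictions are exact: the order condition counts fixed coefficients, and the rank condition checks the relevant submatrix of Π.

```python
import numpy as np

def order_condition(n_fixed, m):
    """Order condition for one structural equation: at least m coefficients
    must be fixed (predetermined values plus the normalization)."""
    return n_fixed >= m

def rank_condition(Pi, excluded_z, included_y):
    """Rank check: the submatrix of Pi with rows for the excluded
    predetermined variables and columns for the other included endogenous
    variables must have full column rank."""
    sub = Pi[np.ix_(excluded_z, included_y)]
    return np.linalg.matrix_rank(sub) == sub.shape[1]

# Hypothetical system: m = 2 endogenous, n = 3 predetermined variables;
# equation 1 excludes z2 and z3 (two exclusions plus one normalization).
Pi = np.array([[1.0, 0.5],
               [0.2, 0.3],
               [0.1, 0.4]])
ok = order_condition(3, 2) and rank_condition(Pi, excluded_z=[1, 2], included_y=[1])
```

In the Bayesian generalization described above, coefficients that are arguments of a proper prior density are counted alongside the predetermined ones.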
3.4 Choice of models and identification
Consider two simultaneous equation models, say M₁ with parameter (B₁, Γ₁, Σ₁) and M₂ with parameter (B₂, Γ₂, Σ₂). The variables Y and Z are taken to be the same for both models, which therefore differ only by the set of exact a priori restrictions, say

¹⁴See, however, Harkema (1971) for an example of explicit derivation of the complete marginal prior density p(Π, Ω) in a class of underidentified models.
with associated parameter spaces Sⁱ, S̄ⁱ, 𝒫ⁱ, and ℬⁱ_{Π,Ω} (i = 1, 2). When 𝒫¹ = 𝒫², we could say that the models M₁ and M₂ are not identified in relation to one another. A Bayesian generalization of this concept, introduced in Zellner (1971, p. 254), takes into account the prior densities p(Π, Ω | Mᵢ) associated with models Mᵢ (i = 1, 2). The predictive densities, conditional on Mᵢ, are

p(Y | Mᵢ) = ∫ p(Y | Π, Ω) p(Π, Ω | Mᵢ) dΠ dΩ,

and M₁ and M₂ and their associated prior densities are not identified in relation to one another¹⁵ if and only if

p(Y | M₁) = p(Y | M₂) for all Y.
In such a case, the prior odds for M₁ and M₂ are not revised through observations, and the posterior odds satisfy

p(M₁ | Y) / p(M₂ | Y) = p(M₁) / p(M₂).
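The mechanics can be checked in a toy setting. The sketch below is our own (a scalar normal location model with conjugate normal priors, not the chapter's SEM): it computes log predictive densities for two models that differ only in their priors; the posterior odds equal the prior odds times the ratio of predictives, and models with identical predictives leave the odds unrevised.

```python
import numpy as np

def log_predictive(y, prior_mean, prior_var, noise_var):
    """Log predictive density of the data under y_t ~ N(mu, noise_var),
    mu ~ N(prior_mean, prior_var).  By sufficiency of the sample mean, the
    prior enters only through the marginal of ybar, which is
    N(prior_mean, prior_var + noise_var/T); factors common to all such
    models are dropped, since they cancel in the odds."""
    T = len(y)
    ybar = np.mean(y)
    v = prior_var + noise_var / T
    return -0.5 * (np.log(2 * np.pi * v) + (ybar - prior_mean) ** 2 / v)

rng = np.random.default_rng(1)
y = rng.normal(0.4, 1.0, size=50)
# Two models: same likelihood, different prior means for mu.
bf = np.exp(log_predictive(y, 0.0, 1.0, 1.0) - log_predictive(y, 0.5, 1.0, 1.0))
posterior_odds = 1.0 * bf   # prior odds of one, revised by the Bayes factor
```

When the two priors coincide, the Bayes factor is identically one, which is the degenerate form of the non-identification property above.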
4 Reduced-form analytics
4.1 Natural-conjugate prior densities
The analytics in this section are not presented because we regard natural-conjugate analysis of the reduced form as useful for applications. They are presented for easy reference in subsequent sections and as a review of basic tools in Bayesian econometrics.
Provided rank Z = n < T (and n + m < T to validate other results below), the likelihood function (1.6) can be expressed in terms of the least-squares estimates Π̂ = (Z'Z)⁻¹Z'Y and W = (Y − ZΠ̂)'(Y − ZΠ̂):

p(Y | Π, Ω) ∝ |Ω|^{−(1/2)T} exp{−½ tr Ω⁻¹[(Π − Π̂)'M(Π − Π̂) + W]},   (4.2)

where M = Z'Z.

¹⁵See also Florens et al. (1974), where the authors discuss a closely related concept of D-identification.
The right-hand side of (4.2) is also a kernel of a Normal-Inverted-Wishart density on (Π, Ω). The likelihood function may therefore be rewritten as
Thus a natural-conjugate prior density for (Π, Ω) is given by the product of a conditional matricvariate normal prior density:
and a marginal Inverted-Wishart prior density:

The prior expectations of Π and Ω are E(Π) = Π₀ and E(Ω) = (ν₀ − m − 1)⁻¹W₀. The marginal prior density of Π, as implied by (4.4) and (4.5), is a matricvariate-t density:

(4.8)
As usual in a natural-conjugate framework, both the prior and the posterior densities have the same functional form, so that¹⁶

M* = M₀ + M,   (4.9)
Π* = M*⁻¹(M₀Π₀ + MΠ̂),   (4.10)
W* = W₀ + W + Π₀'M₀Π₀ + Π̂'MΠ̂ − Π*'M*Π*.   (4.11)
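Under the reading of (4.9)-(4.11) reconstructed here from the pooling rules of footnote 16, the update can be sketched as follows; function and variable names are our own, not the chapter's.

```python
import numpy as np

def niw_update(Pi0, M0, W0, nu0, Z, Y):
    """Normal-Inverted-Wishart posterior update for Y = Z Pi + V:
    the prior acts like a hypothetical sample with moments (M0, M0 Pi0, W0)."""
    M = Z.T @ Z
    Pi_hat = np.linalg.solve(M, Z.T @ Y)              # least-squares estimate
    W = (Y - Z @ Pi_hat).T @ (Y - Z @ Pi_hat)         # residual moment matrix
    M_star = M0 + M
    Pi_star = np.linalg.solve(M_star, M0 @ Pi0 + M @ Pi_hat)
    W_star = (W0 + W + Pi0.T @ M0 @ Pi0 + Pi_hat.T @ M @ Pi_hat
              - Pi_star.T @ M_star @ Pi_star)
    nu_star = nu0 + Y.shape[0]
    return Pi_star, M_star, W_star, nu_star

rng = np.random.default_rng(2)
Z = rng.normal(size=(40, 3))
Pi_true = np.array([[1.0, 0.0], [0.5, -0.5], [0.0, 2.0]])
Y = Z @ Pi_true + 0.1 * rng.normal(size=(40, 2))
# With a nearly flat prior (M0 close to zero), the posterior mean of Pi
# reduces to the OLS estimate.
Pi_star, M_star, W_star, nu_star = niw_update(
    np.zeros((3, 2)), 1e-8 * np.eye(3), np.eye(2), 4, Z, Y)
```

The precision-weighted form of (4.10) makes explicit that the posterior mean of Π is a matrix-weighted average of the prior mean Π₀ and the least-squares estimate Π̂.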
(i) M₀ = 0: The prior density is non-informative about Π (and may or may not be informative about Ω).
(ii) w⁰ᵢⱼ = 0 for all i, j except when i = j = 1: The prior density is informative about the parameters of a single reduced-form equation (here taken to be the first, through appropriate labelling of variables).
In Section 6.4 we shall define an extended natural-conjugate prior density for (Π, Ω), where the (conditional) covariance matrix of vec Π is no longer constrained to the form of a Kronecker product. The cost of this generalization is the need to rely, partially at least, on numerical methods, a typical trade-off in the Bayesian framework.
4.2 Further results
We have already mentioned that, under a natural-conjugate prior density, Π and Ω are not independent. The practical consequences of this undesirable feature will
¹⁶As shown for example in Raiffa and Schlaifer (1961), these formulae derive from the standard convolution rules for the sufficient statistics of successive independent samples. Let (Y₀, Z₀) be a matrix of observations for a (hypothetical) prior sample and (Y*, Z*) be the pooled samples. Then

M* = Z*'Z* = Z₀'Z₀ + Z'Z = M₀ + M,
M*Π* = Z*'Y* = Z₀'Y₀ + Z'Y = M₀Π₀ + MΠ̂,
W* + Π*'M*Π* = Y*'Y* = Y₀'Y₀ + Y'Y = (W₀ + Π₀'M₀Π₀) + (W + Π̂'MΠ̂).
be illustrated in Section 6.6. We could assume instead prior independence between Π and Ω and, in particular, consider the prior density

(4.13)

When Q₀ = W₀ and ρ = λ₀ = ν₀, the first and second moments of (Π, Ω) under (4.13) are the same as those obtained under the natural-conjugate prior densities (4.4) and (4.5). A kernel of the posterior density is given by the product of the prior density (4.13) and the likelihood function (4.3). Marginalization with respect to Ω yields the posterior density
As such, this density is not amenable to further analytical treatment. However, if we partition the reduced form as in (2.7), the posterior density of Π₁, the vector of coefficients of the first reduced-form equation, conditionally on Π₂, is a 2-0 poly-t density,¹⁷ i.e. a product of two kernels of Student densities. As discussed in Section 7.2, such densities can be evaluated numerically by means of one-dimensional numerical integrations, independently of n, the number of coefficients in Π₁. Furthermore, as we shall outline in Section 7.3, this property can be exploited in order to construct so-called importance functions for the Monte Carlo numerical integration of the joint posterior density (4.14).
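The importance-function idea can be sketched in one dimension. The example below is our own (a hypothetical fat-tailed Student-like target, not the chapter's posterior): draws from a tractable importance function are reweighted by the unnormalized posterior to approximate a posterior mean.

```python
import numpy as np

def importance_mean(log_post, draw_q, log_q, n=4000, seed=3):
    """Self-normalized importance sampling for a posterior mean: draw from an
    importance function q, weight each draw by the (unnormalized) posterior
    kernel over the q kernel, and normalize the weights."""
    rng = np.random.default_rng(seed)
    th = draw_q(rng, n)
    lw = log_post(th) - log_q(th)        # log importance weights
    w = np.exp(lw - lw.max())            # stabilized
    w /= w.sum()
    return np.sum(w * th)

# Hypothetical Student-t(7) target centred at 1.0; the importance function is
# a normal wide enough to cover the target's bulk.
log_post = lambda t: -4.0 * np.log1p((t - 1.0) ** 2 / 7.0)
log_q = lambda t: -0.5 * (t - 1.0) ** 2 / 9.0          # N(1, 3^2) kernel
mean_est = importance_mean(log_post, lambda rng, n: rng.normal(1.0, 3.0, n), log_q)
```

In the chapter's setting the importance function would instead be built from the Student kernels of the poly-t factors, which is precisely what makes them good matches to the posterior's shape and tails.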
If we are interested only in Π₁, an obvious alternative to the above conditional analysis amounts to using a prior density which is non-informative on (Π₂, Ω), thereby following the limited information techniques outlined in Section 2. Following (2.14), let

(4.15)

where, for the purpose of comparison with (4.8), p(Π₁) is taken to be the Student density:

p(Π₁) ∝ f_t(Π₁ | Π₀₁, M₀, ν₀ − m + 1).   (4.16)
The posterior density of Π₁ is then obtained by a direct transposition of formula

¹⁷To see this, factorize the two kernels in the right-hand side of (4.14) according to formulae (A.33) and (A.34) in Appendix A.
(2.16):¹⁸
(4.17)

and is, therefore, again a 2-0 poly-t density.
It will be shown in Section 6.4 that 2-0 poly-t densities are also obtained under extended natural-conjugate prior densities.
As noted by Zellner (1971, p. 101), the density (4.17) can be approximated by a normal density, with parameters
Two forms of "non-informative" prior densities have been used for (Π, Ω):

(i) If we start from the natural-conjugate prior densities (4.4) and (4.5) and let M₀, W₀, and ν₀ tend to zero, we obtain:

p(Π, Ω) ∝ |Ω|^{−(1/2)(n+m+1)},   (4.20)

which is found for example in Drèze (1976). Note that (4.20) includes a factor |Ω|^{−(1/2)n} arising from the integrating constant of (4.4).

(ii) If we start instead from the independent prior density (4.13) and let M₀, W₀, ρ₀, and λ₀ tend to zero, we obtain:

p(Π, Ω) ∝ |Ω|^{−(1/2)(m+1)}.   (4.21)
We are not aware of any compelling argument to prefer either specification over the other. In the posterior densities (4.10)-(4.11), the choice affects only ν*, which is equal to T + n under (4.20) and T under (4.21). A conservative attitude would favor (4.21), which results in larger posterior variances. As shown in Section 5, however, the prior densities used under limited information analysis are closer
A Bayesian analogue of this approach similarly uses the exact a priori restrictions pertaining only to the parameters of a single structural equation, and ignores the remaining restrictions. In addition, it rests on a fully specified prior density for the parameters of the structural equation of interest, together with a non-informative prior density on the remaining parameters of the system. There is some arbitrariness in the choice of a parameterization for the rest of the system. It seems accordingly desirable to specify the prior density in such a way that it is invariant with respect to this choice. This problem is discussed in Section 5.2. The reader who is not interested in the technicalities associated with the definition of non-informative prior densities may turn immediately to Sections 5.3 and 5.4, which are essentially self-contained. The illustration of Section 2.4 is extended in Section 5.5. Section 5.6 deals with normalization and Section 5.7 with generalizations.
5.2 Parameterization and invariance
Using the notation of Section 2, we study a single structural equation

See, for example, Goldberger (1964, p. 346), Maddala (1977, p. 231), Malinvaud (1978, p. 759), Theil (1971, p. 503), or Zellner (1971, pp. 264-265).
where u is normally distributed with expectation zero and covariance matrix σ²I_T. Conditionally on (β, γ, σ²), the reduced-form parameters (Π, Ω) satisfy the n + 1 linear equalities:
Define, for notational convenience, θ = (β, γ, σ²) (see footnote 5), and let Θ denote the set of admissible values for θ. Conditionally on θ, the space of reduced-form parameters is
In a limited information framework we want to define a "non-informative" prior measure on 𝒫_θ. If we ignore the restrictions (2.3), then (Π, Ω) will vary freely over R^{nm} × 𝒞ᵐ, and we could use the non-informative prior measures (4.20) or (4.21).
Given (2.3), however, (Π, Ω) are not variation free. In order to overcome this difficulty (see also the remark at the end of this section), we shall define a family of one-to-one mappings from 𝒫_θ = 𝒫_θ^Π × 𝒫_θ^Ω onto R^{n(m−1)} × (R^{m−1} × 𝒞^{m−1}). These mappings are indexed by an (essentially) arbitrary m × (m − 1) matrix Q. Each mapping (each Q) defines a variation-free reparameterization of the reduced form. The corresponding expressions for the likelihood function are given in Lemma 5.2. We then define in Lemma 5.3 a condition under which non-informative prior measures on the new parameter space are invariant with respect to Q. This condition defines uniquely our non-informative prior measure on 𝒫_θ. Finally, two corollaries relate this result to the special case of Section 2 and to prior measures appearing in earlier work.
We first note that 𝒫_θ^Π is an n(m − 1)-dimensional linear subspace of R^{nm}. Let Q be an arbitrary m × (m − 1) matrix of rank m − 1, such that
𝒫_θ^Ω is not a linear subspace of 𝒞ᵐ. However, it is isomorphic to R^{m−1} × 𝒞^{m−1}, as evidenced by the following property:
where Aᵢⱼ is mᵢ × mⱼ, and define

Ā₁₂ = A₁₁⁻¹A₁₂,  A₂₂.₁ = A₂₂ − A₂₁A₁₁⁻¹A₁₂.

Then A ∈ 𝒞ᵐ if and only if
The mappings (5.3), (5.10), and (5.11) define the parameterization (θ, Φ₂, A₁₂, A₂₂.₁), with parameter space Θ × R^{n(m−1)} × R^{m−1} × 𝒞^{m−1}. These parameters are variation free. Conditionally on θ, (Φ₂, A₁₂, A₂₂.₁) are in one-to-one correspondence with (Π, Ω) ∈ 𝒫_θ. Note, however, that the definition of these parameters depends on the choice of Q.
We can now rewrite the likelihood function in terms of the new parameters. The quadratic form in the exponent becomes

σ⁻²(Yβ + Zγ)'(Yβ + Zγ) + tr A₂₂.₁⁻¹[YQ − ZΦ₂ − (Yβ + Zγ)A₁₂]'[YQ − ZΦ₂ − (Yβ + Zγ)A₁₂].

Also, |Ω| = ‖P‖⁻²|A| = σ²‖P‖⁻²|A₂₂.₁|. □
Our prior density will be similarly expressed as

p(θ, Φ₂, A₁₂, A₂₂.₁) = p(θ)·p(Φ₂, A₁₂, A₂₂.₁ | θ).   (5.16)

We want p(Φ₂, A₁₂, A₂₂.₁ | θ) to be both non-informative and invariant with respect to Q. In the Normal-Wishart and Student-Wishart families, a non-informative measure will be of the form

p(Φ₂, A₁₂, A₂₂.₁ | θ) ∝ |A₂₂.₁|^{−(1/2)(ν+m+n+1)},  ν ∈ R.   (5.17)

The following lemma yields the desired invariance.
In Section 2.1, Q was specified by (2.8) as²¹
in which case Φ₂ = Π₂ and A = Ω*, the covariance matrix of (u  V₂). A₁₂ and A₂₂.₁ are then given by
Corollary 5.4
The prior measures

are invariant with respect to Q. Under (2.14), the "marginal" prior measure p(β)

²¹A similar choice appears in Zellner (1971, Section 9.5).
Proof

We make the integrand transformation from Ω* (= A) to (σ², A₁₂, A₂₂.₁), where A₁₂ and A₂₂.₁ are given in (5.19) and (5.20). The Jacobian of this transformation is (σ²)^{m−1}. After transformation, (2.14) and (2.17) factorize into the product of (5.17) with (5.21) and (5.22), respectively. The proof follows from the invariance of (5.17). □
Remark
Earlier work on limited information Bayesian procedures often relied on prior measures expressed in terms of (β, Π, Ω). This raises problems of interpretation, due to the restrictions (2.3). The following corollary helps to relate such measures to the invariant measures (5.17).
5.3 Posterior conditional densities and moments
We now derive posterior densities under the invariant prior measures (5.17):

p(θ | Y) ∝ p(θ)(β'Wβ)^{(1/2)(T−m+1)} f_N^T(Yβ + Zγ | 0, σ²I_T),   (5.27)

p(A₁₂ | A₂₂.₁, θ, Y) = f_N^{m−1}(A₁₂ | (β'Wβ)⁻¹Q'Wβ, (β'Wβ)⁻¹A₂₂.₁),   (5.28b)

p(Φ₂ | A₁₂, A₂₂.₁, θ, Y) = f_N^{n(m−1)}(Φ₂ | Π̂Q − (Π̂β + γ)A₁₂, A₂₂.₁ ⊗ M⁻¹).   (5.28c)
we use Lemma 5.6 at once to validate results quoted in Section 2.

Under the prior measure (2.14), the posterior density of (β, γ) is given by (2.16).

²²The density (5.28b) could indifferently be written as f_N^{m−1}(A₁₂' | S₂₁s₁₁⁻¹, s₁₁⁻¹A₂₂.₁). The matrix
Under (2.14), we combine formulae (5.21) and (5.27), obtaining:

from which (2.16) follows. Under (2.18) we simply replace (5.21) by (5.22). □

The posterior density (5.27) is discussed extensively in Section 5.4. As for (5.28), it enables us to derive the posterior moments for (Π, Ω) and the predictive moments for Y, conditionally on θ.
Corollary 5.8
Under the prior measure (5.26), the posterior mean of Π, the posterior covariance matrix of the column expansion of Π, and the posterior mean of Ω are given, conditionally on θ, by
As a result of (2.3), the conditional moments of (Π, Ω) satisfy:
5.4 Posterior marginal densities
We now concentrate on the analysis of the posterior density of θ, as given in (5.27). This density is of direct interest for inference on θ, and it is also required to marginalize the expression derived in Corollary 5.8. The form of p(θ | Y) indicates that, conditionally on β, we can apply the usual analysis of univariate regression models, as discussed for example in Zellner (1971), whereby p(θ) is either a Normal-gamma density, as in (5.40) below, or a Student-gamma density, as in (5.49) below. More general cases must be treated by means of numerical integration, using the techniques described in Section 7, and will not be discussed. Also, we only consider exclusion restrictions, since more general linear restrictions can be handled through suitable linear transformations of the coefficients and the variables.
The notations are those of Section 2, and we have β' = (β₁' 0') and γ' = (γ₁' 0'), where β₁ ∈ R^{m₁} and γ₁ ∈ R^{n₁}. The data matrices are partitioned conformably as in (2.4). As mentioned in Section 3.3, we need not impose the condition m₁ + n₁ < n provided p(θ) is sufficiently informative. Normalization is dealt with by introducing the transformation from (β₁, γ₁, σ²) to (β₁₁, α, σ₁²), where

α' = (α₁' α₂') = (1/β₁₁)(β₂ ⋯ β_{m₁}  γ₁ ⋯ γ_{n₁}) ∈ R^{l₁},   (2.20)

with l₁ = (m₁ − 1) + n₁, and

(5.36)
Under the prior density

p(α, σ₁²) = f_N(α | α₀, σ₁²H₀⁻¹) f_iγ(σ₁² | s₀², ν₀),   (5.40)

the posterior density of (α, σ₁²) is given by

p(α, σ₁² | Y) = f_N(α | α*, σ₁²H*⁻¹) f_iγ(σ₁² | s*², ν*),

where
The posterior density of α, being defined through a ratio of Student kernels, is a so-called 1-1 poly-t density. Furthermore, following Corollaries 5.8 and 5.10, the conditional posterior expectations of (Π, Ω) and the covariance matrix of vec Π depend on quadratic forms in α which are precisely those characterizing the Student kernels in p(α | Y). As discussed in Section 7, this makes the Bayesian limited information analysis of a single equation fully operational under the prior density (5.40).
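The structure of such densities can be illustrated in one dimension. The sketch below is our own (a univariate stand-in, whereas the chapter's poly-t densities are multivariate and are reduced to one-dimensional integrals): it evaluates a 1-1 poly-t kernel, a ratio of two Student-t kernels, on a grid, normalizes it numerically, and returns its moments. With coincident centres the density is symmetric, so its mean is zero, which serves as a check.

```python
import numpy as np

def poly_t_11_moments(df_num, s_num, df_den, s_den, lo=-40.0, hi=40.0, n=40001):
    """Mean and standard deviation of the univariate 1-1 poly-t kernel
    (1 + x^2/(df_num s_num^2))^{-(df_num+1)/2} divided by the analogous
    Student kernel with (df_den, s_den), normalized on a uniform grid.
    Integrability requires the numerator to dominate in the tails
    (df_num - df_den > 1; > 3 for the variance to exist)."""
    x = np.linspace(lo, hi, n)
    lk = (-0.5 * (df_num + 1) * np.log1p(x ** 2 / (df_num * s_num ** 2))
          + 0.5 * (df_den + 1) * np.log1p(x ** 2 / (df_den * s_den ** 2)))
    k = np.exp(lk - lk.max())
    dx = x[1] - x[0]
    k /= k.sum() * dx                    # normalize the grid density
    mean = np.sum(x * k) * dx
    var = np.sum((x - mean) ** 2 * k) * dx
    return mean, np.sqrt(var)

mean, sd = poly_t_11_moments(df_num=12, s_num=1.0, df_den=4, s_den=5.0)
```

The same grid-evaluation idea underlies the one-dimensional integrations by which the chapter renders poly-t densities operational.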
Specifying a sensible prior density for σ₁² is generally a difficult task. Consequently, model-builders often use diffuse specifications. In such cases, it is advisable to assume prior independence between α and σ₁², as we shall illustrate in
Section 5.5. The prior density
satisfies this independence requirement. Note that when G₀ = s₀²H₀ and λ₀ = ν₀, the prior densities (5.40) and (5.49) imply the same first and second moments for (α, σ₁²); in particular, α and σ₁² are still linearly independent under (5.40).

Lemma 5.11
Under the prior density (5.49), the posterior densities p(σ₁² | α, Y) and p(α | Y) are given by
See, for example, Zellner (1971, ch. 4). □
The posterior density p(α | Y) is now a so-called 2-1 poly-t density. As discussed in Section 7, this preserves the tractability of the approach.
5.5 An application
To illustrate, we review the application described in Section 2.4 under both the Normal-gamma prior density (5.40) and the Student-gamma density (5.49). In all cases under consideration, the (hyper)parameters in (5.40) and (5.49) are chosen in such a way that the prior density p(α) is given by (2.29).²³ This leaves
²³That is, λ₀ = ν₀ = 6, and

α₀ = (0, 0.415E−03, 0)',  G₀ = s₀⁻²H₀ = 0.22007E+08.
Table 5.1
Limited information analysis of the demand equation

        (0.1292)      (0.0763)      (0.0558)
B₂   −0.8047    0.7552E−03    0.4090E+02
        (0.2575)      (0.1405)      (0.1143)
B₃   −0.8538    0.7706E−03    0.4343E+02
        (0.9325)      (0.4003)      (0.4490)
only s₀² to be specified. Three different values are considered, namely 4, 16, and 64. The corresponding prior means and standard deviations of σ₂₂ are 1, 4, and 16. For comparison, the FIML estimator of σ₂₂ is 0.832, a value which should be unknown when p(σ₁²) is selected. Hopefully, the second and third values of s₀² should appear most implausible to a careful analyst. They have been chosen in order to exemplify the dangers of a careless assessment of p(σ₁²), especially under the Normal-gamma specification (5.40). The runs corresponding to the successive values of s₀² are labelled A₁ to A₃ under (5.40) and B₁ to B₃ under (5.49). The posterior means and standard deviations of α are reported in Table 5.1. The graphs of the posterior densities of β₂₃ are given in Figures 5.1 and 5.2.
Figure 5.1