NON-LINEAR REGRESSION MODELS

T. AMEMIYA*

Stanford University


5.1 Non-linear two-stage least squares estimator

5.4 Non-linear three-stage least squares estimator

5.5 Non-linear full information maximum likelihood estimator

*This work was supported by National Science Foundation Grant SES79-12965 at the Institute for Mathematical Studies in the Social Sciences, Stanford University. The author is indebted to the following people for valuable comments: R. C. Fair, A. R. Gallant, Z. Griliches, M. D. Intriligator, T. E. MaCurdy, J. L. Powell, R. E. Quandt, N. E. Savin, and H. White.

Handbook of Econometrics, Volume I, edited by Z. Griliches and M. D. Intriligator
© North-Holland Publishing Company, 1983


This is a survey of non-linear regression models, with an emphasis on the theory of estimation and hypothesis testing rather than computation and applications, although there will be some discussion of the last two topics. For a general discussion of computation the reader is referred to Chapter 12 of this Handbook by Quandt. My aim is to present the gist of major results; therefore, I will sometimes omit proofs and less significant assumptions. For those, the reader must consult the original sources.

The advent of advanced computer technology has made it possible for the econometrician to estimate an increasing number of non-linear regression models in recent years. Non-linearity arises in many diverse ways in econometric applications. Perhaps the simplest and best known case of non-linearity in econometrics is that which arises as the observed variables in a linear regression model are transformed to take account of the first-order autoregression of the error terms. Another well-known case is the distributed-lag model, in which the coefficients on the lagged exogenous variables are specified to decrease with lags in a certain non-linear fashion, such as geometrically declining coefficients. In both of these cases, non-linearity appears only in parameters but not in variables.

More general non-linear models are used in the estimation of production functions and demand functions. Even a simple Cobb-Douglas production function cannot be transformed into linearity if the error term is added rather than multiplied [see Bodkin and Klein (1967)]. CES [Arrow, Chenery, Minhas and Solow (1961)] and VES [Revankar (1971)] production functions are more highly non-linear. In the estimation of expenditure functions, a number of highly non-linear functions have been proposed (some of these are used on the supply side as well) - Translog [Christensen, Jorgenson and Lau (1975)], Generalized Leontief [Diewert (1974)], S-Branch [Brown and Heien (1972)], and Quadratic [Howe, Pollak and Wales (1979)], to name a few. Some of these and other papers with applications will be mentioned in various relevant parts of this chapter.

The non-linear regression models I will consider in this chapter can be written in their most general form as

f_i(y_t, x_t, α_i) = u_{it},  i = 1, 2, …, n;  t = 1, 2, …, T,   (1.1)

where y_t, x_t, and α_i are vectors of endogenous variables, exogenous variables, and parameters, respectively, and u_{it} are unobservable error terms with zero mean. Eqs. (1.1), with all generality, constitute the non-linear simultaneous equations model, which is analyzed in Section 5. I devote most of the discussion in the chapter to this section because this area has been only recently developed and therefore there is little account of it in general references.


Many simpler models arising as special cases of (1.1) are considered in other sections. In Section 2 I take up the simplest of these, which I will call the standard non-linear regression model, defined by

y_t = f(x_t, β₀) + u_t,  t = 1, 2, …, T,

where {u_t} are scalar i.i.d. (independent and identically distributed) random variables with zero mean and constant variance. Since this is the model which has been most extensively analyzed in the literature, I will also devote a lot of space to the analysis of this model. Section 3 considers the non-i.i.d. case of the above model, and Section 4 treats its multivariate generalization.

Now I should mention what will not be discussed. I will not discuss the maximum likelihood estimation of non-linear models unless the model is written in the regression form (1.1). Many non-linear models are discussed elsewhere in this Handbook; see, for example, the chapters by Dhrymes, McFadden, and Maddala. The reader is advised to recognize a close connection between the non-linear least squares estimator analyzed in this chapter and the maximum likelihood estimator studied in the other chapters; essentially the same techniques are used to derive the asymptotic properties of the two estimators, and analogous computer algorithms can be used to compute both.

I will not discuss splines and other methods of function approximation, since space is limited and these techniques have not been as frequently used in econometrics as they have in engineering applications. A good introduction to the econometric applications of spline functions can be found in Poirier (1976). Above I mentioned the linear model with the transformation to reduce the autocorrelation of the error terms and the distributed-lag model. I will not specifically study these models because they are very large topics by themselves and are best dealt with separately. (See the chapter by Hendry, Pagan, and Sargan in this Handbook.) There are a few other important topics which, although non-linearity is involved, would best be studied within another context, e.g. non-linear errors-in-variables models and non-linear time-series models. Regarding these two topics, I recommend Wolter and Fuller (1978) and Priestley (1978). Finally, I conclude this introduction by citing general references on non-linear regression models. Malinvaud (1970b) devotes one long chapter to non-linear regression models, in which he discusses the asymptotic properties of the non-linear least squares estimator in a multivariate model. There are three references which are especially good in the discussion of computation algorithms, confidence regions, and worked-out examples: Draper and Smith (1966), Bard (1974), and Judge, Griffiths, Hill and Lee (1980). Several chapters in Goldfeld and Quandt (1972) are devoted to the discussion of non-linear regression models. Their Chapter 1 presents an excellent review of optimization techniques which can be used in the computation of both the non-linear least squares and the maximum likelihood estimators. Chapter 2 discusses the construction of confidence regions


in the non-linear regression model and the asymptotic properties of the maximum likelihood estimator (but not of the non-linear least squares estimator). Chapter 5 considers the Cobb-Douglas production function with both multiplicative and additive errors, and Chapter 8 considers non-linear (only in variables) simultaneous equations models. There are two noteworthy survey articles: Gallant (1975a), with emphasis on testing and computation, and Bunke, Henschke, Strüby and Wisotzki (1977), which is more theoretically oriented. None of the above-mentioned references, however, discusses the estimation of simultaneous equations models non-linear both in variables and parameters.

2.1 Model

In this section I consider the standard non-linear regression model

y_t = f(x_t, β₀) + u_t,  t = 1, 2, …, T,   (2.1)

where y_t is a scalar endogenous variable, x_t is a vector of exogenous variables, β₀ is a K-vector of unknown parameters, and {u_t} are unobservable scalar i.i.d. random variables with Eu_t = 0 and Vu_t = σ₀², another unknown parameter. Note that, unlike the linear model where f(x_t, β) = x_t′β, the dimensions of the vectors x_t and β₀ are not necessarily the same. We will assume that f is twice continuously differentiable. As for the other assumptions on f, I will mention them as they are required for obtaining various results in the course of the subsequent discussion. Econometric examples of (2.1) include the Cobb-Douglas production function with an additive error,

Q_t = β₁ K_t^{β₂} L_t^{β₃} + u_t,   (2.2)

and the CES production function (2.3).


The non-linear least squares (NLLS) estimator, denoted β̂, is defined as the value of β that minimizes the sum of squared residuals

S_T(β) = Σ_{t=1}^T [y_t − f(x_t, β)]².   (2.5)

It is important to distinguish between the β that appears in (2.5), which is the argument of the function f(x_t, ·), and β₀, which is the fixed true value. In what follows, I will discuss the properties of β̂, the method of computation, and statistical inference based on β̂.

2.2.1 Consistency

The consistency of β̂ is proved by showing that plim T⁻¹S_T(β) is minimized at the true value β₀. Strong consistency is proved by showing that the same holds for the almost sure limit of T⁻¹S_T(β) instead. This method of proof can be used to prove the consistency of any other estimator which is obtained by either minimizing or maximizing a random function over the parameter space. For example, I used the same method to prove the strong consistency of the maximum likelihood estimator (MLE) of the Tobit model in Amemiya (1973b).

This method of proof is intuitively appealing because it seems obvious that if T⁻¹S_T(β) is close to plim T⁻¹S_T(β), and if the latter is minimized at β₀, then β̂, which minimizes the former, should be close to β₀. However, we need the following three assumptions in order for the proof to work:

The parameter space B is compact (closed and bounded) and β₀ is its interior point;   (2.6)

S_T(β) is continuous in β ∈ B;   (2.7)

plim T⁻¹S_T(β) exists, is non-stochastic, and its convergence is uniform in β.   (2.8)


The meaning of (2.8) is as follows. Define S(β) = plim T⁻¹S_T(β). Then, given ε, δ > 0, there exists T₀, independent of β, such that for all T ≥ T₀ and for all β ∈ B, P[|T⁻¹S_T(β) − S(β)| > δ] < ε. Inserting (2.1) into (2.5) and writing f_t(β) for f(x_t, β), we have

T⁻¹S_T(β) = A₁ + A₂ + A₃,   (2.9)

where A₁ = T⁻¹Σu_t², A₂ = 2T⁻¹Σ[f_t(β₀) − f_t(β)]u_t, and A₃ = T⁻¹Σ[f_t(β₀) − f_t(β)]². The convergence of A₂ can be reduced to that of T⁻¹Σ[f_t(β₀) − f_t(β)]² by Chebyshev's inequality:

P(|A₂| > δ) ≤ (4σ₀²/δ²T) · T⁻¹Σ[f_t(β₀) − f_t(β)]².   (2.10)

Since the uniform convergence of A₂ follows from the uniform convergence of the right-hand side of (2.10), it suffices to assume that

T⁻¹Σ[f_t(β₁) − f_t(β₂)]² converges uniformly in β₁, β₂ ∈ B.   (2.11)

Having thus disposed of A₁ and A₂, we need only to assume that lim A₃ is uniquely minimized at β₀; namely,

lim T⁻¹Σ[f_t(β₀) − f_t(β)]² > 0 if β ≠ β₀.   (2.12)

To sum up, the non-linear least squares estimator β̂ of the model (2.1) is consistent if (2.6), (2.11), and (2.12) are satisfied. I will comment on the significance and the plausibility of these three assumptions.

The assumption of a compact parameter space (2.6) is convenient but can be rather easily removed. The trick is to dispose of the region outside a certain compact subset of the parameter space by assuming that in that region T⁻¹Σ[f_t(β₀) − f_t(β)]² is sufficiently large. This is done by Malinvaud (1970a). An essentially similar argument appears also in Wald (1949) in the proof of the consistency of the maximum likelihood estimator.

It would be nice if assumption (2.11) could be paraphrased into separate assumptions on the functional form of f and on the properties of the exogenous sequence {x_t} which are easily verifiable. Several authors have attempted to obtain such assumptions. Jennrich (1969) observes that if f is bounded and continuous, (2.11) is implied by the assumption that the empirical distribution function of {x_t} converges to a distribution function. He also notes that another way to satisfy (2.11) is to assume that {x_t} are i.i.d. with a distribution function F, and f is bounded uniformly in β by a function which is square integrable with respect to F. Malinvaud (1970a) generalizes the first idea of Jennrich by introducing the concept of weak convergence of measure, whereas Gallant (1977) generalizes the second idea of Jennrich by considering the notion of Cesàro summability. However, it seems to me that the best procedure is to leave (2.11) as it is and try to verify it directly.

The assumption (2.12) is comparable to the familiar assumption in the linear model that lim T⁻¹X′X exists and is positive definite. It can be easily proved that in the linear model the above assumption is not necessary for the consistency of least squares and that it is sufficient to assume (X′X)⁻¹ → 0. This observation suggests that assumption (2.12) can be relaxed in an analogous way. One such result can be found in Wu (1981).

2.2.2 Asymptotic normality

The asymptotic normality of the NLLS estimator β̂ is rigorously proved in Jennrich (1969). Again, I will give a sketch of the proof, explaining the required assumptions as I go along, rather than reproducing Jennrich's result in a theorem-proof format.

The asymptotic normality of the NLLS estimator, as in the case of the MLE, can be derived from the following Taylor expansion:

(∂S_T/∂β)|_{β̂} = (∂S_T/∂β)|_{β₀} + (∂²S_T/∂β∂β′)|_{β*}(β̂ − β₀),   (2.13)

where ∂²S_T/∂β∂β′ is a K × K matrix of second-order derivatives and β* lies between β̂ and β₀. To be able to write down (2.13), we must assume that f_t is twice continuously differentiable with respect to β. Since the left-hand side of (2.13) is zero (because β̂ minimizes S_T), from (2.13) we obtain:

√T(β̂ − β₀) = −[T⁻¹(∂²S_T/∂β∂β′)|_{β*}]⁻¹ T^{−1/2}(∂S_T/∂β)|_{β₀}.   (2.14)

Thus, we are done if we can show that (i) the limit distribution of T^{−1/2}(∂S_T/∂β)|_{β₀} is normal and (ii) T⁻¹(∂²S_T/∂β∂β′)|_{β*} converges in probability to a non-singular matrix. We will consider these two statements in turn.


The proof of statement (i) is straightforward. Differentiating (2.5) with respect to β gives

∂S_T/∂β = −2 Σ_{t=1}^T [y_t − f_t(β)](∂f_t/∂β),   (2.15)

so that, evaluated at β₀,

T^{−1/2}(∂S_T/∂β)|_{β₀} = −2T^{−1/2} Σ_{t=1}^T u_t (∂f_t/∂β)|_{β₀},   (2.16)

which is asymptotically normal with mean zero and variance-covariance matrix 4σ₀²C by a central limit theorem, provided that

C = lim T⁻¹ Σ_{t=1}^T (∂f_t/∂β)|_{β₀}(∂f_t/∂β′)|_{β₀} exists and is non-singular.   (2.17)

Proving (ii) poses a more difficult problem. Write an element of the matrix T⁻¹(∂²S_T/∂β∂β′)|_{β*} as h_T(β*). One might think that plim h_T(β*) = plim h_T(β₀) follows from the well-known theorem which says that the probability limit of a continuous function is the function of the probability limit, but the theorem does not apply because h_T is in general a function of an increasing number of random variables y₁, y₂, …, y_T. But, by a slight modification of lemma 4, p. 1003, of Amemiya (1973b), we can show that if h_T(β) converges almost surely to a certain non-stochastic function h(β) uniformly in β, then plim h_T(β*) = h(plim β*) = h(β₀). Differentiating (2.15) again with respect to β and dividing by T yields

T⁻¹(∂²S_T/∂β∂β′) = −2T⁻¹Σ u_t (∂²f_t/∂β∂β′) − 2T⁻¹Σ[f_t(β₀) − f_t(β)](∂²f_t/∂β∂β′) + 2T⁻¹Σ(∂f_t/∂β)(∂f_t/∂β′).   (2.19)

We must show that each of the three terms on the right-hand side of (2.19)


converges almost surely to a non-stochastic function uniformly in β. For this purpose the following assumptions will suffice:

T⁻¹Σ (∂²f_t/∂β_i∂β_j)² converges uniformly in β in an open neighborhood of β₀,   (2.20)

and

T⁻¹Σ (∂f_t/∂β_i)(∂f_t/∂β_j) converges uniformly in β in an open neighborhood of β₀.   (2.21)

Then we obtain

plim T⁻¹(∂²S_T/∂β∂β′)|_{β*} = 2C,   (2.22)

and therefore, from (2.14),

√T(β̂ − β₀) → N(0, σ₀²C⁻¹).   (2.23)

It is worth pointing out that in the process of proving (2.23) we have in effect shown that, asymptotically,

β̂ ≅ β₀ + (G′G)⁻¹G′u,   (2.24)

where I have put G = (∂f/∂β′)|_{β₀}, a T × K matrix. Note that (2.24) holds exactly in the linear case. The practical consequence of the approximation (2.24) is that all the results for the linear regression model are asymptotically valid for the non-linear regression model if we treat G as the regressor matrix. In particular, we can use the usual t and F statistics with an approximate precision, as I will explain more fully in Sections 2.4 and 2.5 below. Since the matrix G depends on the unknown parameters, we must in practice evaluate it at β̂.
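Continuing the exponential example from the earlier sketch, the following snippet carries out this practical recipe: form Ĝ at β̂ (here analytically) and use σ̂²(Ĝ′Ĝ)⁻¹ as the approximate covariance matrix. The divisor T − K for σ̂² is an assumption made for symmetry with the linear model.

```python
# Sketch of the practical use of (2.24): treat G evaluated at beta-hat as the
# regressor matrix. For f = b1*exp(b2*x), the Jacobian is analytic.
import numpy as np

def G_exp(x, beta):
    g1 = np.exp(beta[1] * x)                         # df/db1
    return np.column_stack([g1, beta[0] * x * g1])   # [df/db1, df/db2]

def nlls_cov(x, y, beta_hat):
    G = G_exp(x, beta_hat)
    r = y - beta_hat[0] * np.exp(beta_hat[1] * x)
    T, K = G.shape
    sigma2 = (r @ r) / (T - K)               # sigma^2-hat = S_T(beta-hat)/(T-K)
    return sigma2 * np.linalg.inv(G.T @ G)   # approximate covariance of beta-hat
```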

2.3 Computation

Since there is in general no explicit formula for the NLLS estimator β̂, the minimization of (2.5) must usually be carried out by some iterative method. There are two general types of iteration methods: general optimization methods applied to the non-linear least squares problem in particular, and procedures which are specifically designed to cope with the present problem. In this chapter I will discuss two representative methods - the Newton-Raphson iteration, which belongs to the first type, and the Gauss-Newton iteration, which belongs to the second type - and a few major variants of each method. These cover a majority of the iterative methods currently used in econometric applications. Although not discussed here, I should mention another method sometimes used in econometric applications, namely the so-called conjugate gradient method of Powell (1964), which does not require the calculation of derivatives and is based on a different principle from the Newton methods. Much more detailed discussion of these and other methods can be found in Chapter 12 of this Handbook and in Goldfeld and Quandt (1972, ch. 1).

2.3.1 Newton-Raphson iteration

The Newton-Raphson method is based on the following quadratic approximation of a minimand (it also works for a maximand):

S_T(β) ≅ S_T(β̂₁) + (∂S_T/∂β′)|_{β̂₁}(β − β̂₁) + ½(β − β̂₁)′(∂²S_T/∂β∂β′)|_{β̂₁}(β − β̂₁),   (2.25)

where β̂₁ is the initial estimate [obtained by a pure guess or by a method such as the one proposed by Hartley and Booker (1965) described below]. The second-round estimator β̂₂ of the iteration is obtained by minimizing the right-hand side of (2.25):

β̂₂ = β̂₁ − [(∂²S_T/∂β∂β′)|_{β̂₁}]⁻¹(∂S_T/∂β)|_{β̂₁}.   (2.26)


The second weakness may be remedied by the modification

β̂₂ = β̂₁ − λ₁[(∂²S_T/∂β∂β′)|_{β̂₁}]⁻¹(∂S_T/∂β)|_{β̂₁},   (2.29)

where the scalar λ₁ is to be appropriately determined. See Fletcher and Powell (1963) for a method to determine λ₁ by a cubic interpolation of S_T(β) along the current search direction. [This method is called the DFP iteration since Fletcher and Powell refined the method originally proposed by Davidon (1959).] Also, see Berndt, Hall, Hall and Hausman (1974) for another method to choose λ₁. Ordinarily, the iteration (2.26) is to be repeated until convergence takes place. However, if β̂₁ is a consistent estimator of β₀ such that √T(β̂₁ − β₀) has a proper limit distribution, the second-round estimator β̂₂ has the same asymptotic distribution as β̂. In this case, a further iteration does not bring any improvement so far as the asymptotic distribution is concerned. This is shown below.
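A hedged sketch of (2.26) with the step length of (2.29) follows. Fletcher and Powell determine λ by cubic interpolation; the simple halving rule below is only a stand-in for such a rule, and the function names S, grad, and hess are assumed user-supplied callables.

```python
# Sketch of the Newton-Raphson iteration (2.26) with step length lambda as in
# (2.29). Halving lambda until S_T decreases is a simple stand-in for the
# cubic-interpolation rule of Fletcher and Powell (1963).
import numpy as np

def newton_raphson(S, grad, hess, beta, max_iter=50, tol=1e-10):
    for _ in range(max_iter):
        direction = np.linalg.solve(hess(beta), grad(beta))
        lam = 1.0
        while lam > 1e-8 and S(beta - lam * direction) >= S(beta):
            lam *= 0.5                       # shrink the step until S_T falls
        step = lam * direction
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```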

By a Taylor expansion of (∂S_T/∂β)|_{β̂₁} around β₀, we obtain

(∂S_T/∂β)|_{β̂₁} ≅ (∂S_T/∂β)|_{β₀} + (∂²S_T/∂β∂β′)|_{β₀}(β̂₁ − β₀).   (2.30)


But, under the assumptions of Section 2.2 from which we proved the asymptotic normality of β̂, we have

plim T⁻¹(∂²S_T/∂β∂β′)|_{β̂₁} = plim T⁻¹(∂²S_T/∂β∂β′)|_{β₀} = 2C.   (2.32)

Therefore, inserting (2.30) into (2.26),

√T(β̂₂ − β₀) ≍ −[T⁻¹(∂²S_T/∂β∂β′)|_{β₀}]⁻¹ T^{−1/2}(∂S_T/∂β)|_{β₀} ≍ √T(β̂ − β₀),   (2.33)

where ≍ means that both sides of the relation have the same non-degenerate limit distribution.

To start an iteration, we need an initial estimate. Since there may be more than one local minimum in S_T, it is helpful to use a starting value as close to the true value as possible. Thus, it would be desirable to have available an easily computable good estimator, such as β̂₁; all the better if it is consistent so that we can take advantage of the result of the preceding paragraph. Surprisingly, I know only one such estimator - the one proposed by Hartley and Booker (1965). Their initial estimator is obtained as follows. Let us assume for simplicity mK = T for some integer m, and partition the set of integers (1, 2, …, T) into K non-overlapping consecutive subsets Ψ₁, Ψ₂, …, Ψ_K, each of which contains m elements. If we define ȳ₍ᵢ₎ = m⁻¹Σ_{t∈Ψᵢ} y_t and f̄₍ᵢ₎(β) = m⁻¹Σ_{t∈Ψᵢ} f_t(β), i = 1, 2, …, K, the Hartley-Booker estimator is defined as the value of β that satisfies the K equations

ȳ₍ᵢ₎ = f̄₍ᵢ₎(β),  i = 1, 2, …, K.   (2.34)

Since (2.34) cannot generally be solved explicitly for β, one still needs an iteration to solve it. Hartley and Booker propose the minimization of Σ_{i=1}^K [ȳ₍ᵢ₎ − f̄₍ᵢ₎(β)]² by an iterative method, such as one of the methods being discussed in this section. This minimization is at least simpler than the original minimization of (2.5) because we know that the minimand is zero at the solution. However, if there are multiple solutions to (2.34), an iteration may lead to the wrong solution.

Hartley and Booker proved the consistency of their estimator. Jennrich (1969) gave a counterexample to their consistency proof; however, their proof can easily be modified to take account of Jennrich's counterexample. A more serious weakness of the Hartley-Booker proof is that their assumptions are too restrictive: one can easily construct a benign example for which their assumptions are violated and yet their estimator is consistent.
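The following sketch implements the Hartley-Booker starting value under the simplifying assumption mK = T; the Nelder-Mead call is merely one convenient way to minimize the group-mean discrepancies, not part of their proposal, and all argument names are assumptions.

```python
# Sketch of the Hartley-Booker initial estimator: partition the sample into K
# consecutive groups of m = T/K observations and solve (2.34) by minimizing
# the sum of squared group-mean discrepancies (zero at an exact solution).
import numpy as np
from scipy.optimize import minimize

def hartley_booker(f, x, y, beta_init, K):
    T = len(y)
    m = T // K                                       # assumes mK = T
    groups = [slice(i * m, (i + 1) * m) for i in range(K)]
    ybar = np.array([y[g].mean() for g in groups])
    def Q(beta):
        fbar = np.array([f(x[g], beta).mean() for g in groups])
        return np.sum((ybar - fbar) ** 2)
    return minimize(Q, beta_init, method="Nelder-Mead").x
```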


Gallant (1975a) suggested a simpler variation of the Hartley-Booker idea: just select K observations appropriately and solve the resulting K equations y_t = f(x_t, β) for β. This estimator is simpler to compute, but inconsistent. Nevertheless, one may obtain a good starting value by this method, as Gallant's example shows.

The Gauss-Newton iteration may be alternatively motivated as follows. Evaluating the linear approximation (2.35) at β₀ and inserting it into eq. (2.1) yields

y_t − f_t(β̂₁) + (∂f_t/∂β′)|_{β̂₁}β̂₁ ≅ (∂f_t/∂β′)|_{β̂₁}β₀ + u_t.   (2.38)

Then, the second-round estimator β̂₂ can be obtained as the least squares estimator of β₀ applied to the linear regression equation (2.38), where the whole left-hand side is treated as the dependent variable and (∂f_t/∂β′)|_{β̂₁} as the vector of independent variables. Eq. (2.38) reminds us of the point raised above: namely, the non-linear regression model asymptotically behaves like the linear regression model if we treat (∂f/∂β′)|_{β̂} as the regressor matrix.
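A sketch of the resulting round follows: regressing the constructed dependent variable on the Jacobian in (2.38) is algebraically the update β̂₂ = β̂₁ + (G′G)⁻¹G′[y − f(β̂₁)]. The finite-difference Jacobian is an assumption made for generality; analytic derivatives are preferable when available.

```python
# Sketch of the Gauss-Newton iteration: each round is the least squares
# regression (2.38), i.e. beta_new = beta + (G'G)^{-1} G'(y - f(beta)).
import numpy as np

def jacobian_fd(f, x, beta, eps=1e-6):        # finite-difference df/dbeta'
    K = len(beta)
    G = np.empty((len(x), K))
    for k in range(K):
        e = np.zeros(K); e[k] = eps
        G[:, k] = (f(x, beta + e) - f(x, beta - e)) / (2 * eps)
    return G

def gauss_newton(f, x, y, beta, max_iter=50, tol=1e-10):
    for _ in range(max_iter):
        G = jacobian_fd(f, x, beta)
        step, *_ = np.linalg.lstsq(G, y - f(x, beta), rcond=None)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```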


… it converges to β̂. See Tornheim (1963) for an alternative proof of the convergence of the Hartley iteration. Some useful comments on Marquardt's and Hartley's algorithms can be found in Gallant (1975a). The methods of determining λ₁ in the Newton-Raphson iteration (2.29) mentioned above can also be applied to the determination of λ₁ in (2.41).

Jennrich (1969) proves that if the Gauss-Newton iteration is started at a point sufficiently close to the true value β₀, and if the sample size T is sufficiently large, the iteration converges to β̂. This is called the asymptotic stability of the iteration. The following is a brief sketch of Jennrich's proof. Rewrite the Gauss-Newton iteration (2.37), changing the subscripts 1 to n and 2 to n + 1, as

β̂_{n+1} = h(β̂_n),   (2.42)

where h is a vector-valued function implicitly defined by (2.37). By a Taylor


expansion:

β̂_{n+1} − β̂_n = (∂h/∂β′)|_{β*_{n−1}}(β̂_n − β̂_{n−1}),   (2.43)

where β*_{n−1} lies between β̂_n and β̂_{n−1}. If we define A_n = (∂h/∂β′)|_{β̂_n} and denote the largest characteristic root of A_n′A_n by λ_n, we can show that A_n → 0 almost surely for all n as T → ∞, and hence

λ_n → 0 almost surely for all n as T → ∞.   (2.44)

But (2.44) implies two facts. First, the iteration converges to a stationary point, and secondly, this stationary point must lie sufficiently close to the starting value β̂₁, since

(β̂_{n+1} − β̂₁)′(β̂_{n+1} − β̂₁) ≤ δ′δ(1 + λ₁ + λ₁λ₂ + ⋯ + λ₁λ₂⋯λ_n),   (2.45)

where δ = β̂₂ − β̂₁. Therefore, this stationary point must be β̂ if β̂₁ is within a neighborhood of β₀ and if β̂ is the unique stationary point in the same neighborhood.

In closing this section I will mention several empirical papers in which the above-mentioned and related iterative methods are used. Bodkin and Klein (1967) estimated the Cobb-Douglas (2.2) and the CES (2.3) production functions by the Newton-Raphson method. Charatsis (1971) estimated the CES production function by a modification of the Gauss-Newton method similar to that of Hartley (1961) and showed that in 64 samples out of 74 it converged in six iterations. Mizon (1977), in a paper whose major aim was to choose among nine production functions, including the Cobb-Douglas and CES, used the conjugate gradient method of Powell (1964). Mizon's article is a useful compendium on the econometric application of various statistical techniques such as sequential testing, Cox's test of separate families of hypotheses [Cox (1961, 1962)], the Akaike Information Criterion [Akaike (1973)], the Box-Cox transformation [Box and Cox (1964)], and comparison of the likelihood ratio, Wald, and Lagrange multiplier tests (see the end of Section 2.4 below). Sargent (1978) estimates a rational expectations model (which gives rise to non-linear constraints among parameters) by the DFP algorithm mentioned above.

2.4 Tests of hypotheses

In this section I consider tests of hypotheses on the regression parameters β. It is useful to classify situations into four cases depending on the nature of the hypotheses and the distribution of the error term, as depicted in Table 2.1.

Table 2.1
Four cases of hypothesis tests

                          Normal      Non-normal
  Linear hypotheses       Case I      Case II
  Non-linear hypotheses   Case III    Case IV

I will discuss the t and F tests in Case I and the likelihood ratio, Wald, and Rao tests in Case IV. I will not discuss Cases II and III because the results in Case IV are a fortiori valid in Cases II and III.

2.4.1 Linear hypotheses under normality

Partition the parameter vector as β′ = (β₍₁₎′, β₍₂₎′), where β₍₁₎ is a K₁-vector and β₍₂₎ is a K₂-vector. By a linear hypothesis I mean a hypothesis which specifies that β₍₂₎ is equal to a certain known value β̄₍₂₎. Student's t test is applicable if K₂ = 1 and the F test if K₂ > 1.

The hypothesis of the form Qβ = c, where Q is a known K₂ × K matrix and c is a known K₂-vector, can be transformed into a hypothesis of the form described above and therefore need not be separately considered. Assuming Q is of full rank, we can find a K₁ × K matrix R such that A′ = (R′, Q′) is non-singular. If we define α = Aβ and partition α′ = (α₍₁₎′, α₍₂₎′), the hypothesis Qβ = c is equivalent to the hypothesis α₍₂₎ = c.

As noted after eq. (2.24), all the results of the linear regression model can be extended to the non-linear model by treating G = (∂f/∂β′)|_{β₀} as the regressor matrix if the assumptions of Section 2.2 are satisfied. Since β₀ is unknown, we must use Ĝ = (∂f/∂β′)|_{β̂} in practice. We will generalize the t and F statistics of the linear model by this principle. If K₂ = 1, we have approximately

(β̂₍₂₎ − β̄₍₂₎)/(σ̂ d̂^{1/2}) ∼ t(T − K),   (2.46)

where σ̂² = S_T(β̂)/(T − K), d̂ is the last diagonal element (if β₍₂₎ is the ith element of β, the ith diagonal element) of (Ĝ′Ĝ)⁻¹, and t(T − K) denotes Student's t distribution with T − K degrees of freedom. For the case K₂ ≥ 1, we have asymptotically under the null hypothesis:

[(β̂₍₂₎ − β̄₍₂₎)′V̂⁻¹(β̂₍₂₎ − β̄₍₂₎)/K₂] / [S_T(β̂)/(T − K)] ∼ F(K₂, T − K),   (2.47)

where V̂ is the lower right K₂ × K₂ block of (Ĝ′Ĝ)⁻¹.


In testing β₍₂₎ = β̄₍₂₎ when K₂ ≥ 1, we may alternatively use the asymptotic approximation (under the null hypothesis)

[(S_T(β̃) − S_T(β̂))/K₂] / [S_T(β̂)/(T − K)] ∼ F(K₂, T − K),   (2.49)

where β̃ is the NLLS estimator obtained under the constraint β₍₂₎ = β̄₍₂₎.

The study of Gallant (1975c) sheds some light on the choice between (2.47) and (2.49). He obtained the asymptotic distribution of the statistics (2.47) and (2.49) under the alternative hypothesis as follows. Regarding S_T(β̂), which appears in both formulae, we have asymptotically:

S_T(β̂) ∼ σ₀²χ²(T − K),   (2.50)

and, regarding S_T(β̃),

S_T(β̃) − S_T(β̂) ∼ σ₀²χ²(K₂; λ), with λ = σ₀⁻²‖f(β*₍₁₎, β̄₍₂₎) − f(β₀)‖²,   (2.51)

in which β*₍₁₎ is the value of β₍₁₎ that minimizes

‖f(β₍₁₎, β̄₍₂₎) − f(β₍₁₎₀, β₍₂₎₀)‖²,   (2.52)

where β₍₂₎₀ is the true value of β₍₂₎. The asymptotic distribution of the statistic (2.47) under the alternative hypothesis can now be derived from (2.50) and (2.52) and, similarly, that of (2.49) from (2.50) and (2.51).

Gallant (1975c) conducted a Monte Carlo study using the model (2.48) to compare the above two tests in testing β₁ = 0 against β₁ ≠ 0 and β₃ = −1 against β₃ ≠ −1. His results show that (i) the asymptotic approximation under the alternative hypothesis matches the empirical distribution reasonably well for both statistics but works a little better for the statistic (2.49), and (ii) the power of (2.49) tends to be higher than that of (2.47).² Gallant (1975a) observes that (2.49) is easier to calculate than (2.47) except when K₂ = 1. All these observations indicate a preference for (2.49) over (2.47). See Gallant (1975b) for a tabulation of the power function of the test based on S_T(β̃)/S_T(β̂), which is equivalent to the test based on (2.49).

²Actually, the powers of the two tests calculated either from the approximation or from the …
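To make the comparison concrete, here is a hedged rendering of the two statistics as reconstructed above; S_u and S_r denote the unrestricted and restricted sums of squared residuals, Ghat the Jacobian at the unrestricted estimate, and idx2 the positions of β₍₂₎ within β (all names are assumptions).

```python
# Sketch of the two F-type statistics compared above; both are referred to
# the F(K2, T - K) distribution under the null hypothesis.
import numpy as np

def f_wald_type(beta_u, beta_bar2, Ghat, S_u, idx2):
    # analogue of (2.47): quadratic form in beta_(2)-hat minus its H0 value
    T, K = Ghat.shape
    K2 = len(idx2)
    V = np.linalg.inv(Ghat.T @ Ghat)[np.ix_(idx2, idx2)]
    d = beta_u[idx2] - beta_bar2
    return (d @ np.linalg.solve(V, d) / K2) / (S_u / (T - K))

def f_ssr_type(S_r, S_u, T, K, K2):
    # analogue of (2.49): restricted vs unrestricted sums of squares
    return ((S_r - S_u) / K2) / (S_u / (T - K))
```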

2.4.2 Non-linear hypotheses under non-normality

Now I consider the test of a non-linear hypothesis

h(β) = 0,   (2.53)

where h is a q-vector valued non-linear function such that q < K.

If β are the parameters that characterize a concentrated likelihood function L(β), where L may or may not be derived from the normal distribution, we can test the hypothesis (2.53) using one of the following well-known test statistics: the likelihood ratio test (LRT), Wald's test [Wald (1943)], or Rao's test [Rao (1947)]:

LRT = 2[log L(β̂) − log L(β̃)],   (2.54)

Wald = h(β̂)′{Ĥ[−∂² log L/∂β∂β′|_{β̂}]⁻¹Ĥ′}⁻¹h(β̂),  where Ĥ = (∂h/∂β′)|_{β̂},   (2.55)

Rao = (∂ log L/∂β′)|_{β̃}[−∂² log L/∂β∂β′|_{β̃}]⁻¹(∂ log L/∂β)|_{β̃},   (2.56)


where β̂ is the unconstrained maximum likelihood estimator and β̃ is the constrained maximum likelihood estimator obtained by maximizing L(β) subject to (2.53). By a slight modification of the proof of Rao (1973) (a modification is necessary since Rao deals with a likelihood function rather than a concentrated likelihood function), it can be shown that all three test statistics have the same limit distribution, χ²(q), chi-square with q degrees of freedom. For more discussion of these tests, see Chapter 13 of this Handbook by Engle.
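A hedged sketch of the three statistics in this generic form follows, assuming user-supplied callables loglik, score, and info (an estimate of −∂² log L/∂β∂β′), plus the constraint h and its Jacobian H; all names are assumptions.

```python
# Sketch of the LRT, Wald, and Rao statistics for H0: h(beta) = 0; each is
# referred to chi^2(q), q = number of restrictions, as stated in the text.
import numpy as np

def lrt(loglik, beta_u, beta_r):
    return 2.0 * (loglik(beta_u) - loglik(beta_r))

def wald(h, H, info, beta_u):
    hv, Hm = h(beta_u), H(beta_u)               # constraint and its Jacobian
    V = Hm @ np.linalg.inv(info(beta_u)) @ Hm.T
    return hv @ np.linalg.solve(V, hv)

def rao(score, info, beta_r):
    s = score(beta_r)                           # score at the restricted MLE
    return s @ np.linalg.solve(info(beta_r), s)
```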

Gallant and Holly (1980) obtained the asymptotic distribution of the three statistics under an alternative hypothesis in a non-linear simultaneous equations model. Translated into the present simpler model, their results can be stated as follows. As in Gallant (1975c), they assume that the "distance" between the null hypothesis and the alternative hypothesis is small; more precisely, that there exists a sequence of true values {β₀ᵀ} such that δ = lim √T(β₀ᵀ − β₀) is finite and h(β₀) = 0. Then, the statistics (2.54), (2.55), and (2.56) converge to χ²(q, λ), chi-square with q degrees of freedom and noncentrality parameter λ.⁴

If we assume the normality of u in the non-linear regression model (2.1), we can write (2.54), (2.55), and (2.56) as⁵

LRT = T log[S_T(β̃)/S_T(β̂)],   (2.58)

Wald = T h(β̂)′[Ĥ(Ĝ′Ĝ)⁻¹Ĥ′]⁻¹h(β̂)/S_T(β̂),   (2.59)

Rao = T ũ′G̃(G̃′G̃)⁻¹G̃′ũ/S_T(β̃),   (2.60)

where Ĝ and Ĥ are evaluated at β̂, G̃ = (∂f/∂β′)|_{β̃}, and ũ = y − f(β̃).

⁴If ξ is distributed as a q-vector N(0, V), then (ξ + μ)′V⁻¹(ξ + μ) ∼ χ²(q, μ′V⁻¹μ).

⁵In the following derivation I have omitted some terms whose probability limit is zero in evaluating …


Using a proof similar to Rao's, we can show that the statistics (2.58), (2.59), and (2.60) are asymptotically distributed as χ²(q) even if u are not normal. Thus, these statistics can be used to test a non-linear hypothesis under a non-normal situation.

In the linear regression model we can show Wald ≥ LRT ≥ Rao [see Berndt and Savin (1977)]. Although these inequalities do not exactly hold for the non-linear model, Mizon (1977) found Wald ≥ LRT most of the time in his samples.

2.5 Confidence regions

Confidence regions on the parameter vector β or a subset of it can be constructed using any of the test statistics considered in the preceding section. In this section I discuss some of these as well as other methods of constructing confidence regions.

A 100(1 − α) percent confidence interval on an element β₍ᵢ₎ of β can be obtained from (2.46) as

β̂₍ᵢ₎ − t_{α/2}(T − K) σ̂ d̂^{1/2} ≤ β₍ᵢ₎ ≤ β̂₍ᵢ₎ + t_{α/2}(T − K) σ̂ d̂^{1/2},   (2.61)

where t_{α/2}(T − K) is the α/2 critical value of t(T − K).

A confidence region - 100(1 − α) percent throughout this section - on the whole vector β can be constructed using either (2.47) or (2.49). If we use (2.47), we obtain

[(β̂ − β)′Ĝ′Ĝ(β̂ − β)/K] / [S_T(β̂)/(T − K)] ≤ F_α(K, T − K),   (2.62)

and if we use (2.49),

[(S_T(β) − S_T(β̂))/K] / [S_T(β̂)/(T − K)] ≤ F_α(K, T − K).   (2.63)

Beale (1960) shows that the confidence region based on (2.63) gives an accurate result - that is, the distribution of the left-hand side of (2.63) is close to F(K, T − K) - if the "non-linearity" of the model is small. He defines a measure of non-linearity for this purpose. …


in applying Beale’s measure of non-linearity to real data, observe that fi is a useful measure if the degree of “true non-linearity” (which can be measured by the population counterpart of Beale’s measure) is small Also, see Bates and Watts (1980) for a further development

The standard confidence ellipsoid of the linear regression model can be adapted to the non-linear model in the form

[(T − K)(y − f)′Z(Z′Z)⁻¹Z′(y − f)] / [K(y − f)′[I − Z(Z′Z)⁻¹Z′](y − f)] ≤ F_α(K, T − K),   (2.66)

where Z is an appropriately chosen T × K matrix of constants with rank K. The computation of (2.66) is more difficult than that of (2.65) because β appears in both the numerator and the denominator of (2.66). In a simple model where f_t(β) = β₁ + β₂e^{β₃x_t}, Hartley suggests choosing Z such that its tth row is equal to (1, x_t, x_t²). This suggestion may be extended to a general recommendation that we should choose the column vectors of Z to be those independent variables which we believe best approximate G. Although the distribution of the left-hand side of (2.66) is exactly F(K, T − K) for any Z under the null hypothesis, its power depends crucially on the choice of Z.
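A hedged sketch of checking whether a candidate β lies in the region (2.66) follows; the scipy F quantile supplies the critical value, and Z, y, and f_beta (the fitted values at the candidate β) are assumed inputs.

```python
# Sketch of membership in the exact region (2.66) for a candidate beta.
import numpy as np
from scipy.stats import f as f_dist

def in_region_266(y, f_beta, Z, alpha=0.05):
    T, K = Z.shape
    r = y - f_beta                                  # residual at candidate beta
    Pr = Z @ np.linalg.solve(Z.T @ Z, Z.T @ r)      # projection of r onto col(Z)
    num = (T - K) * (r @ Pr)
    den = K * (r @ (r - Pr))
    return num / den <= f_dist.ppf(1 - alpha, K, T - K)
```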


3 Single equation - non-i.i.d. case

In this section we consider the non-linear regression model (2.1) where {u_t} follow a general stationary process such that

the spectral density g(ω) of {u_t} is continuous.   (3.3)

I will add whatever assumptions are needed in the course of the subsequent discussion. The variance-covariance matrix Euu′ will be denoted by Σ.

I will indicate how to prove the consistency and the asymptotic normality of the non-linear least squares estimator β̂ in the present model, given the above assumptions as well as the assumptions of Section 2.2. Changing the assumption of independence to autocorrelation poses no more difficulties in the non-linear model than in the linear model.

To prove consistency, we consider (2.9) as before. Since A₁ does not depend on β and A₃ does not depend on u_t, we need be concerned only with A₂. Since A₂ involves the vector product f′u and since E(f′u)² = f′Σf ≤ f′f·λ₁(Σ), where λ₁(Σ) is the largest characteristic root of Σ, assumption (2.11) implies plim A₂ = 0 by Chebyshev's inequality, provided that the characteristic roots of Σ are bounded from above. But this last condition is implied by assumption (3.3).

To prove asymptotic normality in the present case, we need only prove the asymptotic normality of (2.16), which, just as in the linear model, follows from theorem 10.2.11, page 585, of Anderson (1971) if we assume that u_t is expressible as a linear process,

u_t = Σ_{j=0}^∞ γ_j ε_{t−j}, with {ε_t} i.i.d. and Σ_{j=0}^∞ |γ_j| < ∞,   (3.4)

in addition to all the other assumptions. Thus,

√T(β̂ − β₀) → N[0, lim T(G′G)⁻¹G′ΣG(G′G)⁻¹],   (3.5)


which indicates that the linear approximation (2.24) works for the autocorrelated model as well. Again it is safe to say that all the results of the linear model are asymptotically valid in the non-linear model. This suggests, for example, that the Durbin-Watson test will be approximately valid in the non-linear model, though this has not been rigorously demonstrated.

Now, let us consider the non-linear analogue of the generalized least squares estimator, which I will call the non-linear generalized least squares (NLGLS) estimator.

Hannan (1971) investigated the asymptotic properties of the class of estimators, denoted β̂(A), obtained by minimizing (y − f)′A⁻¹(y − f) for some A which is the variance-covariance matrix of a stationary process with characteristic roots bounded both from above and from below. This class contains the NLLS estimator, β̂ = β̂(I), and the NLGLS estimator, β̂(Σ).

Hannan actually minimized an approximation of (y − f)′A⁻¹(y − f) expressed in the frequency domain; therefore, his estimator is analogous to his spectral estimator proposed for the linear regression model [Hannan (1963)]. If we define the periodogram

I_x(ω) = (2πT)⁻¹|Σ_{t=1}^T x_t e^{itω}|²,  ω = 0, 2π/T, 4π/T, …, 2π(T − 1)/T,   (3.6)

we have approximately

(y − f)′A⁻¹(y − f) ≅ Σ_ω I_{y−f}(ω)/φ(ω),   (3.7)

where φ(ω) is the spectral density associated with A. This approximation is based on an approximation of A by a circular matrix. [See Amemiya and Fuller (1967, p. 527).]
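The following numerical check of the circular approximation (3.7) uses an AR(1) weighting matrix; the AR(1) autocovariance and spectral-density formulas below are the standard textbook ones (an assumption for concreteness, since Hannan's construction is more general).

```python
# Numerical check of (3.7): x'A^{-1}x versus sum_j I_x(w_j)/phi(w_j) for an
# AR(1) weighting matrix A. Agreement improves as T grows.
import numpy as np

rng = np.random.default_rng(1)
T, rho, sig2 = 256, 0.6, 1.0
x = rng.normal(size=T)

lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
A = sig2 * rho ** lags / (1 - rho ** 2)          # AR(1) autocovariance matrix
exact = x @ np.linalg.solve(A, x)

w = 2 * np.pi * np.arange(T) / T
I_x = np.abs(np.fft.fft(x)) ** 2 / (2 * np.pi * T)               # periodogram (3.6)
phi = sig2 / (2 * np.pi * (1 - 2 * rho * np.cos(w) + rho ** 2))  # AR(1) density
print(exact, np.sum(I_x / phi))
```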

Hannan proves the strong consistency of his non-linear spectral estimator obtained by minimizing the right-hand side of (3.7) under the assumptions (2.6), (2.12), and the new assumption that

T⁻¹Σ_t f_t(c₁)f_{t+s}(c₂) converges uniformly in c₁, c₂ ∈ B for every integer s.   (3.8)

Note that this is a generalization of assumption (2.11). However, assumption (3.8) is merely sufficient and not necessary. Hannan shows that in the model

y_t = α₀ + α₁ cos β₀t + α₂ sin β₀t + u_t,   (3.9)

assumption (3.8) does not hold and yet β̂ is strongly consistent if we assume (3.4) and 0 < β₀ < π. In fact, T(β̂ − β₀) converges to zero almost surely in this case.

In proving the asymptotic normality of his estimator, Hannan needs to generalize (2.20) and (2.21) as follows:

T⁻¹Σ_t (∂f_t/∂β_i)|_{c₁}(∂f_{t+s}/∂β_j)|_{c₂} converges uniformly in c₁ and c₂ in an open neighborhood of β₀,   (3.10)

and

T⁻¹Σ_t (∂²f_t/∂β_i∂β_j)|_{c₁}(∂²f_{t+s}/∂β_k∂β_l)|_{c₂} converges uniformly in c₁ and c₂ in an open neighborhood of β₀.   (3.11)

He also needs an assumption comparable to (2.17), namely that

lim T⁻¹G′A⁻¹G (≡ Ā) exists and is non-singular.   (3.12)

Using (3.10), (3.11), and (3.12), as well as the assumptions needed for consistency, Hannan proves

√T(β̂(A) − β₀) → N(0, Ā⁻¹B̄Ā⁻¹),   (3.13)

where B̄ = lim T⁻¹G′A⁻¹ΣA⁻¹G. If we define a matrix function F(ω) by

lim T⁻¹Σ_t (∂f_t/∂β)|_{β₀}(∂f_{t+s}/∂β′)|_{β₀} = (2π)⁻¹∫_{−π}^{π} e^{isω} dF(ω),

we can write Ā = (2π)⁻¹∫_{−π}^{π} φ(ω)⁻¹ dF(ω) and B̄ = (2π)⁻¹∫_{−π}^{π} g(ω)φ(ω)⁻² dF(ω).


In the model (3.9), assumptions (3.10) and (3.11) are not satisfied; nevertheless, Hannan shows that asymptotic normality holds if one assumes (3.4) and 0 < β₀ < π; in fact, T^{3/2}(β̂ − β₀) is asymptotically normal in this case.

An interesting practical case is that in which φ(ω) = ĝ(ω), where ĝ(ω) is a consistent estimator of g(ω). I will denote this estimator by β̂(Σ̂). Hannan proves that β̂(Σ) and β̂(Σ̂) have the same asymptotic distribution if g(ω) is a rational spectral density.

Gallant and Goebel (1976) propose a NLGLS estimator of the autocorrelated model which is constructed in the time domain, unlike Hannan's spectral estimator. In their method, they take account of the autocorrelation of {u_t} by fitting the least squares residuals û_t to an autoregressive model of a finite order. Thus, their estimator is a non-linear analogue of the generalized least squares estimator analyzed in Amemiya (1973a).

The Gallant-Goebel estimator is calculated in the following steps. (1) Obtain the NLLS estimator β̂. (2) Calculate û = y − f(β̂). (3) Assume that {u_t} follow an autoregressive model of a finite order and estimate the coefficients by the least squares regression of û_t on û_{t−1}, û_{t−2}, …. (4) Let Σ̂ be the variance-covariance matrix of u obtained under the assumption of the autoregressive model; then we can find a lower triangular matrix R such that Σ̂⁻¹ = R′R, where R depends on the coefficients of the autoregressive model,⁶ and calculate R̂ using the estimates of the coefficients obtained in step (3). (5) Finally, minimize [R̂(y − f)]′[R̂(y − f)] to obtain the Gallant-Goebel estimator.

⁶If we assume a first-order autoregressive model with coefficient ρ, for example, we obtain the familiar transformation matrix R whose first row is (√(1 − ρ²), 0, …, 0) and whose tth row, for t ≥ 2, has −ρ in position t − 1, 1 in position t, and zeros elsewhere.
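A hedged sketch of these five steps for an assumed AR(1) error follows; nlls stands for any routine that minimizes a sum of squares (for example the Gauss-Newton function sketched in Section 2.3), and the R matrix is the AR(1) transformation displayed in footnote 6 above.

```python
# Sketch of the Gallant-Goebel steps (1)-(5) under an assumed AR(1) error.
import numpy as np

def gallant_goebel_ar1(f, x, y, nlls, beta_init):
    beta1 = nlls(f, x, y, beta_init)               # (1) NLLS estimator
    u = y - f(x, beta1)                            # (2) residuals
    rho = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])     # (3) AR(1) coefficient
    T = len(y)
    R = np.eye(T)                                  # (4) triangular R, R'R ~ Sigma^{-1}
    R[0, 0] = np.sqrt(1.0 - rho ** 2)
    R[np.arange(1, T), np.arange(T - 1)] = -rho
    f_R = lambda xx, b: R @ f(xx, b)               # (5) minimize ||R(y - f)||^2
    return nlls(f_R, x, R @ y, beta_init)
```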

Gallant and Goebel conducted a Monte Carlo study of the model y_t = β₁e^{β₂x_t} + u_t to compare the performance of four estimators - the NLLS, the Gallant-Goebel AR1 (based on the assumption of a first-order autoregressive model), the Gallant-Goebel AR2, and Hannan's β̂(Σ̂) - when the true distribution of {u_t} is i.i.d., AR1, AR2, or MA4 (a fourth-order moving average process). Their major findings were as follows. (1) The Gallant-Goebel AR2 was not much better than the AR1 version. (2) The Gallant-Goebel estimators performed far better than the NLLS estimator and a little better than Hannan's β̂(Σ̂), even when the true model was MA4 - the situation most favorable to Hannan. They think the reason for this is that in many situations an autoregressive model produces a better approximation of the true autocovariance function than the circular approximation upon which Hannan's spectral estimator is based. They



illustrate this point by approximating the autocovariance function of the U.S. wholesale price index by the two methods. (3) The empirical distribution of the t statistic based on the Gallant-Goebel estimators was reasonably close to the theoretical distribution obtained under the pretense that the assumed model was the true model.

For an application of a non-linear regression model with an autocorrelated error, see Glasbey (1979), who estimated a growth curve for the weight of a steer.

For his model, White (1980a) first considers the non-linear weighted least squares estimator, which minimizes Σ_t W_t(y_t − f_t)² ≡ Q_T(β), where the weights {W_t} are bounded constants.

A major difference between his proof of consistency and the one employed in Section 2.2 is that he must account for the possibility that plim T⁻¹Q_T(β) may not exist because of the heteroscedasticity of {u_t}. [See White (1980a, p. 728) for an example of this.] Therefore, instead of proving that plim T⁻¹Q_T(β) attains its minimum at β₀, as done in Section 2.2, White proves that plim T⁻¹[Q_T(β) − EQ_T(β)] = 0 and that there exists T₀ such that, for any neighborhood N(β₀) of β₀,

inf_{β∉N(β₀)} T⁻¹[EQ_T(β) − EQ_T(β₀)] > 0 for all T ≥ T₀,

from which consistency follows.

Another difference in his proof, necessitated by his assumption that {x_t} are random variables, is that instead of using assumption (2.11) he appeals to Hoadley's (1971) strong law of large numbers, which essentially states that if {X_t(β)} are independent random variables such that |X_t(β)| ≤ X_t* for all β and E|X_t*|^{1+δ} < ∞ for some δ > 0, then sup_β |T⁻¹Σ_{t=1}^T [X_t(β) − EX_t(β)]| converges to zero almost surely.


White’s proof of asymptotic normality is a modification of the proof given in Section 2.2 and uses a multivariate version of Liapounov’s central limit theorem due to Hoadley (1971)

White shows that the results for non-stochastic weights W_t hold also for stochastic weights Ŵ_t, provided that Ŵ_t converges to W_t uniformly in t almost surely. The last assumption is satisfied, for example, if {σ̂_t⁻²} are used as stochastic weights in the stratified sample case mentioned above.

Just and Pope (1978) consider a non-linear regression model where the variance of the error term is a non-linear function of parameters:

y_t = f_t(β) + u_t, with Vu_t = h_t(α).

This is a non-linear analogue of the model considered by Hildreth and Houck (1968). Generalizing Hildreth and Houck's estimator, Just and Pope propose regressing the squared residuals û_t² on h_t(α) by NLLS.
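A hedged sketch of this second step follows: u_hat contains the NLLS residuals, h is the assumed variance function h_t(α) with covariates z, and Nelder-Mead is again only a convenient minimizer (all names are assumptions).

```python
# Sketch of the Just-Pope idea: regress squared NLLS residuals on h_t(alpha).
import numpy as np
from scipy.optimize import minimize

def just_pope_alpha(u_hat, h, z, alpha_init):
    def Q(alpha):
        r = u_hat ** 2 - h(z, alpha)     # squared residuals minus h_t(alpha)
        return r @ r
    return minimize(Q, alpha_init, method="Nelder-Mead").x
```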

4 Multivariate models

In this section I consider the multivariate non-linear regression model

y_{it} = f_i(x_{it}, β_{i0}) + u_{it},  i = 1, 2, …, N;  t = 1, 2, …, T.   (4.1)

Sometimes I will write f_i(x_{it}, β_{i0}) more simply as f_{it}(β_{i0}) or just f_{it}. Defining the N-vectors y_t = (y_{1t}, y_{2t}, …, y_{Nt})′, f_t = (f_{1t}, f_{2t}, …, f_{Nt})′, and u_t = (u_{1t}, u_{2t}, …, u_{Nt})′, we can write (4.1) as

y_t = f_t(θ₀) + u_t,  t = 1, 2, …, T,   (4.2)

where I have written the vector of all the unknown regression parameters as θ₀, allowing for the possibility that there are constraints among the {β_i}. Thus, if there is no constraint, θ₀ = (β₁₀′, β₂₀′, …, β_{N0}′)′. We assume that {u_t} are i.i.d. vector random variables with Eu_t = 0 and Eu_tu_t′ = Σ.

The reader will immediately notice that this model is a non-linear analogue of the well-known seemingly unrelated regressions (SUR) model proposed by Zellner (1962). The estimation of the parameters can be carried out by the same iterative method (the Zellner iteration) used in the linear SUR model. The Zellner iteration in the non-linear SUR model (4.2) can be defined as follows. Let θ̂(A) be the estimator obtained by minimizing

Σ_{t=1}^T [y_t − f_t(θ)]′A⁻¹[y_t − f_t(θ)]   (4.3)


for some matrix A. Let θ̂_n be the nth-round estimator of the Zellner iteration. Then,

θ̂_{n+1} = θ̂(Σ̂_n), where Σ̂_n = T⁻¹Σ_{t=1}^T [y_t − f_t(θ̂_n)][y_t − f_t(θ̂_n)]′.   (4.4)

The consistency and the asymptotic normality of θ̂(A) for a fixed A can be proved by a straightforward modification of the proofs of Section 2.2. Gallant (1975d) proves that θ̂(Σ̂) has the same asymptotic distribution as θ̂(Σ) if Σ̂ is a consistent estimator of Σ. In particular, his result means that the second-round estimator θ̂₂ of the Zellner iteration (4.4) has the same asymptotic distribution as θ̂(Σ) - a result analogous to the linear case. Gallant also generalizes another well-known result in the linear SUR model and proves that the asymptotic distributions of θ̂(I) and θ̂(Σ) are the same if {x_{it}} are the same for all i and there are no constraints among the {β_i}.

By a Monte Carlo study of a two-equation model, Gallant (1975d) finds that an estimate of the variance of the estimator calculated from the asymptotic formula tends to underestimate the true variance and recommends certain corrections.
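A hedged sketch of the iteration follows: f_t maps (t, θ) to the N-vector f_t(θ), minimize_q stands for any numerical minimizer of the weighted sum of squares (both are assumed interfaces), and the first round uses A = I, i.e. θ̂(I).

```python
# Sketch of the Zellner iteration (4.4) for the non-linear SUR model (4.2).
import numpy as np

def zellner_iteration(f_t, Y, theta, minimize_q, n_rounds=5):
    T, N = Y.shape                                  # Y is T x N
    A = np.eye(N)                                   # start from theta-hat(I)
    for _ in range(n_rounds):
        Ainv = np.linalg.inv(A)
        def Q(th):
            R = Y - np.array([f_t(t, th) for t in range(T)])
            return np.einsum('ti,ij,tj->', R, Ainv, R)   # the minimand (4.3)
        theta = minimize_q(Q, theta)
        R = Y - np.array([f_t(t, theta) for t in range(T)])
        A = R.T @ R / T                             # Sigma-hat from residuals
    return theta
```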

If u ∼ N(0, Σ), the concentrated log-likelihood function can be written, apart from a constant, as

log L*(θ) = −(T/2) log det{T⁻¹Σ_{t=1}^T [y_t − f_t(θ)][y_t − f_t(θ)]′}.
