… using this last step, but also because it facilitates a feasible computation of an approximation to the cross-validated MSE.
Although we touched on this issue only briefly above, it is now necessary to confront head-on the challenges for cross-validation posed by models nonlinear in the parameters. The challenge is that in order to compute exactly the cross-validated MSE associated with any given nonlinear model, one must compute the NLS parameter estimates obtained by holding out each required validation block of observations. There are roughly as many validation blocks as there are observations (thousands here). This multiplies by the number of validation blocks the difficulties presented by the convergence problems encountered in a single NLS optimization over the entire estimation data set. Even if this did not present a logistical quagmire (which it surely does), it also requires a huge increase in the required computations (a factor of approximately 1700 here). Some means of approximating the cross-validated MSE is thus required. Here we adopt the expedient of viewing the hidden unit coefficients obtained by the initial NLS on the estimation set as identifying potentially useful predictive transforms of the underlying variables and holding these fixed in cross-validation. Thus we only need to re-compute the hidden-to-output coefficients by OLS for each validation block. As mentioned above, this can be done in a highly computationally efficient manner using Racine's (1997) feasible block cross-validation method. This might well result in overly optimistic cross-validated estimates of MSE, but without some such approximation, the exercise is not feasible. (The exercise avoiding such approximations might be feasible on a supercomputer, but, as we see shortly, this brute force NLS approach is dominated by QuickNet, so the effort is not likely justified.)
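A minimal sketch of this approximation, in Python, may make the idea concrete. It is not Racine's (1997) algorithm itself; the function name, the simple logistic activation, and the contiguous validation blocks are illustrative assumptions only.

```python
import numpy as np

def approx_block_cv_mse(X, y, hidden_weights, block_size=1):
    """Approximate cross-validated MSE: the input-to-hidden coefficients from the
    full-sample NLS fit are held fixed, and only the hidden-to-output coefficients
    are re-estimated by OLS each time a validation block is held out."""
    logistic = lambda z: 1.0 / (1.0 + np.exp(-z))
    H = logistic(X @ hidden_weights)                 # fixed nonlinear transforms of the predictors
    Z = np.column_stack([np.ones(len(y)), X, H])     # constant, linear terms, and hidden-unit terms

    n = len(y)
    sq_err = np.empty(n)
    for start in range(0, n, block_size):
        val = np.arange(start, min(start + block_size, n))      # held-out validation block
        est = np.setdiff1d(np.arange(n), val)                   # remaining observations
        beta, *_ = np.linalg.lstsq(Z[est], y[est], rcond=None)  # OLS refit on the rest
        sq_err[val] = (y[val] - Z[val] @ beta) ** 2
    return sq_err.mean()
```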
Table 1 reports a subset of the results for this first exercise. Here we report two summary measures of goodness of fit: mean squared error (MSE) and R-squared (R²). We report these measures for the estimation sample, the cross-validation sample (CV), and the hold-out sample (Hold-Out). For the estimation sample, R² is the standard multiple correlation coefficient. For the cross-validation sample, R² is computed as one minus the ratio of the cross-validated MSE to the estimation sample variance of the dependent variable. For the hold-out sample, R² is computed as one minus the ratio of the hold-out MSE to the hold-out sample variance of the dependent variable about the estimation sample mean of the dependent variable. Thus, we can observe negative values for the CV and Hold-Out R²'s. A positive value for the Hold-Out R² indicates that the out-of-sample predictive performance of the estimated model is better than that afforded by the simple constant prediction provided by the estimation sample mean of the dependent variable.
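In formulas (the notation here is ours), letting $\bar{y}_{\mathrm{est}}$ and $\hat{\sigma}^{2}_{\mathrm{est}}$ denote the estimation sample mean and variance of the dependent variable, $\mathcal{H}$ the hold-out sample of size $n_{h}$, and $\hat{y}_{t}$ the model's prediction, these measures are
$$R^{2}_{\mathrm{CV}} = 1 - \frac{\mathrm{MSE}_{\mathrm{CV}}}{\hat{\sigma}^{2}_{\mathrm{est}}}, \qquad R^{2}_{\text{Hold-Out}} = 1 - \frac{n_{h}^{-1}\sum_{t \in \mathcal{H}} (y_{t} - \hat{y}_{t})^{2}}{n_{h}^{-1}\sum_{t \in \mathcal{H}} (y_{t} - \bar{y}_{\mathrm{est}})^{2}},$$
so either measure is negative whenever the corresponding MSE exceeds the benchmark variance in its denominator.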
From Table 1 we see that, as expected, the estimation R² is never very large, ranging from a low of about 0.0089 to a high of about 0.0315. For the full experiment, the greatest estimation sample R² is about 0.0647, occurring with 50 hidden units (not shown). This apparently good performance is belied by the uniformly negative CV R²'s. Although the best CV R² or MSE (indicated by "∗") identifies the model with the best Hold-Out R² (indicated by "∧"), that is, the model with only linear predictors (zero hidden units), this model has a negative Hold-Out R², indicating that it does not even perform as well as using the estimation sample mean as a predictor in the hold-out sample.
Table 1 S&P 500: Naive nonlinear least squares – Logistic
Summary goodness of fit
Hidden units    Estimation MSE    CV MSE    Hold-out MSE    Estimation R-squared    CV R-squared    Hold-out R-squared
0 1.67890 1.79932∗ 0.55548 0.00886 −0.06223 −0.03016 ∧,∗
10 1.66970 1.94597 0.56098 0.01429 −0.14880 −0.04037
11 1.64669 1.87287 0.58445 0.02788 −0.10565 −0.08390
12 1.65209 1.85557 0.55982 0.02469 −0.09544 −0.03822
13 1.64594 2.03215 0.56302 0.02832 −0.19968 −0.04415
14 1.64064 1.91624 0.58246 0.03145 −0.13125 −0.08020
15 1.64342 2.00411 0.57788 0.02981 −0.18313 −0.07170
16 1.65963 2.00244 0.57707 0.02024 −0.18214 −0.07021
17 1.65444 2.05466 0.58594 0.02330 −0.21297 −0.08665
18 1.64254 1.98832 0.60214 0.03033 −0.17381 −0.11670
19 1.65228 2.01295 0.59406 0.02458 −0.18835 −0.10172
20 1.64575 2.09084 0.60126 0.02843 −0.23432 −0.11506
This unimpressive prediction performance is entirely expected, given our earlier discussion of the implications of the efficient market hypothesis, but what might not have been expected is the erratic behavior we see in the estimation sample MSEs. We see that as we consider increasingly flexible models, we do not observe increasingly better in-sample fits. Instead, the fit first improves for hidden units one and two, then worsens for hidden unit three, then at hidden units four and five improves dramatically, then worsens for hidden unit six, and so on, bouncing around here and there. Such behavior will not be surprising to those with prior ANN experience, but it can be disconcerting to those not previously inoculated.
The erratic behavior we have just observed is in fact a direct consequence of the challenging nonconvexity of the NLS objective function induced by the nonlinearity in parameters of the ANN model, coupled with our choice of a new set of random starting values for the coefficients at each hidden unit addition. This behavior directly reflects and illustrates the challenges posed by parameter nonlinearity pointed out earlier.
Table 2 S&P 500: Modified nonlinear least squares – Logistic
Summary goodness of fit
Hidden units    Estimation MSE    CV MSE    Hold-out MSE    Estimation R-squared    CV R-squared    Hold-out R-squared
0 1.67890 1.79932∗ 0.55548 0.00886 −0.06223 −0.03016 ∧,∗
This erratic estimation performance opens the possibility that the observed poor predictive performance could be due not to the inherent unpredictability of the target variable, but rather to the poor estimation job done by the brute force NLS approach. We next investigate the consequences of using a modified NLS that is designed to eliminate this erratic behavior. This modified NLS method picks initial values for the coefficients at each stage in a manner designed to yield increasingly better in-sample fits as flexibility increases. We simply use as initial values the final values found for the coefficients in the previous stage and select new initial coefficients at random only for the new hidden unit added at that stage; this implements a simple homotopy method.
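A minimal sketch of this initialization scheme follows; the function name and the scale of the random draws are illustrative assumptions, not part of the original procedure's specification.

```python
import numpy as np

def homotopy_initial_values(prev_hidden_weights, n_inputs, rng):
    """Starting values for the stage with one more hidden unit: carry over the
    converged coefficients from the previous stage and draw random values only
    for the newly added unit."""
    new_unit = rng.normal(scale=1.0, size=(n_inputs, 1))   # random start for the new unit only
    if prev_hidden_weights is None:                        # first hidden unit: nothing to carry over
        return new_unit
    return np.hstack([prev_hidden_weights, new_unit])
```

Each NLS stage then optimizes all coefficients starting from these values, so the fit attained at the start of a stage is never worse than the fit attained at the end of the previous one.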
We present the results of this next exercise in Table 2. Now we see that the in-sample MSEs behave as expected, decreasing nicely as flexibility increases. On the other hand, whereas our naïve brute force approach found a solution with only five hidden units delivering an estimation sample R² of 0.0293, this second approach requires 30 hidden units (not reported here) to achieve a comparable in-sample fit. Once again the best CV performance occurs with zero hidden units, corresponding to the best (but negative) out-of-sample R². Clearly, this modification to naïve brute force NLS does not resolve the question of whether the so far unimpressive results could be due to poor estimation performance, as the estimation performance of the naïve method is better, even if more erratic. Can QuickNet provide a solution?
Table 3 S&P 500: QuickNet – Logistic
Summary goodness of fit
Hidden units    Estimation MSE    CV MSE    Hold-out MSE    Estimation R-squared    CV R-squared    Hold-out R-squared
11 1.57871 1.75054∗ 0.64341 0.06801 −0.03343 −0.19323∗
Table 3 reports the results of applying QuickNet to our S&P 500 data, again with the logistic cdf activation function. At each iteration of Step 1, we selected the best of m = 500 candidate units and applied cross-validation using OLS, taking the hidden unit coefficients as given. Here we see much better performance in the CV and estimation samples than we saw in either of the two NLS approaches. The estimation sample MSEs decrease monotonically, as we should expect. Further, we see the CV MSE first decreasing and then increasing as one would like, identifying an optimal complexity of eleven hidden units for the nonlinear model. The estimation sample R² for this CV-best model is 0.0634, much better than the value of 0.0293 found by the CV-best model in Table 1, and the CV MSE is now 1.751, much better than the corresponding best CV MSE of 1.800 found in Table 1.
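The sketch below conveys the flavor of this candidate-search step. The random-draw scheme for candidate coefficients and the use of squared correlation with the current residuals as the selection criterion are simplifying assumptions on our part, not a full statement of the QuickNet algorithm.

```python
import numpy as np

def select_candidate_unit(X, residuals, m=500, rng=np.random.default_rng(0)):
    """One QuickNet-style search step: draw m random candidate hidden units and
    keep the one whose output best explains the current residuals. The chosen
    unit's input coefficients are then taken as given, and the hidden-to-output
    coefficients are re-estimated by OLS."""
    logistic = lambda z: 1.0 / (1.0 + np.exp(-z))
    best_score, best_gamma = -np.inf, None
    for _ in range(m):
        gamma = rng.normal(size=X.shape[1])             # random candidate coefficients
        h = logistic(X @ gamma)                         # candidate hidden-unit output
        score = np.corrcoef(h, residuals)[0, 1] ** 2    # squared correlation with residuals
        if score > best_score:
            best_score, best_gamma = score, gamma
    return best_gamma
```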
Thus QuickNet does a much better job of fitting the data, in terms of both estimation and cross-validation measures. It is also much faster. Apart from the computation time required for cross-validation, which is comparable between the methods, QuickNet required 30.90 seconds to arrive at its solution, whereas naïve NLS and modified NLS required 600.30 and 561.46 seconds, respectively, to obtain inferior solutions in terms of estimation and cross-validated fit.
Another interesting piece of evidence related to the flexibility of ANNs and the relative fitting capabilities of the different methods applied here is that QuickNet delivered a maximum estimation R² of 0.1727, compared to 0.0647 for naïve NLS and 0.0553 for modified NLS, with 50 hidden units (not shown) generating each of these values. Comparing these and other results, it is clear that QuickNet rapidly delivers much better sample fits for given degrees of model complexity, just as it was designed to do.
A serious difficulty remains, however: the CV-best model identified by QuickNet is not at all a good model for the hold-out data, performing quite poorly. It is thus important to warn that even with a principled attempt to avoid overfitting via cross-validation, there is no guarantee that the CV-best model will perform well in real-world hold-out data. One possible explanation is that, even with cross-validation, the sheer flexibility of ANNs somehow makes them prone to over-fitting the data, viewed from the perspective of pure hold-out data.
Another strong possibility is that real world hold-out data can differ from the esti-mation (and thus cross-validation) data in important ways If the relationship between the target variable and its predictors changes between the estimation and hold-out data, then even if we have found a good prediction model using the estimation data, there
is no reason for that model to be useful on the hold-out data, where a different predic-tive relationship may hold A possible response to handling such situations is to proceed recursively for each out-of-sample observation, refitting the model as each new observa-tion becomes available For simplicity, we leave aside an investigaobserva-tion of such methods here
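For completeness, the recursive scheme just mentioned can be sketched as an expanding-window loop; the generic fit and predict callables below are placeholders rather than any specific estimator.

```python
import numpy as np

def recursive_one_step_forecasts(X, y, n_est, fit, predict):
    """Refit the model each time a new observation becomes available and
    produce a one-step-ahead forecast from the refitted model."""
    forecasts = []
    for t in range(n_est, len(y)):
        model = fit(X[:t], y[:t])               # re-estimate on all data observed so far
        forecasts.append(predict(model, X[t]))  # forecast the next observation
    return np.array(forecasts)
```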
This example underscores the usefulness of an out-of-sample evaluation of predictive performance. Our results illustrate that it can be quite dangerous to simply trust that the predictive relationship of interest is sufficiently stable to permit building a model useful for even a modest post-sample time frame.
Below we investigate the behavior of our methods in a less ambiguous environment, using artificial data to ensure (1) that there is in fact a nonlinear relationship to be uncovered, and (2) that the predictive relationship in the hold-out data is identical to that in the estimation data. Before turning to those results, however, we examine two alternatives to the standard logistic ANN applied so far. The first alternative is a ridgelet ANN, and the second is a non-neural-network method that uses familiar algebraic polynomials. The purpose of these experiments is to compare the standard ANN approach with a promising but less familiar ANN method and to contrast the ANN approaches with a more familiar benchmark.
In Table 4, we present an experiment identical to that of Table 3, except that instead of using the standard logistic cdf activation function, we use the ridgelet activation function
$$\Psi(z) = D^{5}\phi(z) = \bigl(-z^{5} + 10z^{3} - 15z\bigr)\phi(z),$$
where $\phi$ denotes the standard normal density.
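This activation is straightforward to evaluate numerically; a minimal sketch in Python follows (the function name is ours).

```python
import numpy as np

def ridgelet(z):
    """Ridgelet activation: the fifth derivative of the standard normal density,
    D^5 phi(z) = (-z**5 + 10*z**3 - 15*z) * phi(z)."""
    phi = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)   # standard normal pdf
    return (-z ** 5 + 10.0 * z ** 3 - 15.0 * z) * phi
```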
Table 4 S&P 500: QuickNet – Ridgelet
Summary goodness of fit
Hidden units    Estimation MSE    CV MSE    Hold-out MSE    Estimation R-squared    CV R-squared    Hold-out R-squared
⋮
39 1.33741 1.64768∗ 0.88580 0.21046 0.02729 −0.64277∗
The choice of h = 5 is dictated by the fact that k = 10 for the present example. As this is a nonpolynomial analytic activation function, it is also GCR, so we may expect QuickNet to perform well in sample. We emphasize that we are simply performing QuickNet with a ridgelet activation function and are not implementing any estimation procedure specified by Candès. The results given here thus do not necessarily put ridgelets in their best light, but they are nevertheless of interest, as they indicate what can be achieved with some fairly simple procedures.
Examining Table 4, we see results qualitatively similar to those for the logistic cdf activation function, but with the features noted there even more pronounced. Specifically, the estimation sample fit improves with additional complexity, but even more quickly, suggesting that the ridgelets are even more successful at fitting the estimation sample data patterns.
Table 5 S&P 500: QuickNet – Polynomial
Summary goodness of fit
Hidden units    Estimation MSE    CV MSE    Hold-out MSE    Estimation R-squared    CV R-squared    Hold-out R-squared
0 1.67890 1.79932∗ 0.55548 0.00886 −0.06223 −0.03016 ∧,∗
The estimation sample R² reaches a maximum of 0.2534 for 50 hidden units, an almost 50% increase over the best value for the logistic. The best CV performance occurs with 39 hidden units, with a CV R² that is actually positive (0.0273). As good as this performance is on the estimation and CV data, however, it is quite bad on the hold-out data. The Hold-Out R² with 39 ridgelet units is −0.643, reinforcing our comments above about the possible mismatch between the estimation and hold-out predictive relationships and the importance of hold-out sample evaluation.
In recent work, Hahn (1998) and Hirano and Imbens (2001) have suggested using algebraic polynomials for nonparametric estimation of certain conditional expectations arising in the estimation of causal effects. These polynomials thus represent a familiar and interesting benchmark against which to contrast our previous ANN results. In Table 5 we report the results of nonlinear approximation using algebraic polynomials, performed in a manner analogous to QuickNet. The estimation algorithm is identical, except that instead of randomly choosing m candidate hidden units as before, we now randomly choose m candidate monomials from which to construct polynomials.
For concreteness, and to control the erratic behavior that can result from the use of polynomials of too high a degree, we restrict ourselves to polynomials of degree less than or equal to 4.
As before, we always include linear terms, so we randomly select candidate monomials of degree between 2 and 4. The candidates were chosen as follows. First, we randomly selected the degree of the candidate monomial such that degrees 2, 3, and 4 had equal (1/3) probabilities of selection. Let the randomly chosen degree be denoted d. We then randomly selected d indexes with replacement from the set {1, ..., 9} and constructed the candidate monomial by multiplying together the variables corresponding to the selected indexes.
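A minimal sketch of this candidate-drawing scheme follows; the function name is ours, and X is assumed to hold the nine predictors as columns.

```python
import numpy as np

def random_monomial(X, rng=np.random.default_rng(0)):
    """Draw one candidate monomial: a degree d in {2, 3, 4} with equal probability,
    then d predictor indexes with replacement, then the product of those columns."""
    d = int(rng.integers(2, 5))                  # degree 2, 3, or 4, each with probability 1/3
    idx = rng.integers(0, X.shape[1], size=d)    # indexes drawn with replacement
    return np.prod(X[:, idx], axis=1)            # candidate regressor for the polynomial model
```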
The results of Table 5 are interesting in several respects. First, we see that although the estimation fits improve as additional terms are added, the improvement is nowhere near as rapid as it is for the ANN approaches. Even with 50 terms, the estimation R² only reaches 0.1422 (not shown). Most striking, however, is the extremely erratic behavior of the CV MSE. This bounces around, but generally trends up, reaching values as high as 41. As a consequence, the CV MSE ends up identifying the simple linear model as best, with its negative Hold-Out R². The erratic behavior of the CV MSE is traceable to extreme variation in the distributions of the included monomials. (Standard deviations can range from 2 to 150; moreover, simple rescaling cannot cure the problem, as the associated regression coefficients essentially undo any rescaling.) This variation causes the OLS estimates, which are highly sensitive to leverage points, to vary wildly in the cross-validation exercise, creating large CV errors and effectively rendering the CV MSE useless as an indicator of which polynomial model to select.
Our experiments so far have revealed some interesting properties of our methods, but because of the extremely challenging real-world forecasting environment to which they have been applied, we have not really been able to observe anything of their relative forecasting ability. To investigate the behavior of our methods in a more controlled environment, we now discuss a second set of experiments using artificial data in which we ensure (1) that there is in fact a nonlinear relationship to be uncovered, and (2) that the predictive relationship in the hold-out data is identical to that in the estimation data.
We achieve these goals by generating artificial estimation data according to the nonlinear relationship
$$Y^{*}_{t} = a\, f_{q}\bigl(X_{t}, \theta^{*}_{q}\bigr) + 0.1\,\varepsilon_{t},$$
with $q = 4$, where $X_{t} = (Y_{t-1}, Y_{t-2}, Y_{t-3}, |Y_{t-1}|, |Y_{t-2}|, |Y_{t-3}|, R_{t-1}, R_{t-2}, R_{t-3})$, as in the original estimation data (note that $X_{t}$ contains lags of the original $Y_{t}$ and not lags of $Y^{*}_{t}$). In particular, we take $\Psi$ to be the logistic cdf and set
$$f_{q}\bigl(x, \theta^{*}_{q}\bigr) = x\alpha^{*} + \sum_{j=1}^{q} \Psi\bigl(x\gamma^{*}_{j}\bigr)\beta^{*}_{qj},$$
where $\varepsilon_{t} = Y_{t} - f_{q}\bigl(X_{t}, \theta^{*}_{q}\bigr)$, and with $\theta^{*}_{q}$ obtained by applying QuickNet (logistic) to the original estimation data with four hidden units. We choose $a$ to ensure that $Y^{*}_{t}$ exhibits the same unconditional standard deviation in the simulated data as it does in the actual data. The result is an artificial series of returns that contains an "amplified" nonlinear signal relative to the noise constituted by $\varepsilon_{t}$.
Table 6 Artificial data: Ideal specification
Summary goodness of fit
Hidden units    Estimation MSE    CV MSE    Hold-out MSE    Estimation R-squared    CV R-squared    Hold-out R-squared
4 0.43081 0.45147∗ 0.45279 0.74567 0.73348 0.57439 ∧,∗
We generate hold-out data according to the same relationship using the actual $X_{t}$'s, but now with $\varepsilon_{t}$ generated as i.i.d. normal with mean zero and standard deviation equal to that of the errors in the estimation sample. The maximum possible hold-out sample R² turns out to be 0.574, which occurs when the model uses precisely the right set of coefficients for each of the four hidden units. The relationship is decidedly nonlinear, as a linear predictor alone delivers a Hold-Out R² of only 0.0667. The results of applying the precisely right hidden units are presented in Table 6.
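A compact sketch of this data-generating process follows; the function and argument names are ours, alpha, gammas, and betas stand for the QuickNet-fitted coefficients $\theta^{*}_{q}$, and X is assumed to include whatever constant term the fitted model uses.

```python
import numpy as np

def simulate_artificial_y(X, alpha, gammas, betas, a, eps, noise_scale=0.1):
    """Generate Y*_t = a * f_q(X_t, theta*_q) + 0.1 * eps_t, where f_q is the
    four-hidden-unit logistic ANN fitted to the original estimation data."""
    logistic = lambda z: 1.0 / (1.0 + np.exp(-z))
    f = X @ alpha + sum(logistic(X @ g) * b for g, b in zip(gammas, betas))
    return a * f + noise_scale * eps
```

In the estimation sample, eps is the vector of fitted residuals $Y_{t} - f_{q}(X_{t}, \theta^{*}_{q})$; in the hold-out sample it is drawn i.i.d. normal with matching standard deviation, as described above.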
First we apply naïve NLS to these data, parallel to the results discussed for Table 1. Again we choose initial values for the coefficients at random. Given that the ideal hidden unit coefficients are located in a 40-dimensional space, there is little likelihood of stumbling upon them, so even though the model is in principle correctly specified for specifications with four or more hidden units, whatever results we obtain must be viewed as an approximation to an unknown nonlinear predictive relationship.
We report our naïve NLS results in Table 7. Here we again see the bouncing pattern of in-sample MSEs first seen in Table 1, but now the CV-best model, containing eight hidden units, also exhibits locally superior hold-out sample performance. For the CV-best model, the estimation sample R² is 0.6228, the CV sample R² is 0.5405, and the Hold-Out R² is 0.3914. We also include in Table 7 the model that has the best Hold-Out R², which has 49 hidden units. For this model the Hold-Out R² is 0.4700; however, the CV sample R² is only 0.1750, so this even better model would not have appeared as a viable candidate. Despite this, these results are encouraging, in that now the ANN model identifies and delivers rather good predictive performance, both in and out of sample.
Table 8 displays the results of using the modified NLS procedure, parallel to Table 2. Now the estimation sample MSEs decline monotonically, but the CV MSEs never approach those seen in Table 7. The best CV R² is 0.4072, which corresponds to a Hold-Out R² of 0.286. The best Hold-Out R² of 0.3879 occurs with 41 hidden units, but again this would not have appeared as a viable candidate, as the corresponding CV R² is only 0.3251.
Table 7 Artificial data: Naive nonlinear least squares – Logistic
Summary goodness of fit
Hidden units    Estimation MSE    CV MSE    Hold-out MSE    Estimation R-squared    CV R-squared    Hold-out R-squared
⋮
Next we examine the results obtained by QuickNet, parallel to the results of Table 3. In Table 9 we observe quite encouraging performance. The CV-best configuration has 33 hidden units, with a CV R² of 0.6484 and a corresponding Hold-Out R² of 0.5430. This is quite close to the maximum possible value of 0.574 obtained by using precisely the right hidden units. Further, the true best hold-out performance has a Hold-Out R² of 0.5510, using 49 hidden units, not much different from that of the CV-best model. The corresponding CV R² is 0.6215, also not much different from that observed for the CV-best model.
The required estimation time for QuickNet here is essentially identical to that reported above (about 31 seconds), but now naïve NLS takes 788.27 seconds and modified NLS requires 726.10 seconds.
In Table 10, we report the results of applying QuickNet with a ridgelet activation function. Given that the ridgelet basis is less smooth relative to our target function than the standard logistic ANN, which is ideally smooth in this sense, we should not expect
... coefficients for each of the four hidden units The relationship is decidedly nonlinear, as using a linear predictor alone delivers a Hold-Out R2 of only 0.0667 The results of applying the... equal to that of the errors in the estimation sample The maximum possible hold-out sample R2turns out to be 0.574, which occurs when the model uses precisely the right set of coefficients... ofTable Again we choose initial values for the coefficients at random Given that the ideal hid-den unit coefficients are located in a 40-dimensional space, there is little likelihoodof