Performance of Managed Futures: Persistence and the Source of Returns
B. Wade Brorsen and John P. Townsend
Managed futures investments are shown to exhibit a small amount of performance persistence. Thus, there do appear to be some differences in the skills of commodity trading advisors. The funds with the highest returns used long-term trading systems, charged higher fees, and had fewer dollars under management. Returns were negatively correlated with the most recent past returns, but the sum of all correlations was positive. Consistent with work in behavioral finance, when deciding whether to invest or withdraw funds, investors put the most weight on the most recent returns. The results suggest that the source of futures fund returns is exploiting inefficiencies.
INTRODUCTION
There is little evidence from past research that the top performing managed futures funds can be predicted (Schwager 1996). Past literature has primarily used variations of the methods of Elton, Gruber, and Rentzler (EGR). Yet EGR’s methods have little power to reject the null hypothesis of no predictability (Grossman 1987). Using methods with sufficient power to reject a false null hypothesis, this research seeks to determine whether performance persists for managed futures advisors.

The data used are from public funds, private funds, and commodity trading advisors (CTAs). Regression analysis is used to determine whether all funds have the same mean returns. This is done after adjusting for changes in overall returns and differences in leverage. Monte Carlo methods are used to determine the power of EGR’s methods. Then an out-of-sample test similar to that of EGR is used over longer time periods to achieve greater power. Because some performance persistence is found, we explain the sources of this performance persistence using regressions of (1) returns against CTA characteristics, (2) return risk against CTA characteristics, (3) returns against lagged returns, and (4) changes in investment against lagged returns.
DATA
LaPorte Asset Allocation provided the data, much of which originated from Managed Accounts Reports. The CTA data include information on CTAs no longer trading as well as CTAs who are still trading. The data include monthly returns from 1978 to 1994. Missing values were removed by deleting observations where both returns and net asset value were zero. This should help prevent deleting observations where returns were truly zero. The return data were converted to log changes,¹ so they can be interpreted as percentage changes in continuous time.
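The log-change conversion described above, r = ln(1 + d/100) × 100 with d the discrete percentage return, can be illustrated with a minimal sketch. The function names and sample values are hypothetical and are not taken from the study.

```python
import math

def to_log_return(discrete_pct):
    """Convert a discrete percentage return into a continuous-time percentage return."""
    return math.log(1.0 + discrete_pct / 100.0) * 100.0

def to_discrete_return(log_pct):
    """Inverse of to_log_return: recover the discrete percentage return."""
    return (math.exp(log_pct / 100.0) - 1.0) * 100.0

# Hypothetical monthly discrete returns, in percent.
monthly = [5.0, -3.0, 0.0, 12.5]
log_returns = [to_log_return(d) for d in monthly]

# Log returns add across months, so their sum is the compounded return.
compounded = to_discrete_return(sum(log_returns))
print([round(r, 4) for r in log_returns], round(compounded, 4))
```

One convenient property of the log form is that monthly values can be summed to obtain the compounded return over a longer period.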
The mean returns presented in Table 3.1 show that CTA returns are higher than those of public or private funds. This result is consistent with those in previous literature. The conventional wisdom as to why CTAs have higher returns is that they incur lower costs. However, CTA returns may be higher because of selectivity or reporting biases. Selectivity bias is not a major concern here, because the comparison is among CTAs, not between CTAs and some other investment. Faff and Hallahan (2001) argue that survivorship bias is more likely to cause performance reversals than performance persistence. The data used show considerable kurtosis (see Table 3.1). However, this kurtosis may be caused by heteroskedasticity (returns of some funds are more variable than others).

TABLE 3.1 Descriptive Statistics for the Public, Private, and Combined CTA Data Sets and Continuous Time Returns
Statistic               Public Funds    Private Funds    Combined CTAs
Percentage returns

¹ The formula used was $r_{it} = \ln(1 + d_{it}/100) \times 100$, where $d_{it}$ is the discrete time return. The adjustment factor of 100 is used since the data are measured as percentages.
REGRESSION TEST OF PERFORMANCE PERSISTENCE
To measure performance persistence, a model of the stochastic process that generates returns is required. The process considered is:

$$r_{it} = \alpha_i + \beta_i \bar{r}_t + \varepsilon_{it}, \quad i = 1, \ldots, n \ \text{and} \ t = 1, \ldots, T, \qquad \varepsilon_{it} \sim N(0, \sigma_i^2) \tag{3.1}$$

where $r_{it}$ = return of fund (or CTA) $i$ in month $t$
$\bar{r}_t$ = average fund returns in month $t$
slope parameter $\beta_i$ = differences in leverage

The model allows each fund to have a different variance, which is consistent with past research. We also considered models that assumed that $\beta_i$ is zero, with either fixed effects (dummy variables) for time or random effects instead. These changes to the model did not result in changes in the conclusions about performance persistence.

Only funds/CTAs with at least three observations are included. The model is estimated using feasible generalized least squares. The null hypothesis considered is that all funds have the same mean returns, provided that adjustments have been made for changes in overall returns and differences in leverage. This is equivalent to testing the null hypothesis $H_0: \alpha_i = \alpha$, where α is an unknown constant.

Analysis of variance (ANOVA) results in Table 3.2 consistently show that some funds and pools have different mean returns than others. This finding does contrast with previous research, but is not really surprising given that funds and pools have different costs. Funds and pools also have different trading systems, and the commodities traded vary widely. The test used in this study measures long-term performance persistence; in contrast, EGR measures short-term performance persistence.
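The test of equal intercepts under equation (3.1) can be sketched as a two-step weighted regression. This is an illustrative simplification, not the authors' estimation code: it assumes a balanced panel (every fund observed in every month), takes the regressor to be the cross-sectional average return, and uses each fund's OLS residual variance as its weight. The function name and the simulated inputs are hypothetical.

```python
import random

def equal_alpha_f_test(returns):
    """Weighted F-test of H0: all intercepts alpha_i are equal.

    returns: list of per-fund monthly return lists of equal length
    (balanced panel, an assumption of this sketch).
    """
    n, T = len(returns), len(returns[0])
    # Average fund return in each month (the regressor r-bar_t).
    x = [sum(fund[t] for fund in returns) / n for t in range(T)]
    x_mean = sum(x) / T
    sxx_centered = sum((v - x_mean) ** 2 for v in x)
    sxx_raw = sum(v * v for v in x)

    # Step 1: OLS fund by fund gives alpha_i, beta_i, and a variance estimate.
    alphas, weights, sse_u = [], [], 0.0
    for y in returns:
        y_mean = sum(y) / T
        beta = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / sxx_centered
        alpha = y_mean - beta * x_mean
        sse = sum((b - alpha - beta * a) ** 2 for a, b in zip(x, y))
        w = (T - 2) / sse              # 1 / estimated sigma_i^2
        alphas.append(alpha)
        weights.append(w)
        sse_u += w * sse               # weighted unrestricted SSE

    # Step 2: restricted fit with one common alpha and fund-specific betas.
    # Partial r-bar_t (no intercept) out of the constant and out of each fund.
    u = [1.0 - v * sum(x) / sxx_raw for v in x]
    uu = sum(v * v for v in u)
    partialled, num, den = [], 0.0, 0.0
    for y, w in zip(returns, weights):
        b0 = sum(a * b for a, b in zip(x, y)) / sxx_raw
        v = [b - b0 * a for a, b in zip(x, y)]
        partialled.append(v)
        num += w * sum(p * q for p, q in zip(u, v))
        den += w * uu
    common_alpha = num / den
    sse_r = sum(w * sum((q - common_alpha * p) ** 2 for p, q in zip(u, v))
                for v, w in zip(partialled, weights))

    # n - 1 restrictions; n * (T - 2) residual degrees of freedom.
    f_stat = ((sse_r - sse_u) / (n - 1)) / (sse_u / (n * (T - 2)))
    return alphas, f_stat

# Hypothetical simulated panels: six funds, 60 months.
rng = random.Random(0)
market = [rng.gauss(1.0, 3.0) for _ in range(60)]
same = [[1.0 + m + rng.gauss(0.0, 2.0) for m in market] for _ in range(6)]
diff = [[a + m + rng.gauss(0.0, 2.0) for m in market] for a in (-4, -2, 0, 2, 4, 6)]
alphas_same, f_same = equal_alpha_f_test(same)
alphas_diff, f_diff = equal_alpha_f_test(diff)
print(round(f_same, 2), round(f_diff, 2))
```

Because the regressor is the average of the same funds' returns, the estimated intercepts sum to zero in this sketch, so each intercept measures performance relative to the group.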
Only about 2 to 4 percent of the variation in monthly returns across funds can be explained by differences in individual means. Because the predictable portion is small, precise methods are needed to find it. Without the correction for heteroskedasticity, the null hypothesis would not have been rejected with the public pool data. Even though the predictability is low, it is economically significant. The standard deviations in Table 3.2 are large, implying that 2 to 4 percent of the standard deviation is about 50 percent of the mean. Thus, even though there is considerable noise, there is still potential to use past returns to predict future returns.

TABLE 3.2 Weighted ANOVA Table: Returns Regression for Public Funds, Private Funds, and Combined CTA Data
Statistic                Public Funds    Private Funds    Combined CTAs
Sum of squared errors
R2                       0.48            0.35             0.31
F-statistics

TABLE 3.3 F-Statistics for the Test of Homoskedasticity Assumption and Jarque-Bera Test of Normality of Rescaled Residuals
Statistic                Public Funds    Private Funds    Combined CTAs

As shown in Table 3.3, the null hypothesis that each fund has the same variance was rejected. This is consistent with previous research that shows some funds or CTAs have more variable returns than others. The rescaled residuals have no skewness, and the kurtosis is greatly reduced. The rescaled residuals have a t-distribution, so some kurtosis should remain even if the data were generated from a normal distribution. This demonstrates that most of the nonnormality shown in Table 3.1 is due to heteroskedasticity.
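The link between heteroskedasticity and kurtosis can be illustrated with a small simulation. The sketch below uses the standard Jarque-Bera statistic, JB = n/6 × (S² + (K − 3)²/4), on simulated normal data. The standard deviations echo the values used later in the Monte Carlo study, while the sample sizes and function name are assumptions chosen for illustration.

```python
import random

def jarque_bera(values):
    """Return (skewness, kurtosis, Jarque-Bera statistic) for a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return skew, kurt, jb

# Four hypothetical funds with normal residuals but different variances.
rng = random.Random(0)
sigmas = [5.0, 10.0, 15.0, 20.0]
funds = [[rng.gauss(0.0, s) for _ in range(1000)] for s in sigmas]

# Pooling the raw residuals mixes the variances and inflates kurtosis.
pooled = [r for fund in funds for r in fund]

# Rescale each fund by its own estimated standard deviation
# (residuals are treated as zero-mean in this sketch).
rescaled = []
for fund in funds:
    sd = (sum(r * r for r in fund) / len(fund)) ** 0.5
    rescaled.extend(r / sd for r in fund)

_, kurt_pooled, jb_pooled = jarque_bera(pooled)
_, kurt_rescaled, jb_rescaled = jarque_bera(rescaled)
print(round(kurt_pooled, 2), round(kurt_rescaled, 2))
```

Every simulated observation is normal, yet the pooled sample shows excess kurtosis that largely disappears after rescaling, which mirrors the pattern reported for Tables 3.1 and 3.3.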
MONTE CARLO STUDY
In their method, EGR ranked funds by their mean return or modified Sharpe ratio in a first period, and then determined whether the funds that ranked high in the first period also ranked high in the second period. We use Monte Carlo simulation to determine the power and size of hypothesis tests with EGR’s method when data follow the stochastic process given in equation 3.1. Data were generated by specifying values of α, β, and σ. The simulation used 1,000 replications and 120 simulated funds. The mean return over all funds, $\bar{r}_t$, is derived from the values of α and β as:

$$\bar{r}_t = \frac{\sum \alpha_i + \sum \varepsilon_{it}}{n - \sum \beta_i}$$

where all sums are from i = 1 to n.

A constant value of α simulates no performance persistence. For the data sets generated with persistence present, α was generated randomly based on the mean and variance of β’s in each of the three data sets. To simulate funds with the same leverage, the β’s were set to a value of 0.5. The simulation of funds with differing leverage (which provided heteroskedasticity) used β’s with values set to 0.5, 1.0, 1.5, and 2.0.

To match EGR’s assumption of homoskedasticity, data sets were generated with the standard deviation set at 2. Heteroskedasticity was created by letting the values of σ be 5, 10, 15, and 20, with one-fourth of the observations using each value. This allowed us to compare the Spearman correlation coefficient calculated for data sets with and without homoskedasticity. The funds were ranked in ascending order of returns for period one (first 12 months) and period two (last 12 months). From each 24-month period of generated returns, Spearman correlation coefficients were calculated for a fund’s rank in both periods. For the distribution of Spearman correlation coefficients to be suitably approximated by a normal, at least 10 observations are needed. Because 120 pairs are used here, the normal approximation is used.

Mean returns also were calculated for each fund in period one and period two, and then ranked. The funds were divided into groups consisting of the top-third mean returns, middle-third mean returns, and bottom-third mean returns. Two additional subgroups were analyzed: the three funds with the highest mean returns and the three funds with the lowest mean returns. The means across all funds in the top-third group and bottom-third group also were calculated.
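A single replication of the data generation and rank test described above might look like the following sketch. It generates returns from equation (3.1) using the expression for the mean return over all funds, ranks funds by mean return in each 12-month period, and applies the normal approximation z = ρ√(n − 1) to the Spearman coefficient. The function names are hypothetical, ties in ranks are not handled (they have negligible probability with continuous data), and reading N(1.099, 4.99) as a mean and a variance is an assumption.

```python
import math
import random

def simulate_panel(alphas, betas, sigmas, months, rng):
    """Generate r_it = alpha_i + beta_i * r_bar_t + eps_it for every fund and month."""
    n = len(alphas)
    denom = n - sum(betas)
    if denom == 0:
        raise ValueError("sum of betas equals n; the mean return is undefined")
    panel = [[] for _ in range(n)]
    for _ in range(months):
        eps = [rng.gauss(0.0, s) for s in sigmas]
        r_bar = (sum(alphas) + sum(eps)) / denom   # mean return over all funds
        for i in range(n):
            panel[i].append(alphas[i] + betas[i] * r_bar + eps[i])
    return panel

def ranks(values):
    """Ranks 1..n in ascending order (no tie handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(a, b):
    """Spearman rank correlation from squared rank differences."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def persistence_test(panel):
    """Correlate period-one and period-two mean-return ranks; return (rho, z)."""
    first = [sum(fund[:12]) / 12 for fund in panel]
    second = [sum(fund[12:24]) / 12 for fund in panel]
    rho = spearman(first, second)
    return rho, rho * math.sqrt(len(panel) - 1)

rng = random.Random(0)
n = 120
# No persistence: constant alpha, equal leverage, homoskedastic errors.
no_persist = simulate_panel([1.0] * n, [0.5] * n, [2.0] * n, 24, rng)
# Persistence: alpha varies across funds (second argument treated as a variance).
alphas = [rng.gauss(1.099, math.sqrt(4.99)) for _ in range(n)]
persist = simulate_panel(alphas, [0.5] * n, [2.0] * n, 24, rng)

rho_none, z_none = persistence_test(no_persist)
rho_persist, z_persist = persistence_test(persist)
print(round(rho_none, 3), round(rho_persist, 3))
```

Repeating this over many replications and counting how often the test fails to reject gives the size (no persistence) and power (persistence present) of the procedure.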
To determine if EGR’s test has correct size, it is used with data where performance persistence does not exist (see Table 3.4). If the size is correct, the fail-to-reject probability should be 0.95. When heteroskedasticity is present (data generation methods 2 and 3), the probability of not rejecting is less than 0.95. The heteroskedasticity may be more extreme in actual data, so the problem with real data may be even worse than the excess Type I error found here.

TABLE 3.4 EGR Performance Persistence Results from Monte Carlo Generated Data Sets: No Persistence Present by Restricting α = 1
                            Data Generation Method
Generated Data Subgroups    1a    2b    3c
Mean returns
p-values
test of 2 means
a Data generated using α = 1; β = 0.5; σ = 2.
b Data generated using α = 1; β = 0.5; σ = 5, 10, 15, 20.
c Data generated using α = 1; β = 0.5, 1, 1.5, 2; σ = 5, 10, 15, 20.

Next, we determine the power of EGR’s test by applying it to data where performance persistence really exists (see Table 3.5). The closer the fail-to-reject probability is to zero, the higher is the power. The Spearman correlation coefficients show some ability to detect persistence when large differences are found in CTA data. But they show little ability to find persistence with the small differences in performance in the public fund data used by EGR. The test of two means has even less ability to detect persistence. Thus, the results clearly can explain EGR’s findings of no performance persistence as being due to low power. Table 3.5 does show that EGR’s method can find performance persistence when it is strong enough.
HISTORICAL PERFORMANCE AS AN INDICATOR
OF LATER RETURNS
Results based on methods similar to those of EGR are now provided. The previous Monte Carlo findings were based on a one-year selection period and a one-year performance period. Given the low power of EGR’s method, we use longer periods here: a four-year selection period with a one-year performance period, and a three-year selection period with a three-year performance period. Equation (3.1) was estimated for the selection period and the performance period. Because the returns are monthly, funds having fewer than 60 or 72 monthly observations, respectively, were deleted to avoid having unequal numbers of observations.

TABLE 3.5 EGR Performance Persistence Results from Monte Carlo Generated Data Sets: Persistence Present by Allowing α to Vary
                            Data Generation Method
Generated Data Subgroups    1a    2b    3c    4d
Mean returns
p-values
test of 2 means
a Data generated using α = N(1.099, 4.99); β = 0.5, 1, 1.5, 2; σ = 2.
b Data generated using α = N(1.099, 4.99); β = 0.5; σ = 5, 10, 15, 20.
c Data generated using α = N(1.099, 4.99); β = 0.5, 1, 1.5, 2; σ = 5, 10, 15, 20.
d Data generated using α = N(1.099, 1); β = 0.5, 1, 1.5, 2; σ = 5, 10, 15, 20.
The first five-year period evaluated was 1980 to 1984. The next five-year period was 1981 to 1985. Three methods are used to rank the funds: the α’s (intercept), the mean return, and the ratio α/σ. For each parameter estimated from the regression, a Spearman rank-correlation coefficient was calculated between the performance measure in the selection period and the performance measure for the out-of-sample period. The null hypothesis is of no correlation between ranks, and the test statistic has a standard normal distribution under the null. Because of losing observations with missing values and use of the less efficient nonparametric method (ranking), this approach is expected to have less power than the direct regression test in (3.1).

Table 3.6 presents a summary of the annual results. Because of the overlap, the correlations from different time periods are not independent, so some care is needed in interpreting the results. All measures show some positive correlation, which indicates performance persistence. Small correlations are consistent with the regression results. Although there is performance persistence, it is difficult to find because of all the other random factors influencing returns.

The return/risk measure (α/σ) clearly shows the most performance persistence. This is consistent with McCarthy, Schneeweis, and Spurgin (1997), who found performance persistence in risk measures. The rankings based on mean returns and those based on α’s are similar, and their correlations were similar in each year. Therefore, there does not appear to be as much gain as expected in adjusting for the overall level of returns.

The three-year selection period and three-year trading period show higher correlations than the four-year selection and one-year trading periods, except for the early years of public funds. There were few funds in these early years, so their correlations may not be estimated very accurately. Rankings in the three-year performance period are also less variable than in the one-year performance period. The higher correlation with the longer trading period suggests that performance persistence continues for a long time. This fact suggests that investors may want to be slow to change their allocations among managers.
The next question is: Why do the results differ from past research? Actually, EGR found similar performance persistence, but dismissed it as being small and statistically insignificant. Our larger sample leads to more powerful tests. McCarthy (1995) did find performance persistence, but his results are questionable because his sample size was small. McCarthy, Schneeweis, and Spurgin’s (1997) sample size was likely too small to detect performance persistence in the mean. Irwin, Krukmeyer, and Zulauf (1992) placed funds into quintiles. Their approach is difficult to interpret and may have led to low power. Schwager (1996) found a similar correlation of 0.07 for mean returns. Schwager, however, found a negative correlation for his return/risk measure. He ranked funds based on return/risk when returns were positive, but ranked on returns only when returns were negative. This hybrid measure may have caused the negative correlation. Therefore, past literature is indeed consistent with a small amount of performance persistence. Performance persistence is found here because of the larger sample size and a slight improvement in methods. As shown in Table 3.6, several years yielded negative correlations, and many positive correlations were statistically insignificant. Therefore, results over short time periods will be erratic.

TABLE 3.6 Summary of Spearman Correlations between Selection and Performance Periods
Data Set    Selection Criterion    Average Correlation    Years Positive (%)    Years Positive and Significant (%)
Four and one(a)
  CTA
  Public funds
  Private funds
Three and three(b)
  CTA
  Public funds
  Private funds
a Correlation between a four-year selection period and a one-year performance period. Averages are across the twelve one-year performance periods. The same statistic was used for the rankings in each period.
b Three-year selection period and three-year trading period.
The performance persistence could be due to either differences in trading skills or differences in costs. There is no strong difference in performance persistence among CTAs, public funds, and private funds.
PERFORMANCE PERSISTENCE
AND CTA CHARACTERISTICS
Because some performance persistence was found, we next try to explain why it exists. Monthly percentage returns were regressed against CTA characteristics. Only CTA data are used, since little data on the characteristics of public and private funds were available.
Data and Regression Model
Table 3.7 presents the means of the CTA characteristics. The variables listed were included in the regression along with dummy variables. Dummy variables were defined for whether a long-term or medium-term trading system was used. The only variables allowed to change over time were dollars under management and time in existence.
The data as provided by LaPorte Asset Allocation had missing values recorded as zero. If commissions, administrative fees, and incentive fees were all listed as zero, the observations for that CTA were deleted. This eliminated most but not all of the missing values. If commissions were zero, the mean of the remaining observations was imputed.
In a few cases, options or interbank percentages were entered only as a “yes.” In these cases, the mean of the other observations using options or interbank was imputed. When no value was included for non-U.S., options, or interbank, these variables were given a value of zero. Margins often were entered as a range. In these cases, the midpoint of the range was used. When only a maximum was listed, the maximum was used.
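The imputation rules for these fields can be sketched as follows. The input encodings are hypothetical: the source does not describe how the raw entries were stored, so the sketch assumes margins arrive as strings such as "10-20" or "25", and that usage percentages arrive as numbers, the string "yes", or None when no value was given.

```python
def margin_value(entry):
    """Midpoint of a range such as "10-20"; a single listed value is used as is."""
    parts = [float(p) for p in entry.split("-")]
    return sum(parts) / len(parts)

def impute_usage(entries):
    """Apply the rules for options or interbank percentages.

    "yes" becomes the mean of the numeric entries that report positive usage,
    and a missing entry (None) becomes zero.
    """
    used = [e for e in entries if isinstance(e, (int, float)) and e > 0]
    mean_used = sum(used) / len(used) if used else 0.0
    cleaned = []
    for e in entries:
        if e is None:
            cleaned.append(0.0)
        elif e == "yes":
            cleaned.append(mean_used)
        else:
            cleaned.append(float(e))
    return cleaned

# Hypothetical raw entries for four CTAs.
print(margin_value("10-20"), margin_value("25"))
print(impute_usage([10, "yes", None, 30]))
```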
If the trading horizon was listed as both short and medium term, the observation was classed as short term. If both medium and long term or all