Attention conservation notice: Stock returns are non-stationary! Logistic regression, smoothing splines, and generalized additive models are all interesting techniques and fun to play with... but stock returns are still non-stationary!

Suppose that instead of trying to predict tomorrow's stock return from today's return, I just try to predict whether or not tomorrow's return will be positive. One way to do this is with logistic regression. The response variable is an indicator that takes the value 1 if tomorrow's return is positive and 0 otherwise. The name of the game is to model the probability that tomorrow's return is positive, conditional on the value of today's return.
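To make the setup concrete, here is a minimal sketch in plain Python (standard library only) of building the 0/1 response and fitting a two-parameter logistic regression by Newton-Raphson. For the logit link this is the same iteration as the Fisher scoring that R's `glm` reports. The helper names `make_pairs` and `fit_logistic` are hypothetical, and the demo data are simulated, not the S&P 500 series.

```python
import math
import random


def make_pairs(returns):
    """Pair each day's return (today) with a 0/1 flag for whether the next day's return is positive."""
    today = returns[:-1]
    up_tomorrow = [1 if r > 0 else 0 for r in returns[1:]]
    return today, up_tomorrow


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Fit P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        # Score vector (g0, g1) and Fisher information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step = inverse(information) @ score, via the closed-form 2x2 inverse.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Simulated data from a known logistic model (hypothetical parameters, not the S&P 500).
rng = random.Random(42)
x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(0.2 + 2.0 * xi))) else 0 for xi in x]
b0_hat, b1_hat = fit_logistic(x, y)
print(f"b0 = {b0_hat:.3f}, b1 = {b1_hat:.3f}")  # should land near the true 0.2 and 2.0

# Building (today, up_tomorrow) pairs from a tiny made-up return series.
today, up = make_pairs([0.01, -0.02, 0.03])
print(today, up)
```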
Here is the R summary output for a logistic model fit to the S&P 500 data from Yahoo! Finance:
Call:
glm(formula = (tomorrow > 0) ~ today, family = binomial, data = SP500.data.v2)

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-1.647  -1.223   1.080   1.129   2.007

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.1106     0.0161   6.866 6.60e-12 ***
today         8.6524     1.6788   5.154 2.55e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 21458  on 15513  degrees of freedom
Residual deviance: 21431  on 15512  degrees of freedom
AIC: 21435

Number of Fisher Scoring iterations: 3
Both the intercept and the coefficient on today's return are highly significant. How do we interpret the coefficients? If today's S&P 500 return is 0.0, then the intercept is the logistic model's predicted log-odds that tomorrow's S&P 500 return is positive. To get the predicted probability, I need to transform the coefficient from the log-odds (logit) scale back to the probability scale:
$$\frac{e^{0.1106}}{1 + e^{0.1106}} \approx 0.5276,$$

or roughly 53%.
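The back-transformation is just the inverse logit (logistic) function. A short Python check of the arithmetic, using the rounded intercept from the summary (the function name `inv_logit` is illustrative):

```python
import math


def inv_logit(eta):
    """Map a log-odds value back to the probability scale."""
    return math.exp(eta) / (1.0 + math.exp(eta))


# Intercept of the S&P 500 fit = log-odds of an up day after a flat (0.0) day.
p_after_flat_day = inv_logit(0.1106)
print(f"{p_after_flat_day:.4f}")  # 0.5276
```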
The logistic model predicts a 53% chance that tomorrow's return is positive given that today's return was zero (slightly better than a coin flip!). Suppose instead that the S&P 500 was down 10% today (i.e., today's return is -0.10). The predicted probability of tomorrow's return being positive is then:
$$\frac{e^{0.1106 + 8.6524 \times (-0.10)}}{1 + e^{0.1106 + 8.6524 \times (-0.10)}} = \frac{e^{-0.7546}}{1 + e^{-0.7546}} \approx 0.3198,$$

or roughly 32%.
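The same calculation works for any value of today's return. A sketch that plugs the two rounded coefficients into the fitted equation; writing the inverse logit as 1 / (1 + exp(-eta)) is algebraically identical to exp(eta) / (1 + exp(eta)). The function name `prob_up_tomorrow` is hypothetical.

```python
import math

B0, B1 = 0.1106, 8.6524  # rounded S&P 500 estimates from the glm summary


def prob_up_tomorrow(today):
    """Fitted P(tomorrow's return > 0 | today's return) from the logistic model."""
    eta = B0 + B1 * today  # linear predictor on the log-odds scale
    return 1.0 / (1.0 + math.exp(-eta))


for r in (-0.10, -0.05, 0.0, 0.05, 0.10):
    print(f"today = {r:+.2f}  ->  P(tomorrow > 0) = {prob_up_tomorrow(r):.4f}")
```

Because the slope coefficient is positive, the fitted probabilities rise monotonically with today's return.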
I plot the predicted probabilities (and their confidence bands) against today's return using the logistic model for the daily returns of the S&P 500 from January 1950 through this past Friday. The predicted probabilities increase monotonically as today's return goes from negative to positive (i.e., the lowest probability of a positive return tomorrow follows a large negative return today, and the highest probability of a positive return tomorrow follows a large positive return today). The results of the logistic model are, I think, inconsistent with mean-reverting returns: I would think that mean reversion would require a higher probability of a positive return tomorrow when today's return was large and negative. Note that the confidence bands are mildly asymmetric, and that they narrow considerably where the bulk of the observed returns lie.
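The mild asymmetry of the bands has a simple likely source: bands for a logistic model are usually built as a symmetric Wald interval on the log-odds scale and then pushed through the nonlinear inverse logit. A sketch of that construction, assuming a standard error for the linear predictor is available. At today = 0 the linear predictor is just the intercept, so its standard error is the summary's 0.0161; at other values it also depends on the covariance of the two coefficients, which the summary does not print (in R, `predict(fit, newdata, type = "link", se.fit = TRUE)` supplies it). The second call uses a made-up standard error of 0.5 purely to make the asymmetry visible; `prob_band` is a hypothetical name.

```python
import math


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def prob_band(eta_hat, se_eta, z=1.96):
    """Approximate 95% band: symmetric on the log-odds scale, then mapped to probabilities."""
    return (inv_logit(eta_hat - z * se_eta),
            inv_logit(eta_hat),
            inv_logit(eta_hat + z * se_eta))


# today = 0: the linear predictor is the intercept, so se_eta is the intercept's SE.
lo, mid, hi = prob_band(0.1106, 0.0161)
print(f"today = 0: [{lo:.4f}, {hi:.4f}] around {mid:.4f}")

# Hypothetical: log-odds near today = -0.10 (about -0.75) with a made-up SE of 0.5.
lo2, mid2, hi2 = prob_band(-0.75, 0.5)
print(f"below: {mid2 - lo2:.4f}, above: {hi2 - mid2:.4f}")  # the two widths differ
```

A larger standard error (fewer nearby observations) widens the band, which is why the bands narrow where most of the returns lie.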
Here are similar plots for the FTSE-100 (1984-2011) and the Straits Times Index (STI) (1987-2011). For much of the range of today's return, the probability that tomorrow's FTSE-100 return is positive is basically a coin flip!
The predicted probabilities from the logistic model using data from the Straits Times Index (STI) in Singapore are more similar to the predicted probabilities for the S&P 500.
I would like to have some sense of whether the above logistic models are well specified. How might I go about validating these logistic regressions? One way would be to estimate and compare the fits of some more flexible models (e.g., smoothing splines, generalized additive models). If the logistic regression is well specified, then the more flexible models should not give significantly different predictions from the logistic model. Here is the plot of the predicted probabilities and the 95% confidence bands of the logistic model for the S&P 500, together with a smoothing spline (blue) and a generalized additive model (red).
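A crude version of the same check, short of fitting splines or GAMs, is to cut today's return into bins and compare the fraction of up days in each bin with the logistic curve. This is a simpler stand-in, not the smoothing spline or GAM used in this post. The sketch simulates hypothetical data from a logistic model with the rounded S&P 500 coefficients and a made-up N(0, 0.01) distribution for today's return, so here the binned rates should track the curve up to sampling noise; with real data, a systematic gap would hint at misspecification. The name `binned_rates` is illustrative.

```python
import math
import random


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def binned_rates(x, y, edges):
    """(bin center, empirical P(y = 1), count) for each non-empty bin [edges[i], edges[i+1])."""
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        ys = [yi for xi, yi in zip(x, y) if lo <= xi < hi]
        if ys:
            out.append(((lo + hi) / 2, sum(ys) / len(ys), len(ys)))
    return out


# Hypothetical simulated data: logistic truth using the rounded S&P 500 coefficients.
rng = random.Random(1)
b0, b1 = 0.1106, 8.6524
x = [rng.gauss(0.0, 0.01) for _ in range(20000)]
y = [1 if rng.random() < inv_logit(b0 + b1 * xi) else 0 for xi in x]

edges = [-0.03, -0.02, -0.01, 0.0, 0.01, 0.02, 0.03]
results = binned_rates(x, y, edges)
for center, rate, n in results:
    model = inv_logit(b0 + b1 * center)
    print(f"bin {center:+.3f}: empirical {rate:.3f} vs logistic {model:.3f} (n = {n})")
```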
WTF! The smoothing spline and the GAM have very similar predicted probabilities, particularly when today's return falls within [-0.05, 0.05] (i.e., over the bulk of the observed returns). Visually, however, neither the smoothing spline nor the GAM appears to match the logistic regression well at all!
The plot is actually a bit misleading, because your eye is immediately drawn to the huge differences in predicted probabilities for the extreme tails of today's return. However, there are only a handful of observations in the extreme positive and negative tails of today's return (check the data rug) and thus the predicted probabilities for the smoothing spline and GAM models are unlikely to be very precise in these regions. Better to focus on the differences in the predicted probabilities between the spline/GAM models and the logistic model in the region around today's return equals zero (where the bulk of the data is located). In this neighborhood, the curve of predicted probabilities for all three models is increasing (consistent, I think, with some type of trend-following dynamic). Note, however, that for the smoothing spline and GAM models, the slope of the predicted probability curve is much steeper (suggesting a more aggressive trend-following dynamic?) compared to the logistic model.
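When comparing slopes, it helps to know that in a one-predictor logistic model the slope of the probability curve is dp/dx = b1 * p * (1 - p). A single coefficient therefore controls the steepness everywhere, and the slope can never exceed b1 / 4 (reached where p = 0.5). A quick check with the rounded S&P 500 estimates, comparing the formula with a finite difference (function names are illustrative):

```python
import math

B0, B1 = 0.1106, 8.6524  # rounded S&P 500 estimates


def prob(x):
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * x)))


def slope(x):
    """Analytic dp/dx for a logistic curve: b1 * p * (1 - p)."""
    p = prob(x)
    return B1 * p * (1.0 - p)


h = 1e-6
numeric = (prob(h) - prob(-h)) / (2 * h)  # central finite difference at today = 0
print(f"analytic slope at 0: {slope(0.0):.4f}")
print(f"finite difference:   {numeric:.4f}")
print(f"upper bound b1 / 4:  {B1 / 4:.4f}")
```

A slope of about 2.16 means that, near zero, a 1-percentage-point rise in today's return raises the logistic model's probability of an up day by roughly 2 percentage points.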
Here are the plots for the FTSE-100 and the Straits Times Index (STI)...
The whole point of fitting the smoothing spline and the GAM was to determine whether or not the logistic model is well specified. As mentioned above, if the logistic model is well specified, then the spline/GAM models should not fit the data significantly better. We can measure goodness of fit by comparing the deviance of the logistic model with that of the GAM (I am going to ignore the smoothing spline because it doesn't technically respect the probability scale). The observed difference in deviance for the S&P 500 is the deviance of the null model (i.e., the logistic model) less the deviance of the alternative (the GAM). Here "null" refers to the null hypothesis of the specification test; it is not the "Null deviance" of 21458 in the R summary, which belongs to an intercept-only model:
21430.65 - 21348.16 ≈ 82.49 (82.49624 using the unrounded deviances)
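For 0/1 responses the saturated model has log-likelihood zero, so a fitted model's residual deviance is simply -2 times its log-likelihood, and the comparison above is a difference of two such numbers. A minimal sketch (the function name `binary_deviance` is illustrative):

```python
import math


def binary_deviance(y, p):
    """Residual deviance of fitted probabilities p for 0/1 outcomes y: -2 * log-likelihood."""
    loglik = 0.0
    for yi, pi in zip(y, p):
        loglik += math.log(pi) if yi == 1 else math.log(1.0 - pi)
    return -2.0 * loglik


# Toy check: four observations under a coin-flip model give 8 * ln 2.
toy = binary_deviance([1, 0, 1, 1], [0.5] * 4)
print(f"{toy:.4f}")  # 5.5452

# Difference in deviance (logistic minus GAM) using the rounded S&P 500 values from the text.
delta = 21430.65 - 21348.16
print(f"{delta:.2f}")  # 82.49
```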
The observed differences in deviance for the FTSE-100 and the STI are 16.53285 and 12.30237, respectively. A smaller deviance indicates a better fit, so, as expected, the more flexible GAM fits better for the S&P 500, the FTSE-100, and the STI. But are these observed differences in deviance significant? Let's go to the bootstrap! The basic idea is to use the bootstrap to generate the sampling distribution of the difference in deviance between the null and alternative models under the null hypothesis that the logistic model is correct, and then see how often these bootstrap replicates of the difference exceed the observed difference. If the replicated difference exceeds the observed difference a large fraction of the time, then the observed improvement in fit from the GAM is likely the result of statistical fluctuations and is therefore not significant.
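A minimal sketch of this kind of test, with loudly labeled assumptions. It is a parametric bootstrap: responses are simulated from the fitted logistic model (which is what sampling "under the null" requires), both models are refit to each simulated data set, and the p-value is the fraction of replicated differences at least as large as the observed one. Because there is no GAM in the Python standard library, a hypothetical binned model with one free probability per equal-count bin stands in for the GAM, and the data are simulated with a deliberately non-logistic shape. The post's own implementation may differ in these details; B = 100 keeps the demo fast, whereas the post uses 1000 replications.

```python
import math
import random


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(x, y, max_iter=50, tol=1e-9):
    """Two-parameter logistic regression by Newton-Raphson (closed-form 2x2 solve)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = inv_logit(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def deviance(y, p):
    """Binary deviance: -2 * log-likelihood."""
    return -2.0 * sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


def binned_fit(x, y, n_bins):
    """Hypothetical flexible alternative: one free P(y = 1) per equal-count bin of x."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    p = [0.0] * len(x)
    size = math.ceil(len(x) / n_bins)
    for start in range(0, len(x), size):
        idx = order[start:start + size]
        rate = sum(y[i] for i in idx) / len(idx)
        for i in idx:
            p[i] = rate
    return p


def delta_deviance(x, y, n_bins):
    """Deviance of the logistic (null) fit minus deviance of the binned (alternative) fit."""
    b0, b1 = fit_logistic(x, y)
    p_null = [inv_logit(b0 + b1 * xi) for xi in x]
    return deviance(y, p_null) - deviance(y, binned_fit(x, y, n_bins)), (b0, b1)


def bootstrap_pvalue(x, y, n_bins=10, B=100, seed=0):
    """Parametric bootstrap p-value for the observed difference in deviance."""
    rng = random.Random(seed)
    observed, (b0, b1) = delta_deviance(x, y, n_bins)
    exceed = 0
    for _ in range(B):
        # Simulate responses from the fitted null (logistic) model, keeping x fixed.
        y_star = [1 if rng.random() < inv_logit(b0 + b1 * xi) else 0 for xi in x]
        d_star, _ = delta_deviance(x, y_star, n_bins)
        if d_star >= observed:
            exceed += 1
    return observed, exceed / B


# Hypothetical data whose true P(up) is NOT logistic in x: high near 0, low in the tails.
data_rng = random.Random(7)
x = [data_rng.gauss(0.0, 1.0) for _ in range(1000)]
y = [1 if data_rng.random() < (0.7 if abs(xi) < 0.5 else 0.35) else 0 for xi in x]
obs, p_value = bootstrap_pvalue(x, y)
print(f"observed difference = {obs:.2f}, bootstrap p-value = {p_value:.2f}")
```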
Results? Running the above bootstrap using 1000 replications yields p-values of 0.00, 0.02, and 0.06 for the S&P 500, FTSE-100, and STI, respectively. The p-values indicate that the improvement in fit using the GAM model on data from the S&P 500 and FTSE-100 is significant at the 5% level, and that the improvement in fit using the GAM model is significant for the STI data at the 10% level.
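Bootstrap p-values are themselves estimates with Monte Carlo error. Treating the number of exceedances as binomial, a rough standard error is sqrt(p(1 - p) / B). A quick sketch with B = 1000 and the reported FTSE-100 and STI p-values (the formula is degenerate for the reported 0.00, so the S&P 500 is left out; the function name is illustrative):

```python
import math


def mc_standard_error(p_hat, B):
    """Approximate Monte Carlo standard error of a bootstrap p-value from B replicates."""
    return math.sqrt(p_hat * (1.0 - p_hat) / B)


B = 1000
for name, p_hat in [("FTSE-100", 0.02), ("STI", 0.06)]:
    print(f"{name}: p = {p_hat:.2f} +/- {mc_standard_error(p_hat, B):.4f}")
```

For the STI, 0.06 sits only about 1.3 Monte Carlo standard errors above 0.05, so more replications would be needed to say confidently which side of the 5% line it falls on.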
What does all of this mean? Well, as far as this little modeling exercise is concerned, the results of the bootstrap specification test suggest that we should use a GAM instead of a logistic model. Here are the plots of the GAM predicted probabilities with confidence bands for the S&P 500, the FTSE-100, and the STI. Note how wide the 95% confidence bands for the predicted probability of a positive return tomorrow are when today's return is either very negative or very positive. This is exactly as it should be! There just aren't enough observed extreme returns (positive or negative) to support precise predictions.
Could this be used to construct a useful investment stratagem? I doubt it. Compare the GAM model's predictions for the S&P 500 above, which use historical data from 1950-2011, with the GAM model's predictions using data from 1993-2011 for SPY, the widely traded ETF that tracks the S&P 500 (I assume that implementing a stratagem would involve trading an ETF like SPY). I have also included the logistic regression and its 95% confidence bands, because for SPY the bootstrap specification test fails to reject the logistic model in favor of the GAM (p-value of 0.50).
The two are substantially different. The interesting little window of trend-following behavior is now gone. Perhaps it was a historical artifact from pre-computer trading days of yore? The negative slope of the predicted probabilities is consistent with mean-reversion in returns.
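The comparison above amounts to fitting the same model on two different windows and checking whether the estimated relationship survives. A sketch of that idea on simulated data with a deliberate regime change: the slope flips sign halfway through, and all parameters are made up. Pooling both regimes into one fit tends to wash the relationship out, which is one way more history can hurt when the process itself changes.

```python
import math
import random


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Two-parameter logistic regression by Newton-Raphson (closed-form 2x2 solve)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


rng = random.Random(3)
n = 3000
x = [rng.gauss(0.0, 1.0) for _ in range(2 * n)]
# Hypothetical regime change: trend-following slope first, mean-reverting slope later.
true_b1 = [0.5] * n + [-0.5] * n
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(0.1 + b * xi))) else 0
     for xi, b in zip(x, true_b1)]

early = fit_logistic(x[:n], y[:n])
late = fit_logistic(x[n:], y[n:])
pooled = fit_logistic(x, y)
print(f"early b1 = {early[1]:+.2f}, late b1 = {late[1]:+.2f}, pooled b1 = {pooled[1]:+.2f}")
```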
The underlying problem with trying to predict the behavior of stock returns is that they are non-stationary. It's not just that the parameters of the "data generating process" for stock returns change over time; the entire data generating process itself evolves over time. When the underlying process itself is changing, getting more historical data is unlikely to help (and in fact is likely to make predictions substantially worse!)...
Update: Code is now available!