First, I grabbed historical data for the S&P 500 from Yahoo Finance using the get.hist.quote() function from the tseries package. I pulled down daily, weekly, and monthly data starting on 3 January 1950 and ending on 31 December 2010 (these start and end dates are for the daily data). I then constructed S&P 500 returns by taking the first difference of the logarithm of the S&P 500 adjusted closing price. Here are (probably familiar) time series plots of the daily returns and a density plot...
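This minimal sketch in Python with NumPy spells out the return construction. The post's own analysis was done in R, so this is not the author's code. The helper name `log_returns` and the toy prices are hypothetical.

```python
import numpy as np


def log_returns(adj_close):
    """Return r_t = ln(P_t) - ln(P_{t-1}) for a series of adjusted closing prices."""
    prices = np.asarray(adj_close, dtype=float)
    if np.any(prices <= 0):
        raise ValueError("prices must be strictly positive to take logs")
    # first difference of the log price series
    return np.diff(np.log(prices))


# Toy prices (hypothetical, not S&P 500 data)
prices = [100.0, 101.0, 99.5, 102.0]
r = log_returns(prices)
print(r)
# Log returns add up over time: their sum equals ln(last / first)
print(r.sum(), np.log(prices[-1] / prices[0]))
```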
Note that stock returns exhibit clustered volatility and are negatively skewed with significantly heavier tails than one would expect if returns were Gaussian. On a side note (related to my current research), a couple of years ago some researchers at the Santa Fe Institute (specifically Stefan Thurner, J. Farmer, and John Geanakoplos) published a paper titled "Leverage Causes Fat Tails and Clustered Volatility." Their model also predicts that returns should be negatively skewed (a point I think they should have included in the title).
Back to learning about splines! Is today's S&P 500 return useful in predicting tomorrow's S&P 500 return? For the null hypothesis, I take a strongish form of the Efficient Market Hypothesis (EMH):
- $H_0$: Stock prices follow a random walk with a drift (i.e., returns should be mean-zero white noise)
- $H_{A,1}$: $r_{t+1} = \beta_0 + \beta_1 r_t + \varepsilon_{t+1}$
- $H_{A,2}$: Whatever the smoothing spline kicks out
Here is a scatter plot of tomorrow's return against today's return. I fit a simple linear regression to the data and plotted the fitted line in gray. Both the slope and intercept terms are highly significant (p-values essentially zero for both). However, the standard confidence intervals are not valid here (they are much too narrow), because the regression blatantly violates the Gauss-Markov assumptions. More work needs to be done before we can take this as evidence against the EMH null. This post is about smoothing splines, so I will simply say that I would be surprised if the parameter estimates were still significant after calculating appropriate standard errors (by bootstrapping, with some type of heteroskedasticity-robust standard errors, etc.)...but maybe!
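This minimal Python/NumPy sketch illustrates the robust-standard-error idea. It fits the linear alternative by OLS and reports both classical and White (HC0) heteroskedasticity-robust standard errors. HC0 is only one of several robust estimators, and the function name `ols_with_hc0` is hypothetical. The data are simulated ARCH(1) returns, not S&P 500 data.

```python
import numpy as np


def ols_with_hc0(x, y):
    """OLS of y on [1, x]; returns coefficients, classical SEs and White (HC0) SEs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    n, k = X.shape
    # Classical SEs assume homoskedastic errors
    sigma2 = resid @ resid / (n - k)
    se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))
    # HC0 sandwich: (X'X)^-1 X' diag(e_i^2) X (X'X)^-1
    meat = X.T @ (X * resid[:, None] ** 2)
    se_hc0 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_classical, se_hc0


# Toy ARCH(1) returns: no predictable mean, but the variance depends on yesterday's return
rng = np.random.default_rng(0)
n, omega, alpha = 5000, 1e-5, 0.5
r = np.zeros(n)
for t in range(1, n):
    r[t] = np.sqrt(omega + alpha * r[t - 1] ** 2) * rng.standard_normal()

beta, se_c, se_r = ols_with_hc0(r[:-1], r[1:])  # regress r_{t+1} on r_t
print("beta:", beta)
print("classical SE:", se_c)
print("HC0 SE:      ", se_r)
```

In this toy setup, tomorrow's variance depends on today's return. As a result, the HC0 standard error on the slope comes out larger than the classical one, which is the direction of the problem described above. HC0 corrects only for heteroskedasticity; a HAC estimator such as Newey-West would also allow for serial correlation in the errors.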
The smoothing spline is in orange. I used the smooth.spline() function in R to fit the spline (using leave-one-out cross-validation to pick the optimal penalty for the curvature).
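The post relies on smooth.spline() for this step. The Python/NumPy code below sketches the same idea without reimplementing smooth.spline: it fits a penalized cubic regression spline with a truncated-power basis and picks the penalty by leave-one-out cross-validation. The fit is a linear smoother, y_hat = H y, so the LOOCV error can be computed from a single fit as the mean of ((y_i - y_hat_i) / (1 - H_ii))^2, without refitting n times. The basis, knot placement (interior quantiles), penalty grid, and function names are all assumptions for illustration.

```python
import numpy as np


def tpb_basis(z, knots):
    """Cubic truncated-power basis: 1, z, z^2, z^3 and (z - k)_+^3 for each knot k."""
    cols = [np.ones_like(z), z, z ** 2, z ** 3]
    cols += [np.clip(z - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def fit_penalized_spline(x, y, lam, n_knots=20):
    """Penalized least-squares spline; returns fitted values, hat diagonal, coefs, setup."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sd = x.mean(), x.std()
    z = (x - mu) / sd  # standardize so the polynomial columns stay well scaled
    knots = np.quantile(z, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
    B = tpb_basis(z, knots)
    # Ridge penalty on the knot coefficients only; the cubic part is unpenalized
    D = np.diag([0.0] * 4 + [1.0] * len(knots))
    A = B.T @ B + lam * D
    coef = np.linalg.solve(A, B.T @ y)
    fitted = B @ coef
    # Diagonal of the hat matrix H = B A^{-1} B^T
    h = np.einsum("ij,ji->i", B, np.linalg.solve(A, B.T))
    return fitted, h, coef, (mu, sd, knots)


def loocv_score(y, fitted, h):
    """Exact leave-one-out CV error for a linear smoother."""
    return np.mean(((np.asarray(y, dtype=float) - fitted) / (1.0 - h)) ** 2)


def choose_lambda(x, y, lambdas, n_knots=20):
    """Pick the penalty with the smallest LOOCV score over a grid."""
    scores = np.array([loocv_score(y, *fit_penalized_spline(x, y, lam, n_knots)[:2])
                       for lam in lambdas])
    return lambdas[int(np.argmin(scores))], scores


# Toy data (hypothetical): a nonlinear mean plus noise
rng = np.random.default_rng(42)
x = rng.standard_normal(500)
y = 0.5 * np.tanh(2 * x) + 0.3 * rng.standard_normal(500)
lambdas = np.logspace(-4, 4, 17)
best_lam, scores = choose_lambda(x, y, lambdas)
fitted, h, coef, setup = fit_penalized_spline(x, y, best_lam)
print("chosen lambda:", best_lam)
```

The cubic part is unpenalized. As the penalty grows, this fit therefore shrinks toward a global cubic rather than toward a straight line. That is one way this sketch differs from a true smoothing spline, whose limit is linear.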
If stock prices reflect all relevant information about the value of the stock, then one would expect that today's return should be pretty useless in predicting tomorrow's return (thus under the null the true regression line should be the dotted red line in the above scatter).
But what about the smoothing spline? A few things:
- While the regression line is positively sloped, the smoothing spline is negatively sloped for large negative and large positive values of today's return.
- The asymmetry: the smoothing spline slopes down more steeply for large negative values of today's return than for large positive values.
- Outliers: The October 1987 stock market crash looms large in the data. How sensitive is the estimated smoothing spline to these one or two observations?
- Is the asymmetry of the smoothing spline a statistical artifact?
- Most importantly, despite the dramatic appearance, is the smoothing spline significantly different from the dotted red line?
The dotted red line lies almost entirely within the 95% confidence band for the smoothing spline (if you squint, you can see a small portion of the dotted red line that lies outside the band). So despite the dramatic appearance of the smoothing spline, I would say that we cannot statistically distinguish it from the dotted red line.
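The post does not say how the band was built. This minimal Python/NumPy sketch assumes it is a pointwise percentile band computed from bootstrap refits of the spline, evaluated on a common grid. It builds the band and checks whether the null line stays inside it. The function names and the toy replicate curves are hypothetical.

```python
import numpy as np


def percentile_band(boot_curves, level=0.95):
    """Pointwise percentile band from an (n_boot, n_grid) array of bootstrap curves."""
    tail = (1.0 - level) / 2.0
    lo = np.quantile(boot_curves, tail, axis=0)
    hi = np.quantile(boot_curves, 1.0 - tail, axis=0)
    return lo, hi


def share_inside(null_curve, lo, hi):
    """Fraction of grid points at which the null curve lies inside [lo, hi]."""
    null_curve = np.broadcast_to(null_curve, lo.shape)
    return np.mean((null_curve >= lo) & (null_curve <= hi))


# Toy bootstrap replicates (hypothetical): 500 noisy curves on a 50-point grid
rng = np.random.default_rng(7)
grid = np.linspace(-0.05, 0.05, 50)
boot = 0.0005 + 0.002 * rng.standard_normal((500, grid.size))
lo, hi = percentile_band(boot)
print("share of grid inside band:", share_inside(0.0, lo, hi))
```

A 95% band built this way is pointwise. A few grid points falling outside it is therefore not, by itself, strong evidence against the null; a simultaneous band would be wider.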
I was curious to see what the above plot might look like if I used weekly and monthly S&P 500 returns instead of daily returns...
Weekly Returns:
Monthly Returns:
I was surprised at how different the daily, weekly, and monthly smoothing splines turned out to be... Still, in all three cases the 95% confidence band for the smoothing spline (almost completely) contains the dotted red line. I will have a think about why they are so different, and perhaps follow up with another post. My R code will be posted as soon as I have time to get my Google Code page up and running... Until then, feel free to email me (or leave your email address in a comment) and I will send it to you.
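The weekly and monthly series in the post were downloaded directly. Because log returns are additive over time, they can also be built from the daily series by summing within each period, provided the period boundaries line up. Here is a minimal Python/NumPy sketch for calendar months. The function name, dates, and returns are hypothetical.

```python
import numpy as np


def aggregate_log_returns(daily_r, period_labels):
    """Sum daily log returns within each period label (e.g. calendar month)."""
    daily_r = np.asarray(daily_r, dtype=float)
    labels, inverse = np.unique(np.asarray(period_labels), return_inverse=True)
    totals = np.zeros(labels.size)
    np.add.at(totals, inverse, daily_r)  # unbuffered in-place sum per label
    return labels, totals


# Hypothetical dates and returns; each return is dated by the day it ends on
dates = np.array(["2010-01-29", "2010-02-01", "2010-02-02", "2010-03-01"],
                 dtype="datetime64[D]")
daily_r = np.array([-0.010, 0.014, 0.013, 0.010])
months = dates.astype("datetime64[M]")  # truncate each date to its calendar month
labels, monthly_r = aggregate_log_returns(daily_r, months)
print(labels, monthly_r)
```

Weekly aggregation works the same way given a week label. NumPy's `datetime64[W]` unit counts whole weeks from 1970-01-01 (a Thursday), so check the alignment before using it for trading weeks.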
Update: As pointed out in a comment below, EMH predicts that stock prices should follow a random walk with a drift... which implies that the dotted red line need not have a zero intercept. One would hope that the drift is slightly positive!
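The update can be made concrete with a short derivation. Write $p_t = \ln P_t$ for the log (dividend-adjusted) price and $\mu$ for the drift; a random walk with drift then implies a flat null regression line at height $\mu$. The last step assumes the increments $u_t$ cannot be predicted from past returns (a martingale difference, which is slightly stronger than white noise).

$$
\begin{aligned}
p_t &= \mu + p_{t-1} + u_t, \\
r_t &= p_t - p_{t-1} = \mu + u_t, \\
\mathbb{E}[r_{t+1} \mid r_t] &= \mu + \mathbb{E}[u_{t+1} \mid r_t] = \mu.
\end{aligned}
$$

So under the null, the linear alternative's $\beta_0 = \mu$ and $\beta_1 = 0$: the dotted red line should be horizontal, at $\mu$ rather than at zero.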
Nice pictures. However, maybe I misunderstood, but I think that simply resampling stock returns for the bootstrap is not correct (you lose the information in the ordering of the observations). With time series data you should resample (estimated) residuals to "reconstruct" the series and bootstrap from those.
Also, I'm not sure that EMH requires zero-mean returns (they should be positive, due to discounting, if nothing else), although that probably doesn't matter much here.
ivansml,
Both of your notes are good. I have used non-parametric estimation before and constructed bootstrap confidence bands, but this is my first time using splines and my first time working with pure time-series data. It should not be too hard to change my code to resample estimated residuals. I will re-run the plots and see what changes.
Although I didn't mention it in the above post (I probably should have), I used the dividend-adjusted closing price to construct the returns. After adjusting for dividends, EMH implies that stock prices should follow a random walk with a drift (so returns don't necessarily need to have zero mean).
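This minimal Python/NumPy sketch follows ivansml's suggestion in the simplest case: a residual bootstrap for the linear alternative (an AR(1) regression). It fits the model, resamples the centered residuals, rebuilds each bootstrap series recursively so the time ordering comes from the model, and refits. The function names and toy data are hypothetical. For the spline, the fitted spline would take the place of the linear mean.

```python
import numpy as np


def fit_ar1(r):
    """OLS fit of r_{t+1} = b0 + b1 * r_t + e_{t+1}; returns b0, b1 and the residuals."""
    x, y = r[:-1], r[1:]
    X = np.column_stack([np.ones_like(x), x])
    (b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)
    return b0, b1, y - (b0 + b1 * x)


def residual_bootstrap_slope(r, n_boot=200, seed=0):
    """Residual bootstrap for the AR(1) slope: rebuild each series recursively, then refit."""
    r = np.asarray(r, dtype=float)
    b0, b1, resid = fit_ar1(r)
    resid = resid - resid.mean()  # center the residuals before resampling
    rng = np.random.default_rng(seed)
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        e = rng.choice(resid, size=r.size - 1, replace=True)
        rb = np.empty_like(r)
        rb[0] = r[0]  # start each bootstrap series at the observed first value
        for t in range(1, r.size):
            rb[t] = b0 + b1 * rb[t - 1] + e[t - 1]
        slopes[b] = fit_ar1(rb)[1]
    return b1, slopes


# Toy AR(1) data (hypothetical, not S&P 500 returns)
rng = np.random.default_rng(3)
r = np.zeros(500)
for t in range(1, r.size):
    r[t] = 0.0002 + 0.1 * r[t - 1] + 0.01 * rng.standard_normal()
b1_hat, slopes = residual_bootstrap_slope(r)
print("slope:", b1_hat, "95% percentile interval:", np.quantile(slopes, [0.025, 0.975]))
```

Resampling residuals independently still ignores the volatility clustering noted at the top of the post. A wild bootstrap (rescaling each residual by a random sign or weight) or a block bootstrap are common alternatives.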