## Wednesday, June 8, 2011

### More on Smoothing Splines...

Chalk yesterday's post up as a learning experience. Following a very helpful comment left on yesterday's blog post, I made two changes to my code:
1. I now calculate the confidence bands by bootstrapping the residuals (instead of the data points themselves). Yesterday's confidence bands were huge because the bootstrap method I implemented scrambled the dataset during each iteration, which destroyed the time series nature of the underlying data. Bootstrapping over the smoothing spline's residuals should preserve the time series nature of the data (which should narrow the confidence bands considerably).
2. I am now using generalized cross validation (GCV) to pick the penalty for the curvature (for those following at home, that means setting cv=FALSE in the function smooth.spline()).
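
For readers who want to see the mechanics of item 1, here is a minimal sketch of a residual bootstrap in Python. It is not the code behind the plots: the function names are hypothetical, a centered moving average stands in for the smoothing spline, and the response `y` is assumed to be already ordered by the predictor so that neighbors in the list are neighbors on the x-axis.

```python
import random


def moving_average(y, window=5):
    """Stand-in smoother (NOT a smoothing spline): centered moving average."""
    half = window // 2
    n = len(y)
    fitted = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        fitted.append(sum(y[lo:hi]) / (hi - lo))
    return fitted


def residual_bootstrap_bands(y, smoother, n_boot=200, alpha=0.05, seed=0):
    """Pointwise percentile bands obtained by resampling residuals."""
    rng = random.Random(seed)
    fitted = smoother(y)
    residuals = [yi - fi for yi, fi in zip(y, fitted)]

    curves = []
    for _ in range(n_boot):
        # The fitted curve stays where it is; only the residuals are
        # resampled (with replacement) and added back on top of it.
        y_star = [fi + rng.choice(residuals) for fi in fitted]
        curves.append(smoother(y_star))

    # Pointwise percentiles across the refitted bootstrap curves.
    lo_idx = int((alpha / 2) * (n_boot - 1))
    hi_idx = int((1 - alpha / 2) * (n_boot - 1))
    lower, upper = [], []
    for i in range(len(y)):
        values = sorted(curve[i] for curve in curves)
        lower.append(values[lo_idx])
        upper.append(values[hi_idx])
    return fitted, lower, upper
```

The point of the design is that each bootstrap sample keeps the fitted curve and the ordering of the observations intact, which is exactly what resampling the raw data points throws away.
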
Here are the relevant plots using daily, weekly, and monthly returns:

[Plots: Daily Returns, Weekly Returns, Monthly Returns]

Both the daily and monthly plots exhibit significant asymmetry between the left and right tails of the data, although I still wonder about the sensitivity of the results to certain outliers (particularly with the daily data). Recall that the EMH implies that prices should follow a random walk with drift, so that returns are unpredictable apart from the drift. The dotted red line represents the EMH null prediction (ignoring the drift...feel free to mentally shift the dotted red line up or down as you see fit).

My (not very enlightening) interpretation of the above plots is that EMH works pretty well for the "body" of the data (all three plots are roughly "flat" where most of the data lies), but there is something fundamentally different governing the dynamics for large returns (and whatever is governing those dynamics must affect large negative returns differently than large positive returns). Note that this robust asymmetry is much harder (maybe impossible) to detect using a linear model.

1. Yes, you really can't just re-sample individual data points for time series, because it destroys the dependencies you are trying to capture. Re-sampling residuals is one reasonable approach. Another is to re-sample whole blocks of observations, producing a surrogate time series with the right dependence within each block (and hopefully close to the right dependencies overall, if correlations decay quickly). There is a bias/variance trade-off which controls the optimal block length; theory says it should grow like the cube root of the duration of the time series, but then you get into finding the constant. (See Lahiri for gory details.) My recollection is that a block length of 4 days was optimal for the daily returns, when I did that example.
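
A minimal sketch of the block re-sampling idea, in Python. It assumes the overlapping ("moving") block variant and leaves the block length to the caller. The function name is hypothetical, and nothing here selects the optimal length.

```python
import random


def moving_block_bootstrap(series, block_length, rng=None):
    """Build one surrogate series by concatenating randomly chosen blocks."""
    rng = rng or random.Random()
    n = len(series)
    if not 1 <= block_length <= n:
        raise ValueError("block_length must be between 1 and len(series)")

    n_starts = n - block_length + 1  # number of admissible block start points
    surrogate = []
    while len(surrogate) < n:
        start = rng.randrange(n_starts)
        # Each block is a contiguous slice, so dependence inside it is kept.
        surrogate.extend(series[start:start + block_length])
    return surrogate[:n]  # trim the last block to match the original length


# Usage with placeholder data (not real returns) and a block length of 4.
placeholder_returns = [0.01, -0.02, 0.005, 0.0, 0.015, -0.01, 0.02, -0.005]
surrogate = moving_block_bootstrap(placeholder_returns, 4, random.Random(42))
```

Dependence is preserved within each block but broken at the joins, which is why the block length matters.
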

Robustness: You could try more robust fitting methods (like using mean absolute error, rather than mean-squared error, in the spline). Or just removing those particular data points. But I'd be very leery of doing that; those days really happened. (To steal one of D^2's lines, the Great Depression was not an unusually noisy observation of an underlying 3% trend growth rate.)

2. Cosma,

Thanks for the excellent comment (and more importantly thanks for putting your lecture notes online). I am really enjoying learning about non-parametric estimation...didn't get a lot of it in my courses on econometrics!

3. I looked at prof. Shalizi's notes to see what these splines are all about, and the idea seems to be the same as in the Hodrick-Prescott filter used in macroeconomics, which is interesting... and also a bit reassuring (the HP filter looked really ad-hoc when I first saw it). Then one starts to wonder, are there other cases where economists have reinvented the wheel?
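
To make the parallel concrete, the two penalized least-squares problems can be written side by side. The notation is assumed for illustration and is not taken from the notes. A smoothing spline chooses a function $f$ to solve

$$\min_{f} \; \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^2 + \lambda \int \bigl(f''(x)\bigr)^2 \, dx,$$

while the HP filter chooses a trend $\tau_1, \dots, \tau_T$ to solve

$$\min_{\tau} \; \sum_{t=1}^{T} (y_t - \tau_t)^2 + \lambda \sum_{t=2}^{T-1} \bigl(\tau_{t+1} - 2\tau_t + \tau_{t-1}\bigr)^2.$$

The second difference $\tau_{t+1} - 2\tau_t + \tau_{t-1}$ is the discrete analogue of the second derivative $f''$, so in both cases $\lambda$ trades fidelity to the data against curvature of the fitted curve.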

About EMH, I'm no expert and this is probably outside the scope of your blogpost, but in general testing EMH means jointly testing efficiency / rational expectations and a particular asset pricing model (since you can have a model where expected returns vary over time in response to macroeconomic factors). So maybe there exists a model which would explain such asymmetric dynamics without violating EMH... and maybe not. It's certainly an interesting question!

4. Ivan,

The joint hypothesis testing problem you mention doesn't actually matter here - what's being tested is the prediction of weak-form EMH that prices should follow a random walk. The joint hypothesis problem matters when one tries to test whether an artificial portfolio can "beat the market" - so the researcher constructs a portfolio and compares its return over some period to the market return. The problem arises when you correct the accounting return on the portfolio for the risk carried; this involves the use of an asset-pricing model, and hence the problem.

The essence of EMH is that one can't beat the market using publicly available information - so the interesting question with David's result (to paraphrase your question) is why, given that returns appear to be asymmetric, can we not make an abnormal return using this information? It is interesting!

5. I would think that the Farmer, Geanakoplos, Thurner (2009) paper that I linked to in my previous post on smoothing splines (i.e., the one where I botched the confidence bands!) could be extended to create an asset pricing model that could capture the asymmetries in the returns.

Glad to see that the post has generated some interest...