Beyond Microfoundations: November 2011

Attention Conservation Notice:

I have a son!
Though plausible, the power-law is probably not the best model for the positive tail of gold returns; the power-law model is firmly rejected for the negative tail.
The rest of the post is a long, tired ramble about my travails in finding confidence intervals for my power-law parameter estimates with dependent data.

Apologies for the long lag between posts, but my wife gave birth to our son Callan Harry on Sept. 19th and the joys of being a father took precedence over my blog posts! Now that things have settled down a bit, I thought I would take the time to write up some thoughts on the price of gold...

The top plot in the chart above displays the daily nominal price of gold from 2 January 1973 through 28 November 2011, using historical data from usagold.com. Note the sharp increase in the nominal price of gold following the creation of gold-backed ETFs in March 2003. The bottom plot displays normalized standard logarithmic returns for gold over the same time period. Note the ever-present volatility clustering.
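For readers who want to reproduce the bottom plot, here is a minimal Python sketch of the return calculation. It assumes that "normalized" means subtracting the sample mean and dividing by the sample standard deviation, which the post does not spell out, and the price list is made up for illustration (it is not the usagold.com data).

```python
import math
import statistics

def normalized_log_returns(prices):
    """Log returns r_t = ln(P_t / P_{t-1}), demeaned and scaled to unit variance.

    Assumption: this is the normalization intended in the post.
    """
    returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    mu = statistics.fmean(returns)
    sigma = statistics.stdev(returns)  # sample standard deviation
    return [(r - mu) / sigma for r in returns]

# Hypothetical prices, for illustration only.
prices = [100.0, 101.5, 99.8, 102.3, 102.0, 104.1]
z = normalized_log_returns(prices)
```

The positive and negative tails discussed below would then be the positive values of `z` and the absolute values of the negative ones.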

Normally when looking at an asset price over such a long time period one would prefer to look at the real price of the asset rather than the nominal price. As I am interested in exploring the effects of credit and liquidity constraints on asset prices, I want to focus on the dynamics of the nominal price of gold. Credit constraints typically enter the picture in the form of debt contracts, which I assume are written in nominal rather than real terms. There are two ways to think about liquidity/re-saleability: quantity liquidity/re-saleability and value liquidity/re-saleability. A measure of quantity liquidity might be the average daily trading volume of an asset over some time period. A measure of value liquidity might be the average $-value of an asset traded over some time period. I feel like the two concepts are not identical, and I prefer to think of liquidity in terms of value re-saleability rather than quantity re-saleability.

Show me the tails!
In thinking about how the dynamics of asset prices might be impacted by credit and liquidity constraints, it seems logical (to me at least) that one should focus on the behavior of the extreme tails of the return distribution. Below is a survival plot of both the positive and negative tails of normalized gold returns along with the best-fit power-law model found using the Clauset et al. (2009) method.
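As a rough sketch of what the Clauset et al. (2009) fit involves for continuous data: for each candidate threshold, estimate the scaling exponent by maximum likelihood, measure the KS distance between the fitted and empirical tail distributions, and keep the threshold with the smallest distance. The function names and the `min_tail` cutoff below are my own choices for illustration, not anything taken from the code behind this post, and the data are synthetic Pareto draws.

```python
import math
import random

def fit_alpha(tail, xmin):
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x / xmin))."""
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

def ks_distance(tail, xmin, alpha):
    """Largest gap between the empirical and fitted CDFs; tail must be sorted."""
    n = len(tail)
    gap = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        gap = max(gap, abs(model - i / n), abs(model - (i + 1) / n))
    return gap

def fit_power_law(data, min_tail=10):
    """Try each distinct value as the threshold; keep the smallest KS distance."""
    xs = sorted(x for x in data if x > 0)
    best = None
    for i, xmin in enumerate(xs):
        tail = xs[i:]
        if len(tail) < min_tail:  # min_tail is an illustrative cutoff
            break
        if (i > 0 and xmin == xs[i - 1]) or tail[-1] == xmin:
            continue  # skip repeated thresholds and degenerate tails
        alpha = fit_alpha(tail, xmin)
        gap = ks_distance(tail, xmin, alpha)
        if best is None or gap < best[2]:
            best = (xmin, alpha, gap)
    return best  # (threshold, scaling exponent, KS distance), or None

# Synthetic check: Pareto draws with density exponent alpha = 3 and threshold 1.
rng = random.Random(42)
sample = [rng.paretovariate(2.0) for _ in range(1000)]
xmin_hat, alpha_hat, ks = fit_power_law(sample)
```

The scan is quadratic in the sample size, which is fine for a sketch but slow for decades of daily returns.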

This approach to parameter estimation assumes that the underlying data (at least above the threshold parameter) are independent. How realistic is this assumption for gold returns (and for asset returns in general)? The random-walk model for asset prices, which is consistent with the strong form of EMH, would suggest that asset returns should be independent (otherwise historical asset price data could be profitably used to predict future asset prices). Given that gold returns, and asset returns generally, display clustered volatility one would expect that the absolute returns would display significant long-range correlations. Below are plots of various autocorrelation functions for gold...

Both the positive and negative tails of normalized gold returns display significant autocorrelations. Maximum likelihood estimation of the scaling exponents assumes that gold returns above the threshold parameter are independent. My main concern is that the consistency of the MLE will be impacted by the dependency in the data. Perhaps the autocorrelations will go away if I restrict the sample to only observations above a given threshold?
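The kind of check described here can be sketched in a few lines of Python. The returns below come from a toy stochastic-volatility process that I made up to mimic volatility clustering (it is not the gold data), and the threshold of 1.0 is an arbitrary placeholder rather than the estimated optimal threshold.

```python
import math
import random
import statistics

def acf(series, max_lag):
    """Sample autocorrelation at lags 0..max_lag."""
    n = len(series)
    mu = statistics.fmean(series)
    dev = [x - mu for x in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

# Toy returns with volatility clustering (an assumed process, for illustration):
# the log-variance follows an AR(1), the shocks are i.i.d. standard normal.
rng = random.Random(1)
log_var, returns = 0.0, []
for _ in range(5000):
    log_var = 0.95 * log_var + 0.3 * rng.gauss(0.0, 1.0)
    returns.append(math.exp(log_var / 2.0) * rng.gauss(0.0, 1.0))

acf_raw = acf(returns, 20)                    # raw returns
acf_abs = acf([abs(r) for r in returns], 20)  # absolute returns

# Restrict to the positive tail above a hypothetical threshold, in time order.
threshold = 1.0
exceedances = [r for r in returns if r >= threshold]
acf_tail = acf(exceedances, 20)

# Approximate 95% band for the ACF of an independent series of this length.
band = 1.96 / math.sqrt(len(returns))
```

Comparing `acf_tail` for a sequence of increasing thresholds is one way to see whether the dependence fades further out in the tail.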

Dependency in the positive tail of gold returns does appear to go away as I raise the threshold (it almost disappears completely above the optimal threshold). The ACF plot for the negative tail is similar, although given that the optimal threshold is lower for the negative tail, the dependency is a bit more significant. Perhaps this argues for using an alternative criterion to the KS distance in selecting the optimal threshold parameter. Perhaps Anderson-Darling? Anderson-Darling tends to be conservative in that it chooses a higher threshold parameter, which might help deal with data dependency issues. But a higher threshold means less data in the tail, which will make it more difficult to both reject a spurious power law and rule out alternative hypotheses. Life sure is full of trade-offs!

Given that the behavior of the ACF is likely to vary considerably across assets, I would like to be able to make more general statements about the consistency of the MLE in the presence of dependent data. Any thoughts?

Parameter Uncertainty...
Following the Clauset et al. (2009) methodology, I derive standard errors and confidence intervals for my parameter estimates using a non-parametric bootstrap. My first implementation of the non-parametric bootstrap simply generates synthetic data by re-sampling, with replacement, the positive and negative tails of gold returns, and then estimates the sampling distributions for the power-law model's parameters by fitting a power-law model to each synthetic data set.
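The resampling step itself is simple; a minimal Python sketch follows. To keep it short, the estimator here holds the threshold fixed at a known value and only re-estimates the scaling exponent, whereas the procedure described above refits both parameters on every re-sample. The helper names and the synthetic Pareto tail are assumptions for illustration.

```python
import math
import random
import statistics

def alpha_mle(tail, xmin):
    """Continuous power-law MLE of the scaling exponent for a fixed threshold."""
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

def bootstrap(data, estimator, n_boot=1000, seed=0):
    """Re-sample data with replacement n_boot times and re-apply the estimator."""
    rng = random.Random(seed)
    return [estimator(rng.choices(data, k=len(data))) for _ in range(n_boot)]

# Hypothetical tail: 500 Pareto draws above a known threshold of 1 (alpha = 3).
rng = random.Random(7)
tail = [rng.paretovariate(2.0) for _ in range(500)]

alpha_hat = alpha_mle(tail, 1.0)
reps = bootstrap(tail, lambda resample: alpha_mle(resample, 1.0))
std_error = statistics.stdev(reps)  # bootstrap standard error of alpha
```

Note that drawing observations independently with replacement treats the tail as an i.i.d. sample, which is exactly the assumption questioned earlier in the post.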

To give you an idea of what the re-sampled data look like, here is a survival plot of the negative returns for gold along with all 1000 bootstrap re-samples used to derive the confidence intervals...

...and here is a plot of the bootstrap re-samples over-laid with the best-fit power law models:

The bootstrap replicates can be used to derive standard errors for the scaling exponent and threshold parameters for the positive and negative tails, as well as various confidence intervals (percentile, basic, normal, student, etc.) for the parameter estimates. Which of these confidence intervals is most appropriate depends, in part, on the shape of the sampling distribution. The simplest way to get a look at the shape of the sampling distributions is to plot kernel density estimates of the bootstrap replicates for the various parameters. Dotted black lines indicate the MLE parameter estimate.
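For concreteness, here is a small Python sketch of three of these intervals computed from a list of bootstrap replicates. It is a simplification: the normal interval omits the bias correction that some implementations apply, and the student interval is left out because it needs a variance estimate for every replicate. The replicates below are a made-up, evenly spaced grid, chosen only so the arithmetic is easy to follow.

```python
import statistics

def quantile(sorted_vals, p):
    """Linear-interpolation quantile of an already sorted list."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def bootstrap_intervals(theta_hat, reps, level=0.95):
    """Percentile, basic, and (uncorrected) normal bootstrap intervals."""
    tail_prob = (1.0 - level) / 2.0
    ordered = sorted(reps)
    q_lo = quantile(ordered, tail_prob)
    q_hi = quantile(ordered, 1.0 - tail_prob)
    z = statistics.NormalDist().inv_cdf(1.0 - tail_prob)
    se = statistics.stdev(reps)
    return {
        "percentile": (q_lo, q_hi),
        # The basic interval reflects the quantiles around the point estimate.
        "basic": (2.0 * theta_hat - q_hi, 2.0 * theta_hat - q_lo),
        "normal": (theta_hat - z * se, theta_hat + z * se),
    }

# Hypothetical replicates, symmetric around a point estimate of 3.0.
reps = [2.5 + 0.01 * i for i in range(101)]
intervals = bootstrap_intervals(3.0, reps)
```

When the replicates are symmetric about the point estimate, as in this toy case, the percentile and basic intervals coincide; they diverge once the sampling distribution is skewed or multi-peaked.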

WTF! Note that all of the sampling distributions have multiple local peaks! I suspect that the multi-peaked densities are due to the sensitivity of the optimal threshold to the resampling procedure. Is this an example of non-parametric bootstrapping gone awry? The fact that the bootstrap estimates of the sampling distributions have multiple peaks suggests that standard confidence intervals (percentile, basic, normal, student, etc.) are unlikely to provide accurate inferences.

The best alternative method (that I have come across so far) for calculating confidence intervals when the sampling distribution has multiple peaks is the Highest Density Region (HDR) approach of Hyndman (1996). Again, the dotted black lines indicate the MLE parameter estimates.
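To make the idea concrete, here is a minimal Python sketch of a highest density region. It estimates the density of the replicates with a Gaussian kernel, sets the density cutoff at the (1 - coverage) sample quantile of the estimated density evaluated at the replicates themselves, and returns the stretches of a grid where the density is at or above that cutoff. The bandwidth uses a simple normal-reference rule of thumb, which I have swapped in for brevity; it is not the bandwidth selector used for the plots in this post. The bimodal replicates are invented for illustration.

```python
import math
import random
import statistics

def gaussian_kde(sample, bandwidth=None):
    """Gaussian kernel density estimate; rule-of-thumb bandwidth by default."""
    n = len(sample)
    if bandwidth is None:
        bandwidth = 1.06 * statistics.stdev(sample) * n ** (-0.2)
    scale = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    def density(x):
        return scale * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                           for s in sample)
    return density

def highest_density_region(sample, coverage=0.95, grid_size=400):
    """Intervals where the estimated density is at or above the HDR cutoff."""
    density = gaussian_kde(sample)
    heights = sorted(density(s) for s in sample)
    cutoff = heights[int((1.0 - coverage) * len(heights))]
    lo, hi = min(sample), max(sample)
    step = (hi - lo) / (grid_size - 1)
    intervals, start, prev = [], None, None
    for i in range(grid_size):
        x = lo + i * step
        if density(x) >= cutoff:
            if start is None:
                start = x
            prev = x
        elif start is not None:
            intervals.append((start, prev))
            start = None
    if start is not None:
        intervals.append((start, prev))
    return intervals

# Hypothetical two-peaked bootstrap replicates of a scaling exponent.
rng = random.Random(11)
reps = ([rng.gauss(2.8, 0.05) for _ in range(200)]
        + [rng.gauss(3.4, 0.05) for _ in range(200)])
region_50 = highest_density_region(reps, coverage=0.5)
```

With two well-separated peaks the 50% region comes back as two disjoint intervals, one around each peak, which is the behavior that makes HDRs attractive for messy sampling distributions.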

The (significant!) differences between the estimated densities in the simple kernel density plots and the HDR plots are due to the use of different estimation methods. HDR uses a more sophisticated (more accurate?) density estimation algorithm based on a quantile algorithm from Hyndman (1996). Bandwidth for density estimation is selected using the algorithm from Samworth and Wand (2010). The green, red, and blue box-plots denote the 50, 95, and 99% confidence region(s) respectively. Note that sometimes the confidence region consists of disjoint intervals. But I still don't like that the densities are so messy. It makes me think that there might be something wrong in my resampling scheme. Can I come up with a better way to do the non-parametric bootstrap sampling for asset returns? I think so...

My second implementation of the non-parametric bootstrap attempts to deal with the possible dependency in the gold returns data by using a maximum entropy bootstrap to simulate the logarithm of the gold price series directly. I generate 1000 synthetic price series, and for each series I calculate the normalized returns, split the normalized returns into positive and negative tails, and then generate the sampling distributions for the power law parameters by fitting a power-law model to each tail separately.

Here are some plots of synthetic gold price data generated using the meboot package:

I did not expect that the distribution of synthetic data would collapse as the number of bootstrap replicates increased. Was I wrong to have this expectation? Probably! Is this a law-of-large-numbers/CLT type result? It seems like the variance at time t of the bootstrap replicates of the gold price is collapsing towards zero while the mean at time t is converging to the value of the observed gold price. Clearly I need to better understand how the maximum entropy bootstrap works! Update: this behavior is the result of my using the default setting of force.clt=TRUE within the meboot() function. One can eliminate this behavior by setting force.clt=FALSE. Unfortunately, I have not been able to find much information about the costs/benefits of setting force.clt=TRUE or force.clt=FALSE! The documentation for the meboot package is a bit sparse...

However, a plot of the negative tails of the normalized returns calculated from the simulated price series does look like I expected (which is good because this is the synthetic data to which I actually fit the power-law model!)...

The HDR confidence intervals based on the maximum entropy non-parametric bootstrap are much nicer! Mostly, I think, because the densities of the bootstrap replicates are much better behaved (particularly for the scaling exponents!).

Clearly I have more work to do here. I really don't feel like I have a deep understanding of either the maximum entropy bootstrap or the HDR confidence intervals. But I feel like both are an improvement over my initial implementation...

Goodness-of-fit for the Power-Law?
Using the KS goodness-of-fit test advocated in Clauset et al. (2009), I find that while the power-law model is plausible for the positive tail (p-value: 0.81 > 0.10), the power-law model can be rejected for the negative tail of normalized gold returns (p-value: 0.00 < 0.10).

The KS goodness-of-fit test generates synthetic data that are similar to the empirical data below the estimated power-law threshold, but that follow a true power-law (with the MLE for the scaling exponent) above the estimated threshold. This implementation destroys any underlying time dependencies in the data. Are these underlying time dependencies important for understanding the heavy-tailed behavior of asset returns? Previous research suggests yes. Suppose that time dependencies, in particular the slow decay of volatility correlations, are key to understanding the heavy-tailed behavior of asset returns. Does the current implementation of the goodness-of-fit test under-estimate the goodness-of-fit for the power-law model?
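The semi-parametric procedure just described can be sketched as follows. Each synthetic observation is drawn from the fitted power law with probability equal to the share of data in the tail, and is otherwise re-sampled from the observations below the threshold; every synthetic data set is then refit, and the p-value is the fraction of synthetic KS distances at least as large as the observed one. The function names, the `min_tail` cutoff, and the small number of simulations are illustrative assumptions, and the input is synthetic Pareto data rather than gold returns.

```python
import math
import random

def fit_power_law(data, min_tail=10):
    """KS-minimizing power-law fit; returns (xmin, alpha, ks) or None."""
    xs = sorted(data)
    best = None
    for i, xmin in enumerate(xs):
        tail = xs[i:]
        n = len(tail)
        if n < min_tail:
            break
        if tail[-1] == xmin:
            continue  # degenerate tail, the MLE is undefined
        alpha = 1.0 + n / sum(math.log(x / xmin) for x in tail)
        ks = 0.0
        for j, x in enumerate(tail):
            model = 1.0 - (x / xmin) ** (1.0 - alpha)
            ks = max(ks, abs(model - j / n), abs(model - (j + 1) / n))
        if best is None or ks < best[2]:
            best = (xmin, alpha, ks)
    return best

def goodness_of_fit(data, n_sims=100, seed=0):
    """Share of synthetic data sets whose KS distance is >= the observed one."""
    rng = random.Random(seed)
    xmin, alpha, ks_obs = fit_power_law(data)
    body = [x for x in data if x < xmin]
    p_tail = 1.0 - len(body) / len(data)
    exceed = 0
    for _ in range(n_sims):
        synthetic = []
        for _ in range(len(data)):
            if not body or rng.random() < p_tail:
                # Inverse-CDF draw from the fitted power law above xmin.
                u = 1.0 - rng.random()
                synthetic.append(xmin * u ** (-1.0 / (alpha - 1.0)))
            else:
                synthetic.append(rng.choice(body))  # re-sample the body
        if fit_power_law(synthetic)[2] >= ks_obs:
            exceed += 1
    return exceed / n_sims

# Hypothetical positive tail: 200 Pareto draws; 50 simulations to keep it quick.
rng = random.Random(5)
sample = [rng.paretovariate(2.0) for _ in range(200)]
p_value = goodness_of_fit(sample, n_sims=50)
```

Because both the power-law draws and the re-sampled body values are drawn independently, the synthetic data sets carry no memory of the time ordering, which is the concern raised above.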

Testing Alternative Hypotheses:
I end this ludicrously long blog post with a quick discussion of the results of the likelihood ratio tests used to determine if some alternative distribution fits the data better than a power-law.

Results are typical. Even though the power-law model is plausible for the positive tail of gold returns, the power-law can be rejected at the 10% level in favor of the power-law with an exponential cut-off. Also, other heavy-tailed alternatives, namely the log-normal and the stretched-exponential, cannot be ruled out (even the thin-tailed exponential distribution cannot be ruled out!). For the negative tail, the power-law is simply not plausible, and the power-law with cut-off is preferred based on the log-likelihood criterion. All of these alternative distributions were fit to the data using maximum likelihood estimators derived under the assumption of independence of observations in the tail.
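For the non-nested comparisons, the test in Clauset et al. (2009) works with the sum of pointwise log-likelihood differences, normalized by its estimated standard deviation. A minimal Python sketch for the power-law versus exponential case is below. It holds the threshold fixed, covers only this one pair of models, and uses a synthetic exponential tail that I made up, so the exponential should win here; the nested comparison against the power-law with cut-off needs a different reference distribution and is not shown.

```python
import math
import random
import statistics

def power_law_loglik(tail, xmin):
    """Pointwise log-likelihoods under the MLE power law above xmin."""
    alpha = 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)
    return [math.log((alpha - 1.0) / xmin) - alpha * math.log(x / xmin)
            for x in tail]

def exponential_loglik(tail, xmin):
    """Pointwise log-likelihoods under the MLE shifted exponential above xmin."""
    rate = 1.0 / (statistics.fmean(tail) - xmin)
    return [math.log(rate) - rate * (x - xmin) for x in tail]

def likelihood_ratio_test(loglik_a, loglik_b):
    """Return (R, p): R > 0 favors model a, p is the two-sided p-value."""
    diffs = [a - b for a, b in zip(loglik_a, loglik_b)]
    n = len(diffs)
    ratio = sum(diffs)
    sigma = statistics.pstdev(diffs)
    p_value = math.erfc(abs(ratio) / (math.sqrt(2.0 * n) * sigma))
    return ratio, p_value

# Hypothetical thin-tailed data: exponential excesses over a threshold of 1.
rng = random.Random(3)
xmin = 1.0
tail = [xmin + rng.expovariate(1.0) for _ in range(2000)]
ratio, p_value = likelihood_ratio_test(power_law_loglik(tail, xmin),
                                       exponential_loglik(tail, xmin))
```

A small p-value says the sign of the ratio is trustworthy; a large one says the data cannot distinguish the two models, which is the situation described above for several of the alternatives.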

Code and data to reproduce the above results can be found here.

Many thanks to Aaron Clauset for his recent post on power-laws and terrorism, which was a major inspiration/motivation for my writing this post. Also, many thanks to Cosma Shalizi for continuing to provide excellent lecture notes on all things R.

Wednesday, November 30, 2011

Power-laws in Gold? A journey into the world of dependent data...