I implement two versions of the KS goodness-of-fit tests suggested in Clauset et al (*SIAM*, 2009):

- Parametric version: conceptually and computationally simple; assumes the estimated x_{min} is the "true" power-law threshold and simulates data under the null hypothesis of a power law with exponent equal to the estimated α
- Non-parametric version: conceptually less straightforward and computationally *very* intensive because it makes use of the fact that x_{min} is estimated from the data.
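The parametric procedure can be sketched in a few lines. This is a minimal illustration, assuming continuous data and a fixed threshold; the function names are my own, not from any particular package:

```python
import numpy as np

def fit_alpha(tail, xmin):
    """Continuous power-law MLE above a fixed threshold:
    alpha = 1 + n / sum(log(x / xmin))."""
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

def ks_distance(tail, xmin, alpha):
    """Max gap between the empirical tail CDF and the fitted power-law CDF."""
    x = np.sort(tail)
    model = 1.0 - (x / xmin) ** (1.0 - alpha)   # power-law CDF above xmin
    n = len(x)
    upper = np.arange(1, n + 1) / n             # ECDF evaluated at each x
    lower = np.arange(0, n) / n                 # ECDF just below each x
    return max(np.max(np.abs(upper - model)), np.max(np.abs(lower - model)))

def parametric_ks_pvalue(data, xmin, n_sims=1000, seed=0):
    """Parametric test: treat xmin as known, simulate from the fitted power
    law, and count how often the simulated KS distance (with alpha
    re-estimated on each draw) exceeds the observed one."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    tail = data[data >= xmin]
    alpha_hat = fit_alpha(tail, xmin)
    d_obs = ks_distance(tail, xmin, alpha_hat)
    n, exceed = len(tail), 0
    for _ in range(n_sims):
        # inverse-CDF draw from a power law with the estimated exponent
        sim = xmin * (1.0 - rng.random(n)) ** (-1.0 / (alpha_hat - 1.0))
        if ks_distance(sim, xmin, fit_alpha(sim, xmin)) >= d_obs:
            exceed += 1
    return exceed / n_sims
```

The non-parametric version wraps the same machinery but re-estimates x_{min} (by minimizing the KS distance over a grid of candidate thresholds) on the original data *and* on every synthetic sample, which is exactly why it is so much more computationally intensive.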

**Results:**

Using the parametric version of the KS goodness-of-fit test I am able to reject the power-law model as plausible (i.e., *p*-value ≤ 0.10) for 17% of positive tails and 12% of negative tails. I would not describe these results as overwhelmingly "anti-power-law," but I would remind readers that the parametric version of the goodness-of-fit test sets a lower bound on support for the power law (see my discussion of goodness-of-fit results for mutual funds for more details). Here is a histogram of the *p*-values for the positive and negative tails of equities in my sample:

The results obtained using the more flexible non-parametric KS test are much less supportive of the power-law model. I reject the power-law model as plausible for roughly 44% of positive tails and 37% of negative tails. Here is another histogram of the goodness-of-fit *p*-values:

Clearly the non-parametric goodness-of-fit results are more damning for the power-law model than the parametric results. But why?

The discrepancy could be due to sample-size effects. The average number of tail observations for a given stock in my sample is 313 (positive tail) and 333 (negative tail): using daily data, I simply do not have that many tail observations to work with. Because of the small sample sizes I need to rely on the extra flexibility of the non-parametric version of the goodness-of-fit test to be able to reject the power-law model as plausible. I suspect that this would not be the case if I had access to the TAQ data set used by Gabaix et al (*Nature*, 2003). It is also worth noting that the more data I get, the more likely I am to reject the power law as plausible. Specifically, for those equities for which I reject the power-law model as plausible for either the positive or negative tail (or both), I have a larger average number of both total observations and tail observations. I have observed this sample-size dynamic fairly consistently whilst fitting and testing power-law models on various data sets: the more data I obtain, the less support I find for a power law and the more support I tend to find for heavy-tailed alternatives (particularly the log-normal).

While I interpret the above results of goodness-of-fit tests as being decidedly against the hypothesis that the power-law model is "universal," clearly the power-law is a plausible model for either the positive or negative (or both) tails of returns for some stocks. In my mind this immediately suggests that there might be meaningful heterogeneity in the tail behavior of asset returns and that an interesting research direction might be to explore economic mechanisms that could generate such diversity of tail behavior.

I end with a significant disclaimer. I am concerned that, by ignoring the underlying time dependence (think "clustered volatility" and mean reversion) in large returns, my goodness-of-fit test results might be biased against the power-law model. Suppose that Gabaix et al (*Nature*, 2003) are correct about equity returns having power-law tails. Given the dependence structure of returns, it could be the case that the typical KS distance between the best-fit power-law model and the "true" power law is larger than what I estimate in implementing an iid goodness-of-fit test.

If this is the case, then perhaps the reason I reject the power-law model so often is because my observed KS distance is obtained from fitting a power-law model to dependent data, whereas my bootstrap KS distances are obtained by fitting a power-law to synthetic data that follows a "true" power-law but ignores the underlying dependence structure of returns. In order to address this issue, I will need to develop an alternative goodness-of-fit testing procedure that can mimic the time dependence in the returns data! Fortunately, I have some ideas (good ones I hope!) on how to proceed...
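To make the concern concrete: one standard device for preserving short-range dependence when resampling is a circular block bootstrap, in which contiguous blocks of returns (rather than individual observations) are drawn with replacement. This is purely an illustrative sketch of the general idea, not a preview of the procedure I actually intend to develop:

```python
import numpy as np

def block_bootstrap(returns, block_len=20, seed=0):
    """Circular block bootstrap: resample contiguous blocks of the series so
    that short-range dependence (e.g. volatility clustering) survives
    within each block, wrapping around the end of the sample."""
    rng = np.random.default_rng(seed)
    r = np.asarray(returns)
    n = len(r)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n, size=n_blocks)          # random block starts
    idx = (starts[:, None] + np.arange(block_len)) % n  # contiguous, wrapped
    return r[idx].ravel()[:n]                           # trim to original length
```

With `block_len=1` this collapses to an ordinary iid bootstrap; larger block lengths retain more of the dependence structure, at the cost of fewer effectively independent blocks.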