## Friday, March 16, 2012

### Equity returns: where power-laws go to die? Maybe...

A major criticism of previous empirical work assessing support for the power-law as a model for equity returns is that very few (if any) of the studies assess the goodness-of-fit of the power-law model.  This post summarizes the results of goodness-of-fit testing for the power-law as a model of equity returns, continuing my previous posts on the topic using data on U.S. equities listed on the Russell 1000 index.

I implement two versions of the KS goodness-of-fit test suggested in Clauset et al (SIAM, 2009):
• Parametric version: conceptually and computationally simple; assumes the estimated xmin is the "true" power-law threshold and simulates data under the null hypothesis of a power-law with exponent equal to the estimated α.
• Non-parametric version: conceptually less straightforward and computationally very intensive because it accounts for the fact that xmin is estimated from the data.
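To make the parametric recipe concrete, here is a minimal sketch of how such a test can be implemented (function names and defaults are my own illustrative choices, not taken from Clauset et al): fit α by maximum likelihood at a fixed xmin, compute the KS distance between the data and the fitted power-law, then compare that distance against KS distances from synthetic samples drawn under the null.

```python
import numpy as np

def hill_alpha(x, xmin):
    """Maximum likelihood (Hill) estimate of the power-law exponent
    for the tail observations x >= xmin."""
    tail = x[x >= xmin]
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

def ks_distance(x, xmin, alpha):
    """KS distance between the empirical tail CDF and the fitted
    power-law CDF, P(X <= x) = 1 - (x / xmin)^(1 - alpha)."""
    tail = np.sort(x[x >= xmin])
    n = len(tail)
    empirical = np.arange(1, n + 1) / n
    model = 1.0 - (tail / xmin) ** (1.0 - alpha)
    return np.max(np.abs(empirical - model))

def parametric_ks_pvalue(x, xmin, n_reps=1000, rng=None):
    """Parametric bootstrap p-value: treat xmin as the true threshold
    and simulate synthetic power-law samples under the null."""
    rng = np.random.default_rng(rng)
    alpha = hill_alpha(x, xmin)
    d_obs = ks_distance(x, xmin, alpha)
    n = int(np.sum(x >= xmin))
    count = 0
    for _ in range(n_reps):
        # inverse-CDF sampling from a pure power law above xmin
        u = rng.random(n)
        synth = xmin * u ** (-1.0 / (alpha - 1.0))
        # re-fit alpha on each synthetic sample before computing D
        if ks_distance(synth, xmin, hill_alpha(synth, xmin)) >= d_obs:
            count += 1
    return (count + 1) / (n_reps + 1)
```

The p-value is the fraction of synthetic samples whose best-fit KS distance exceeds the observed one; a small p-value means the data fit worse than genuine power-law samples typically do.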
Results:
Using the parametric version of the KS goodness-of-fit test I am able to reject the power-law model as plausible (i.e., p-value ≤ 0.10) for 17% of positive tails and 12% of negative tails.  I would not describe these results as overwhelmingly "anti-power-law," but I would remind readers that the parametric version of the goodness-of-fit test sets a lower-bound on support for the power-law (see my discussion of goodness-of-fit results for mutual funds for more details).  Here is a histogram of the p-values for the positive and negative tails of equities in my sample:
The results obtained using the more flexible non-parametric KS test are much less supportive of the power-law model.  I reject the power-law model as plausible for roughly 44% of positive tails and 37% of negative tails.  Here is another histogram of the goodness-of-fit p-values:
Clearly the non-parametric goodness-of-fit results are more damning for the power-law model than the parametric results.  But why?

The discrepancy could be due to sample-size effects.  The average number of tail observations for a given stock in my sample is 313 (positive tail) and 333 (negative tail): using daily data, I simply do not have that many tail observations to work with.  Because of the small sample size I need to rely on the extra flexibility of the non-parametric version of the goodness-of-fit test to be able to reject the power-law model as plausible.  I suspect that this would not be the case if I had access to the TAQ data set used by Gabaix et al (Nature, 2003).  It is also worth noting that the more data I have, the more likely I am to reject the power-law as plausible.  Specifically, for those equities for which I reject the power-law model as plausible, for either the positive or negative tail (or both), I have a larger average number of both total observations and tail observations.  I have observed this sample-size dynamic fairly consistently whilst fitting and testing power-law models on various data sets: the more data I obtain, the less support I find for the power-law and the more support I tend to find for heavy-tailed alternatives (particularly the log-normal).

While I interpret the above results of goodness-of-fit tests as being decidedly against the hypothesis that the power-law model is "universal," clearly the power-law is a plausible model for either the positive or negative (or both) tails of returns for some stocks.  In my mind this immediately suggests that there might be meaningful heterogeneity in the tail behavior of asset returns and that an interesting research direction might be to explore economic mechanisms that could generate such diversity of tail behavior.

I end with a significant disclaimer.  I am concerned that, by ignoring the underlying time dependence (think "clustered volatility" and mean reversion) in large returns, my goodness-of-fit test results might be biased against the power-law model.  Suppose that Gabaix et al (Nature, 2003) are correct that equity returns have power-law tails.  Given the dependence structure of returns, it could be the case that the typical KS distance between the best-fit power-law model and the "true" power-law is larger than what I estimate when implementing an iid goodness-of-fit test.

If this is the case, then perhaps the reason I reject the power-law model so often is because my observed KS distance is obtained from fitting a power-law model to dependent data, whereas my bootstrap KS distances are obtained by fitting a power-law to synthetic data that follows a "true" power-law but ignores the underlying dependence structure of returns.  In order to address this issue, I will need to develop an alternative goodness-of-fit testing procedure that can mimic the time dependence in the returns data!  Fortunately, I have some ideas (good ones I hope!) on how to proceed...
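One candidate ingredient for such a procedure, sketched purely as an illustration (the block length and the function itself are hypothetical choices on my part, not a settled method), is a circular block bootstrap: resampling contiguous blocks of the return series preserves short-range dependence such as volatility clustering, unlike iid resampling.

```python
import numpy as np

def circular_block_bootstrap(returns, block_len=20, rng=None):
    """Resample a return series in contiguous blocks (wrapping around
    the end of the sample) so that short-range time dependence, e.g.
    clustered volatility, survives the resampling."""
    rng = np.random.default_rng(rng)
    n = len(returns)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n, size=n_blocks)
    # each row is one block of consecutive (circular) indices
    idx = (starts[:, None] + np.arange(block_len)) % n
    return returns[idx.ravel()][:n]
```

A full goodness-of-fit procedure would still need to decide how to embed power-law tails in the resampled series; the sketch above only addresses the dependence-preserving resampling step.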

## Monday, March 12, 2012

### Zipf's Law does not hold for mutual funds!

Gabaix et al (Nature, 2003) and Gabaix et al (QJE, 2006) lay out an economic theory of large fluctuations in share prices based, in part, on the assumption that the size (as measured in dollars of assets under management) of investors in asset markets is well approximated by Zipf's law (i.e., a power-law with scaling exponent ζ ≈ 1 or α ≈ 2).  Zipf's law has been purported to hold for cities (Zipf (Addison-Wesley, 1949), Gabaix (QJE, 1999), Gabaix (AER, 1999), Gabaix and Ioannides (2004), Gabaix (AER, 2011), etc.), firms (Okuyama et al (Physica A, 1999), Axtell (Science, 2001), Fujiwara et al (Physica A, 2004)), banks (Aref and Pushkin (2004)), and mutual funds (Gabaix et al (QJE, 2006)).  I say purported because experience has taught me never to believe in a power-law that I haven't estimated myself!

In this post, I am going to provide evidence against the power-law as an appropriate model for mutual funds using data from the same source as Gabaix et al (QJE, 2006).  The figure below shows two survival plots of the size, measured as the dollar value of assets under management, of U.S. mutual funds at the end of 2009 using data from CRSP.1 The top panel shows the entire data set; the bottom panel shows only the upper 20% of mutual funds (roughly those funds with assets under management greater than $1 billion) and is intended to match as closely as possible Figure VII from Gabaix et al (QJE, 2006).
Choosing a threshold to include only the largest 20% of mutual funds for a given year, Gabaix et al (QJE, 2006) report an average estimate for the power-law scaling exponent of ζ ≈ 1 (or α ≈ 2) over the period 1961-1999.   Gabaix et al (QJE, 2006) estimate α using OLS on the upper CDF of the mutual fund distribution (although they report similar results using the Hill estimator).
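For concreteness, the OLS-on-the-CCDF approach can be sketched as follows (a minimal illustration with my own function name and cut-off handling; it is not the authors' code): regress the log empirical survival probability on log size over the largest observations, and read ζ off the slope.

```python
import numpy as np

def ols_ccdf_exponent(sizes, top_frac=0.2):
    """Estimate the scaling exponent zeta by OLS on the log-log
    empirical CCDF, using only the largest top_frac of observations.
    The pdf exponent is then alpha = zeta + 1."""
    x = np.sort(sizes)[::-1]            # largest first
    n = len(x)
    n_tail = int(top_frac * n)
    tail = x[:n_tail]
    ccdf = (np.arange(n_tail) + 1) / n  # empirical P(X >= x)
    slope, _ = np.polyfit(np.log(tail), np.log(ccdf), 1)
    return -slope
```

OLS on the CCDF is known to have poor statistical properties relative to maximum likelihood (the regression errors are strongly dependent), which is one reason Clauset et al (SIAM, 2009) recommend against it; I use it here only to reproduce the original methodology.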

Using my larger data set I estimate, via OLS and choosing the same 20% cut-off criterion (which leaves 1313 observations in the tail), a scaling exponent of ζ = 1.11 (or α = 2.11). Here is a plot showing my OLS estimates:

I estimated the scaling exponent using maximum likelihood in two ways.  First, I applied the Hill estimator to the data using the same 20% cut-off as in Gabaix et al (QJE, 2006); second, I re-estimated the scaling exponent using the Hill estimator while choosing the threshold parameter to minimize the KS distance, as in Clauset et al (SIAM, 2009).  Method 1 yields an estimate of α = 1.97(3); method 2 yields α = 2.04(3) and xmin = $1.12 billion (which leaves 1077 observations in the tail).  Note that the KS distance, D, for each maximum likelihood fit is smaller than the KS distance obtained using the OLS estimate of α.  Numbers in parentheses show the amount of uncertainty in the final digit (obtained using a parametric bootstrap to estimate the standard error).  Parameter uncertainty is estimated using the bootstrap:
• Using the 20% cut-off and a parametric bootstrap, I estimate a standard error for α of 0.026 and a corresponding 95% confidence interval of (1.912, 2.013).
• Choosing xmin via Clauset et al (SIAM, 2009) and using a parametric bootstrap, I estimate a standard error for α of 0.032 and a corresponding 95% confidence interval of (1.976, 2.098).
• Finally, choosing xmin via Clauset et al (SIAM, 2009) and using a non-parametric bootstrap, I estimate a standard error for α of 0.059 and a corresponding 95% confidence interval of (1.932, 2.113), and a standard error for xmin of $0.530 billion with a corresponding 95% confidence interval of ($0.398 billion, $1.332 billion).
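The parametric bootstrap standard error for α can be sketched as follows (an illustrative implementation under the assumption of a pure power law above xmin; the function name is my own): fit α by the Hill estimator, repeatedly simulate samples of the same size from the fitted power law, re-fit α on each, and take the standard deviation of the re-fitted estimates.

```python
import numpy as np

def hill_alpha(tail, xmin):
    """Hill (maximum likelihood) estimate of alpha for data >= xmin."""
    return 1.0 + len(tail) / np.sum(np.log(tail / xmin))

def parametric_bootstrap_se(tail, xmin, n_reps=1000, rng=None):
    """Parametric bootstrap standard error of the Hill estimate:
    simulate from the fitted power law and re-estimate alpha."""
    rng = np.random.default_rng(rng)
    alpha = hill_alpha(tail, xmin)
    n = len(tail)
    estimates = np.empty(n_reps)
    for i in range(n_reps):
        u = rng.random(n)
        synth = xmin * u ** (-1.0 / (alpha - 1.0))  # inverse-CDF draw
        estimates[i] = hill_alpha(synth, xmin)
    return alpha, estimates.std(ddof=1)
```

As a sanity check, the bootstrap standard error should be close to the asymptotic value (α − 1)/√n for the Hill estimator.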
Note that in all three cases, the 95% confidence interval for the estimated scaling exponent includes α=2 (i.e., Zipf's "law").  So far so good for Gabaix et al (QJE, 2006).
However, what about goodness-of-fit? Good data analysis is a lot like good detective work, and it is important to collect as much evidence as possible, relevant to testing the hypothesis at hand, before passing judgement.  As stressed in Clauset et al (SIAM, 2009), an assessment of the goodness-of-fit of the power-law model is an important piece of relevant statistical evidence.  Here are my goodness-of-fit test results:
• Using a 20% cut-off as suggested in Gabaix et al (QJE, 2006) along with the parametric version of the KS goodness-of-fit test, I obtain a p-value of roughly 0.00 using 2500 repetitions, which suggests that the power-law model is not plausible.
• Choosing xmin via Clauset et al (SIAM, 2009) and using the parametric version of the KS goodness-of-fit test, I obtain a p-value of roughly 0.19 using 2500 repetitions, which suggests that the power-law model is plausible.
• Finally, choosing xmin via Clauset et al (SIAM, 2009) and using the non-parametric bootstrap version of the KS goodness-of-fit test, I obtain a p-value of roughly 0.02 using 2500 repetitions, which again suggests that the power-law model is not plausible.
On the whole, I think these results are not very supportive of the power-law model.  Even though the power-law model remains plausible when I choose xmin via Clauset et al (SIAM, 2009) and assess goodness-of-fit using the parametric version of the KS test, it is important to note that such an assessment does not properly take into account the flexibility of the Clauset et al (SIAM, 2009) procedure in choosing the threshold parameter (along with estimating the scaling exponent).2  Once I take this additional flexibility into account (i.e., by using the non-parametric KS test), I again find that the power-law model is not plausible!  Here is a nice set of density plots of the bootstrap KS distances from each version of the goodness-of-fit test that illustrates the differences between the parametric and non-parametric procedures (I hope!):
Note that implementing the non-parametric version of the KS goodness-of-fit test basically shifts and "condenses" the sampling distribution of the KS distance (relative to both parametric versions).  Taking into account the additional flexibility of the Clauset et al (SIAM, 2009) procedure for fitting the power-law null model reduces both the mean and the variance of the sampling distribution of the KS distance, D.

Quick test of alternative hypotheses.  A very plausible alternative distribution for mutual funds is the log-normal (recall Gibrat's law of proportionate growth would predict log-normal).  Can I reject the power-law in favour of the log-normal using likelihood ratio tests?  YES!
• Using a 20% cut-off as suggested in Gabaix et al (QJE, 2006), the Vuong LR test statistic is -3.63 with a two-sided p-value of roughly 0.00 (which implies that, given the data, I can distinguish between the power-law and the log-normal) and a one-sided p-value of roughly 0.00 (implying that I can reject the power-law in favour of the log-normal!).
• Choosing xmin via Clauset et al (SIAM, 2009), the Vuong LR test statistic is -2.27 with a two-sided p-value of roughly 0.023 (which implies that, given the data, I can distinguish between the power-law and the log-normal) and a one-sided p-value of roughly 0.012 (implying that I can reject the power-law in favour of the log-normal!).
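A minimal sketch of the Vuong test for this comparison, under the setup of Clauset et al (SIAM, 2009) (both distributions fit to the data above xmin, with the log-normal truncated at xmin; the function name and optimizer choice are my own):

```python
import numpy as np
from scipy import optimize, stats

def vuong_pl_vs_lognormal(tail, xmin):
    """Vuong test comparing a power law to a truncated log-normal,
    both fit by maximum likelihood to data above xmin.  A negative
    statistic favours the log-normal."""
    x = np.sort(tail[tail >= xmin])
    n = len(x)
    # power-law MLE and pointwise log-likelihoods
    alpha = 1.0 + n / np.sum(np.log(x / xmin))
    ll_pl = np.log(alpha - 1.0) - np.log(xmin) - alpha * np.log(x / xmin)
    # log-normal truncated at xmin: maximize the conditional likelihood
    def neg_ll(params):
        mu, sigma = params
        if sigma <= 0:
            return np.inf
        logpdf = stats.norm.logpdf(np.log(x), mu, sigma) - np.log(x)
        log_tail_prob = stats.norm.logsf(np.log(xmin), mu, sigma)
        return -(np.sum(logpdf) - n * log_tail_prob)
    res = optimize.minimize(neg_ll, x0=[np.log(x).mean(), np.log(x).std()],
                            method="Nelder-Mead")
    mu, sigma = res.x
    ll_ln = (stats.norm.logpdf(np.log(x), mu, sigma) - np.log(x)
             - stats.norm.logsf(np.log(xmin), mu, sigma))
    # normalized pointwise log-likelihood ratio; asymptotically N(0, 1)
    r = ll_pl - ll_ln
    v = r.sum() / (np.sqrt(n) * r.std(ddof=1))
    return v, 2.0 * stats.norm.sf(abs(v))
```

The two-sided p-value asks whether the models can be distinguished at all; the sign of the statistic then says which model wins.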
What are the economic implications of all of this?  Does it matter whether mutual fund size is distributed according to a log-normal or a power-law distribution?

I think it matters quite a bit for the model put forward in Gabaix et al (QJE, 2006)! In Gabaix et al (QJE, 2006), investors take as given that the distribution of investors' size follows a power-law.  Specifically, an investor makes use of the distribution of investor size in calculating his optimal trading volume.  Gabaix et al (QJE, 2006) rely on the power-law being a "good approximation" to the true distribution of investor size in order to justify investors taking a power-law distribution as given.  I have provided evidence that the power-law is not a plausible model and that a log-normal distribution is a significantly better fit.  If the true distribution is not a power-law, then agents in Gabaix et al (QJE, 2006) are effectively solving a mis-specified optimization program, and there is no longer any guarantee that the solution to the properly specified optimization program will result in power-law tails for equity returns and volume (paradoxically, however, this might turn out to be "good" for Gabaix et al (QJE, 2006) in the sense that I have argued in previous posts that the tails of equity returns are not power-law anyway!).

However, whether or not it matters if a distribution is log-normal, power-law, or simply "heavy-tailed" depends on context.  In this case a log-normal distribution is consistent with Gibrat's law of proportionate growth.  Gibrat's law applied to investor size says that if the growth rate of investors' assets under management is independent of the amount of assets currently under management, then the distribution of investor size will follow a log-normal distribution.  One could easily test whether or not the growth rate of mutual funds is independent of size. Maybe someone already has?
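Such a test is straightforward to sketch (an illustrative regression with hypothetical variable names, assuming one has fund sizes at two dates): regress the log growth rate on log initial size; under Gibrat's law the slope should be statistically indistinguishable from zero, while mean reversion in size would show up as a negative slope.

```python
import numpy as np
from scipy import stats

def gibrat_slope(size_t0, size_t1):
    """Regress the log growth rate on log initial size.  Under
    Gibrat's law of proportionate growth the slope is zero."""
    growth = np.log(size_t1 / size_t0)
    result = stats.linregress(np.log(size_t0), growth)
    return result.slope, result.pvalue
```

In a simulated Gibrat world the slope comes out near zero, whereas if large funds systematically grow more slowly the slope is clearly negative.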
Personally, I think the important takeaway from the above analysis is just that there is quite extreme heterogeneity in the size of investors (although not extreme enough to justify a power-law)!  In other words, the distribution of investor sizes is generically "heavy-tailed."  Investors are not necessarily small relative to the "market" which suggests that at least some investors are unable to take prices parametrically (i.e., as given) when determining their optimal trading behavior.  In this respect I wholeheartedly agree with Gabaix et al (QJE, 2006): investor size does play a significant role in determining dynamics of asset prices.  These results also suggest an alternative way to think about the liquidity of an asset.  An asset might be very liquid (i.e., re-saleable) for one investor, but might be very illiquid for another because the desired volume of trade is different!  Liquidity might not simply be an inherent property of the asset itself, but may also depend on the "size" of the investor holding it!

1 Gabaix et al (QJE, 2006) use data on mutual fund assets from 4th quarter of 1999, whereas I use the larger and more recent data set from 4th quarter of 2009.
2 Assessing goodness-of-fit using a parametric version of the KS goodness-of-fit test that takes the optimal threshold chosen using the Clauset et al (SIAM, 2009) method as given is both conceptually easier to understand, and computationally simpler to implement.  This procedure also sets an effective lower bar for the plausibility of the power-law model: if the power-law model is not plausible using this parametric KS goodness-of-fit test, then it will be even less plausible if I use the more flexible (and more rigorous) non-parametric KS goodness-of-fit test.

## Wednesday, March 7, 2012

A recent flurry of blog posts from Noah Smith, Simon Wren-Lewis, Paul Krugman, and others related to microfoundations and their relevance/usefulness for macro encouraged me to write a post summarizing my own thoughts. Update: Paul Krugman has written another gem on the topic. Either I agree with him or he agrees with me...I can't decide which!

I think that choosing whether to use a macro model based solely on relationships between aggregate variables or a macro model with microfoundations basically boils down to balancing a kind of bias-variance trade-off.  As Paul Krugman notes, all microfoundations are biased representations of "true" individual behavior:
> And when making such comparisons between economics and physical science, there’s yet another point: what we call “microfoundations” are not like physical laws. Heck, they’re not even true. Maximizing consumers are just a metaphor, possibly useful in making sense of behavior, but possibly not. The metaphors we use for microfoundations have no claim to be regarded as representing a higher order of truth than the ad hoc aggregate metaphors we use in IS-LM or whatever; in fact, we have much more supportive evidence for Keynesian macro than we do for standard micro.
Given that we don't know what the "true" microfoundations are (or that they even exist?), and given that all microfoundations (whether based on rational expectations and optimization, or insights from behavioral economics) are at best approximations of the "true" microfoundations, the inclusion of any microfoundations into our (already biased) macro models adds an additional source of approximation error to the model which should negatively impact the model's predictive ability.1

However, microfoundations also typically discipline a model by forcing it to satisfy optimality conditions or other behavioral constraints that should reduce the variance of the model's predictions.  Thus it could be the case that:
1. introducing biased microfoundations into the model achieves a reduction in the variance of our model's predictions that more than compensates for the added bias, or
2. introducing biased microfoundations does not achieve a reduction in the variance of our model's prediction that compensates for the added bias.
In case 1, the inclusion of biased microfoundations improves the overall predictive capability of our macro model; in case 2, the microfoundations make the model worse (at least in terms of predictive ability!).  I see no reason why case 1 should always turn out to be true...
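The two cases can be illustrated with a toy Monte Carlo (a statistical caricature, not an economic model, with parameters chosen purely for illustration): compare the mean squared error of an unbiased estimator against a biased, lower-variance one obtained by shrinking toward zero.

```python
import numpy as np

def mse_comparison(true_value, noise_sd, n_obs=5,
                   shrink=0.5, n_sims=20000, rng=0):
    """Monte Carlo MSE of an unbiased estimator (the sample mean)
    versus a biased but lower-variance one (the mean shrunk toward
    zero).  Returns (mse_unbiased, mse_biased)."""
    rng = np.random.default_rng(rng)
    samples = rng.normal(true_value, noise_sd, size=(n_sims, n_obs))
    unbiased = samples.mean(axis=1)
    biased = shrink * unbiased   # shrinking adds bias, cuts variance
    mse_unbiased = np.mean((unbiased - true_value) ** 2)
    mse_biased = np.mean((biased - true_value) ** 2)
    return mse_unbiased, mse_biased
```

When noise dominates (small signal, large variance), the biased estimator wins: the variance reduction more than pays for the bias, which is case 1. When the signal is strong, the same shrinkage hurts, which is case 2.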

It is in this sense that I disagree with Noah's argument that microfoundations probably lead to better models:
> A better reason to use microfoundations, in my opinion, is that they probably lead to better models. "Better," of course, means "more useful for predicting the future." If our models predict future aggregate macro variables (GDP, etc.) based solely on the past values of those variables, we'll almost certainly be using less information than is available; if we figure out how economic actors are making their decisions, we will have a lot more information. More information = better model. And there are all kinds of ways to observe and model individual behavior - survey data, lab experiments, etc.
I am willing to concede that models predicting future macro variables based solely on historical data use less "information" than, say, DSGE models with all the attendant restrictions on individual behavior, but I disagree that using more "information" necessarily implies that the model's predictions are superior.  If we knew the "true" microfoundations, then including them in the model would unambiguously improve the model's predictions.  However, if our microfoundations are doomed to be at best an approximation of the "truth," then including them in our model will not automatically improve the model's predictive ability.  Reading Noah's post in its entirety makes me think that he is referring to models using "correct" microfoundations in the above quote.

1 Although I suppose that it is possible for the bias introduced by including microfoundations to "offset" some of the preexisting bias in the macro model.

## Thursday, March 1, 2012

### Santa Fe bound!

I have just received notification that I have been accepted into the 2012 Complex Systems Summer School at the Santa Fe Institute!  Attending the Santa Fe institute's summer school has been a dream of mine ever since I was made aware of its existence some 6 years ago...