## Monday, March 12, 2012

### Zipf's Law does not hold for mutual funds!

Gabaix et al (Nature, 2003) and Gabaix et al (QJE, 2006) lay out an economic theory of large fluctuations in share prices based, in part, on the assumption that the size (as measured in dollars of assets under management) of investors in asset markets is well approximated by Zipf's law (i.e., a power-law whose upper CDF has scaling exponent ζ ≈ 1 or, equivalently, whose density has scaling exponent α = ζ + 1 ≈ 2). Zipf's law has been purported to hold for cities (Zipf (Addison-Wesley, 1949), Gabaix (QJE, 1999), Gabaix (AER, 1999), Gabaix and Ioannides (2004), Gabaix (AER, 2011), etc.), firms (Okuyama et al (Physica A, 1999), Axtell (Science, 2001), Fujiwara et al (Physica A, 2004)), banks (Aref and Pushkin (2004)), and mutual funds (Gabaix et al (QJE, 2006)). I say purported, because experience has taught me never to believe in a power-law that I haven't estimated myself!

In this post, I am going to provide evidence against the power-law as an appropriate model for mutual funds using data from the same source as Gabaix et al (QJE, 2006). The figure below shows two survival plots of the size, as measured by the dollar value of assets under management, of U.S. mutual funds at the end of 2009 using data from CRSP.[1] The top panel shows the entire data set; the second panel shows only the upper 20% of mutual funds (roughly those funds with assets under management greater than $1 billion) and is intended to match as closely as possible Figure VII from Gabaix et al (QJE, 2006).
Choosing a threshold to include only the largest 20% of mutual funds for a given year, Gabaix et al (QJE, 2006) report an average estimate for the power-law scaling exponent of ζ ≈ 1 (or α ≈ 2) over the period 1961-1999.   Gabaix et al (QJE, 2006) estimate α using OLS on the upper CDF of the mutual fund distribution (although they report similar results using the Hill estimator).
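
To make the OLS approach concrete, here is a minimal sketch of a regression of the log upper CDF on log size over the largest 20% of observations. It is an illustration under my own assumptions, not the code used by Gabaix et al (QJE, 2006) or for the estimates in this post: the function name `ols_tail_exponent` is hypothetical and the data are synthetic Pareto draws with ζ = 1 rather than the CRSP data.

```python
import numpy as np


def ols_tail_exponent(sizes, tail_fraction=0.2):
    """OLS fit of log survival probability on log size for the upper tail.

    Returns (zeta, alpha), where zeta is minus the fitted slope and
    alpha = zeta + 1 is the implied density exponent.
    """
    x = np.sort(np.asarray(sizes, dtype=float))[::-1]  # largest first
    n_tail = int(np.ceil(tail_fraction * x.size))
    tail = x[:n_tail]
    # Empirical upper CDF P(X >= x) evaluated at each tail point: rank / n.
    ranks = np.arange(1, n_tail + 1)
    log_survival = np.log(ranks / x.size)
    slope, _intercept = np.polyfit(np.log(tail), log_survival, 1)
    zeta = -slope
    return zeta, zeta + 1.0


# Synthetic stand-in for fund sizes: Pareto draws with zeta = 1 (Zipf).
rng = np.random.default_rng(0)
synthetic_sizes = (1.0 - rng.random(20000)) ** (-1.0)
zeta_hat, alpha_hat = ols_tail_exponent(synthetic_sizes)
print(f"zeta = {zeta_hat:.2f}, alpha = {alpha_hat:.2f}")
```

Because the points of an empirical CDF are strongly dependent, the usual OLS standard errors from such a regression are not reliable, which is one reason to cross-check against maximum likelihood.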

Using my larger data set I estimate, via OLS and choosing the same 20% cut-off criterion (which leaves 1313 observations in the tail), a scaling exponent of ζ = 1.11 (or α ≈ 2.11). Here is a plot showing my OLS estimates:

I also estimated the scaling exponent using maximum likelihood in two ways. First, I apply the Hill estimator to the data using the same 20% cut-off as in Gabaix et al (QJE, 2006); second, I re-estimate the scaling exponent using the Hill estimator, while choosing the threshold parameter to minimize the KS distance as in Clauset et al (SIAM, 2009). Method 1 obtains an estimate of α = 1.97(3), while method 2 obtains estimates of α = 2.04(3) and xmin = $1.12 billion (which leaves 1077 observations in the tail). Note that the KS distance, D, for each of the maximum likelihood fits is smaller than the KS distance obtained using the OLS estimate of α. Numbers in parentheses show the amount of uncertainty in the final digit (obtained using a parametric bootstrap to estimate the standard error).

Parameter uncertainty is estimated using the bootstrap:
• Using the 20% cut-off and a parametric bootstrap, I estimate a se for α of 0.026 and a corresponding 95% confidence interval of (1.912, 2.013).
• Choosing xmin via Clauset et al (SIAM, 2009) and using a parametric bootstrap, I estimate a se for α of 0.032 and a corresponding 95% confidence interval of (1.976, 2.098).
• Finally, choosing xmin via Clauset et al (SIAM, 2009) and using a non-parametric bootstrap, I estimate a se for α of 0.059 and a corresponding 95% confidence interval of (1.932, 2.113), and a se for xmin of $0.530 B and a corresponding 95% confidence interval of ($0.398 B, $1.332 B).
Note that in all three cases, the 95% confidence interval for the estimated scaling exponent includes α=2 (i.e., Zipf's "law").  So far so good for Gabaix et al (QJE, 2006).
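
For readers who want to see the mechanics, the two maximum likelihood procedures can be sketched as follows. This is a simplified illustration, not the code behind the numbers above: the names `hill_alpha`, `ks_distance` and `fit_power_law`, the `min_tail` guard, and the synthetic Pareto sample are all my own assumptions for the example.

```python
import numpy as np


def hill_alpha(x, xmin):
    """Hill (maximum likelihood) estimate of alpha for observations >= xmin."""
    tail = x[x >= xmin]
    alpha = 1.0 + tail.size / np.sum(np.log(tail / xmin))
    return alpha, tail.size


def ks_distance(x, xmin, alpha):
    """KS distance between the empirical tail CDF and the fitted power-law CDF."""
    tail = np.sort(x[x >= xmin])
    n = tail.size
    model_cdf = 1.0 - (tail / xmin) ** (1.0 - alpha)
    above = np.arange(1, n + 1) / n - model_cdf  # empirical CDF just after each point
    below = model_cdf - np.arange(0, n) / n      # empirical CDF just before each point
    return max(above.max(), below.max())


def fit_power_law(x, min_tail=100):
    """Scan candidate thresholds and keep the one with the smallest KS distance."""
    x = np.asarray(x, dtype=float)
    best = None
    for xmin in np.unique(x):  # sorted, so the tail only shrinks as we go
        if np.sum(x >= xmin) < min_tail:
            break
        alpha, n_tail = hill_alpha(x, xmin)
        d = ks_distance(x, xmin, alpha)
        if best is None or d < best[2]:
            best = (xmin, alpha, d, n_tail)
    return best


# Synthetic stand-in for the data: Pareto draws with alpha = 2 and xmin = 1.
rng = np.random.default_rng(1)
sample = (1.0 - rng.random(2000)) ** (-1.0)
xmin_hat, alpha_hat, d_hat, n_tail = fit_power_law(sample)
print(f"xmin = {xmin_hat:.2f}, alpha = {alpha_hat:.2f}, D = {d_hat:.3f}, n = {n_tail}")
```

As a rough sanity check on the bootstrap, the asymptotic standard error of the Hill estimator at a fixed threshold is (α − 1)/√n, which for α = 1.97 and n = 1313 gives about 0.027, close to the parametric bootstrap value of 0.026 reported above.
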
However, what about goodness-of-fit? Good data analysis is a lot like good detective work, and it is important to collect as much evidence as possible, relevant to testing the hypothesis at hand, before passing judgement.  As stressed in Clauset et al (SIAM, 2009), an assessment of the goodness-of-fit of the power-law model is an important piece of relevant statistical evidence.  Here are my goodness-of-fit test results:
• Using a 20% cut-off as suggested in Gabaix et al (QJE, 2006) along with the parametric version of the KS goodness-of-fit test I obtain a p-value of roughly 0.00 using 2500 repetitions, which suggests that the power-law model is not plausible.
• Choosing xmin via  Clauset et al (SIAM, 2009) and using the parametric version of the KS goodness-of-fit test I obtain a p-value of roughly 0.19 using 2500 repetitions, which suggests that the power-law model is plausible.
• Finally, choosing xmin via  Clauset et al (SIAM, 2009) and using the non-parametric bootstrap version of the KS goodness-of-fit test I obtain a p-value of roughly 0.02 using 2500 repetitions, which again suggests that the power-law model is not plausible.
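
The parametric version of the test, which takes the threshold as given, can be sketched roughly as below. This is a minimal illustration with hypothetical function names and synthetic data, not the code that produced the p-values above. The non-parametric version differs in that each replication resamples the full data set and re-selects xmin before computing the KS distance.

```python
import numpy as np


def hill_alpha(tail, xmin):
    """Maximum likelihood estimate of alpha for data already restricted to >= xmin."""
    return 1.0 + tail.size / np.sum(np.log(tail / xmin))


def ks_distance(tail, xmin, alpha):
    """KS distance between the empirical CDF of the tail and the power-law CDF."""
    tail = np.sort(tail)
    n = tail.size
    model_cdf = 1.0 - (tail / xmin) ** (1.0 - alpha)
    above = np.arange(1, n + 1) / n - model_cdf
    below = model_cdf - np.arange(0, n) / n
    return max(above.max(), below.max())


def parametric_ks_pvalue(tail, xmin, reps=2500, seed=0):
    """Share of simulated power-law samples that fit worse than the observed data."""
    rng = np.random.default_rng(seed)
    tail = np.asarray(tail, dtype=float)
    alpha_hat = hill_alpha(tail, xmin)
    d_observed = ks_distance(tail, xmin, alpha_hat)
    exceed = 0
    for _ in range(reps):
        # Inverse-CDF draw from the fitted power law, same sample size as the tail.
        sim = xmin * (1.0 - rng.random(tail.size)) ** (-1.0 / (alpha_hat - 1.0))
        # Re-estimate alpha on each simulated sample, holding xmin fixed.
        d_sim = ks_distance(sim, xmin, hill_alpha(sim, xmin))
        exceed += int(d_sim >= d_observed)
    return exceed / reps


# Synthetic check: the upper half of a log-normal sample is not a power law.
rng = np.random.default_rng(2)
lognormal_sample = np.exp(rng.normal(0.0, 1.0, 4000))
lognormal_tail = lognormal_sample[lognormal_sample >= 1.0]
print(parametric_ks_pvalue(lognormal_tail, xmin=1.0, reps=500))
```

A small p-value says that power-law samples of the same size almost never fit as badly as the data do, so the power-law model is not plausible.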
On the whole, I think these results are not very supportive of the power-law model. Even though the power-law model remains plausible when I choose xmin via Clauset et al (SIAM, 2009) and assess goodness-of-fit using the parametric version of the KS test, it is important to note that such an assessment does not properly take into account the flexibility of the Clauset et al (SIAM, 2009) procedure in choosing the threshold parameter (along with estimating the scaling exponent).[2] Once I take this additional flexibility into account (i.e., by using the non-parametric KS test), I again find that the power-law model is not plausible! Here is a nice set of density plots of the bootstrap KS distances from each version of the goodness-of-fit test, which illustrates the differences between the parametric and non-parametric procedures (I hope!):
Note that implementing the non-parametric version of the KS goodness-of-fit test basically shifts and "condenses" the sampling distribution of the KS distance (relative to both parametric versions). Taking into account the additional flexibility of the Clauset et al (SIAM, 2009) procedure for fitting the power-law null model reduces both the mean and variance of the sampling distribution of the KS distance, D.

Quick test of alternative hypotheses.  A very plausible alternative distribution for mutual funds is the log-normal (recall Gibrat's law of proportionate growth would predict log-normal).  Can I reject the power-law in favour of the log-normal using likelihood ratio tests?  YES!
• Using a 20% cut-off as suggested in Gabaix et al (QJE, 2006), the Vuong LR test statistic is -3.63 with a two-sided p-value of roughly 0.00 (which implies that, given the data, I can distinguish between the power-law and log-normal) and a one-sided p-value of roughly 0.00 (implying that I can reject the power-law in favour of the log-normal)!
• Choosing xmin via Clauset et al (SIAM, 2009), the Vuong LR test statistic is -2.27 with a two-sided p-value of roughly 0.023 (which implies that, given the data, I can distinguish between the power-law and log-normal) and a one-sided p-value of roughly 0.012 (implying that I can reject the power-law in favour of the log-normal)!
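
A rough sketch of this kind of likelihood ratio comparison is given below. It is an illustration under my own assumptions rather than the code behind the statistics above: the function names are hypothetical, the log-normal is truncated at xmin and fitted numerically with SciPy's Nelder-Mead routine, and the data are synthetic. A negative statistic favours the log-normal.

```python
import numpy as np
from scipy import optimize, stats


def powerlaw_loglik(tail, xmin):
    """Pointwise log-likelihood of the tail under the fitted power law."""
    alpha = 1.0 + tail.size / np.sum(np.log(tail / xmin))
    return np.log((alpha - 1.0) / xmin) - alpha * np.log(tail / xmin)


def lognormal_loglik(tail, xmin):
    """Pointwise log-likelihood under a log-normal truncated below at xmin."""
    logs = np.log(tail)

    def pointwise(params):
        mu, sigma = params[0], np.exp(params[1])  # optimise log(sigma) so sigma > 0
        z = (logs - mu) / sigma
        # Renormalise by the probability mass above the threshold.
        log_mass_above = stats.norm.logsf((np.log(xmin) - mu) / sigma)
        return stats.norm.logpdf(z) - logs - np.log(sigma) - log_mass_above

    start = np.array([logs.mean(), np.log(logs.std())])
    result = optimize.minimize(lambda p: -pointwise(p).sum(), start,
                               method="Nelder-Mead")
    return pointwise(result.x)


def vuong_test(tail, xmin):
    """Normalised log-likelihood ratio (power law minus log-normal) and p-values."""
    tail = np.asarray(tail, dtype=float)
    diff = powerlaw_loglik(tail, xmin) - lognormal_loglik(tail, xmin)
    stat = np.sqrt(diff.size) * diff.mean() / diff.std()
    p_two_sided = 2.0 * stats.norm.sf(abs(stat))  # can the two models be told apart?
    p_one_sided = stats.norm.cdf(stat)            # is the power law significantly worse?
    return stat, p_two_sided, p_one_sided


# Synthetic check: upper half of a log-normal sample, threshold at the median.
rng = np.random.default_rng(3)
sample = np.exp(rng.normal(0.0, 1.0, 5000))
print(vuong_test(sample[sample >= 1.0], xmin=1.0))
```

The p-values follow directly from the standard normal distribution: a statistic of -2.27 corresponds to a one-sided p-value of about 0.012 and a two-sided p-value of about 0.023, matching the second bullet above.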
What are the economic implications of all of this? Does it matter whether or not mutual fund size is distributed according to a log-normal or power-law distribution?

I think it matters quite a bit for the model put forward in Gabaix et al (QJE, 2006)! In Gabaix et al (QJE, 2006) investors take as given that the distribution of investors' size follows a power-law. Specifically, an investor makes use of the distribution of investor size in calculating his optimal trading volume. Gabaix et al (QJE, 2006) relies on the power-law being a "good approximation" to the true distribution of investor size in order to justify investors taking a power-law distribution as given. I have provided evidence that the power-law is not a plausible model, and that a log-normal distribution is a significantly better fit. If the true distribution is not a power-law, then agents in Gabaix et al (QJE, 2006) are effectively solving a mis-specified optimization program and there is no longer any guarantee that the solution to the properly specified optimization program will result in power-law tails for equity returns and volume (paradoxically, however, this might turn out to be "good" for Gabaix et al (QJE, 2006) in the sense that I have argued in previous posts that the tails of equity returns are not power-law anyway!).

However, whether or not it matters if a distribution is log-normal, power-law, or simply "heavy-tailed" depends on context.  In this case a log-normal distribution is consistent with Gibrat's law of proportionate growth.  Gibrat's law applied to investor size says that if the growth rate of investors' assets under management is independent of the amount of assets currently under management, then the distribution of investor size will follow a log-normal distribution.  One could easily test whether or not the growth rate of mutual funds is independent of size. Maybe someone already has?
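
One simple way such a test might be set up is to regress each fund's log growth rate on its initial log size; under Gibrat's law the slope should be indistinguishable from zero. The sketch below is only an illustration: `gibrat_regression` is a hypothetical helper, the data are simulated, and a serious test on CRSP data would also have to deal with fund entry, exit and survivorship.

```python
import numpy as np


def gibrat_regression(size_start, size_end):
    """OLS of log growth on initial log size; returns (slope, standard error)."""
    log_size = np.log(np.asarray(size_start, dtype=float))
    growth = np.log(np.asarray(size_end, dtype=float)) - log_size
    design = np.column_stack([np.ones_like(log_size), log_size])
    coef, *_ = np.linalg.lstsq(design, growth, rcond=None)
    resid = growth - design @ coef
    sigma2 = resid @ resid / (growth.size - 2)      # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)  # classical OLS covariance
    return coef[1], np.sqrt(cov[1, 1])


# Simulated funds whose growth is independent of size (Gibrat's law holds).
rng = np.random.default_rng(4)
start = np.exp(rng.normal(0.0, 2.0, 5000))
end = start * np.exp(rng.normal(0.05, 0.2, 5000))
slope, se = gibrat_regression(start, end)
print(f"slope = {slope:.4f}, se = {se:.4f}")
```

A slope that is significantly negative would indicate that large funds grow more slowly than small ones, which is a departure from proportionate growth.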
Personally, I think the important takeaway from the above analysis is just that there is quite extreme heterogeneity in the size of investors (although not extreme enough to justify a power-law)!  In other words, the distribution of investor sizes is generically "heavy-tailed."  Investors are not necessarily small relative to the "market" which suggests that at least some investors are unable to take prices parametrically (i.e., as given) when determining their optimal trading behavior.  In this respect I wholeheartedly agree with Gabaix et al (QJE, 2006): investor size does play a significant role in determining dynamics of asset prices.  These results also suggest an alternative way to think about the liquidity of an asset.  An asset might be very liquid (i.e., re-saleable) for one investor, but might be very illiquid for another because the desired volume of trade is different!  Liquidity might not simply be an inherent property of the asset itself, but may also depend on the "size" of the investor holding it!

[1] Gabaix et al (QJE, 2006) use data on mutual fund assets from the 4th quarter of 1999, whereas I use the larger and more recent data set from the 4th quarter of 2009.
[2] Assessing goodness-of-fit using a parametric version of the KS goodness-of-fit test that takes the optimal threshold chosen using the Clauset et al (SIAM, 2009) method as given is both conceptually easier to understand and computationally simpler to implement. This procedure also sets an effective lower bar for the plausibility of the power-law model: if the power-law model is not plausible using this parametric KS goodness-of-fit test, then it will be even less plausible if I use the more flexible (and more rigorous) non-parametric KS goodness-of-fit test.