Below, I expand on the example given in those lecture notes by
- Estimating the power-law threshold parameter, xmin, from the data
- Constructing standard errors and confidence intervals for the scaling parameter, α, and for the threshold parameter, xmin (that take into account the fact that I have estimated both α and xmin).
- Testing an alternative hypothesis for the distribution of wealth (i.e, log-normal)
If the data follow a power law, then this relationship should be roughly linear. The converse, however, is not true! Linearity is not a sufficient condition for a power law. Also, lots of things might look linear on a log-log plot even when they aren't.
Recipe for Fitting Power Laws:
- α: 2.34
- xmin: 9e8 ($900 million)
- obs.above.threshold: 302 (out of 400)
Standard Errors and Confidence Intervals: To calculate the standard errors and confidence intervals for the parameters I used a non-parametric procedure outlined in Cosma's lecture notes. Cosma also graciously provides the code for his lecture notes as well! To calculate the standard errors and 95% confidence intervals for α and xmin, I had to alter (trivially) the provided code to take into account the fact that I was not given the threshold parameter and instead estimated it from the data. Results:
- α.se: 0.10
- xmin.se: 364e6 ($364 million)
- α.ci: (2.13, 2.53)
- xmin.ci: (-2e8, 1.05e9)
Survival Plot (fitted power-law included):
Looks pretty good! But while you might be able to rule out a particular model just by looking at it, I think you would be hard pressed to rule a model "in" by "eye-balling it" (unfortunately, I am sure that people do).
Goodness-of-fit Test: Continuing to follow Clauset et al (2009), I again used a Monte Carlo KS test procedure to conduct a goodness-of-fit test for the power law model. The procedure basically generates synthetic data by re-sampling the data itself below xmin, and then generating simulated data from a power law with the estimated parameters above xmin. A power-law is then re-estimated to each synthetic data set and the KS statistic is stored. The result is a p-value that represents the fraction of simulations whose KS statistic was at least as large as the one obtained from fitting the power law model to the wealth data. A small p-value (i.e., a p-value < 0.10) indicates that only a very small fraction of the simulations had a KS stat as large as the wealth data and is evidence against the plausibility of the power-law model. Results: p-value of 0.0012! It would seem that the data reject the power-law as a plausible model. This is perhaps not surprising...
Testing Alternative Hypotheses: So the wealth distribution does not follow a power law! What about log-normal? I test this using the likelihood ratio procedure recommended in Clauset et al (2009).
The null hypothesis for comparing two non-nested distributions is that both distributions are equally far (in a Kullback-Leibler relative entropy sense...note to self: understand what the hell Kullback-Leibler relative entropy means!) from one another. I report the Vuong test statistic and the two-sided p-value:
- Vuong Statistic: 0.21
- p-value: 0.83
Other alternatives (for which Clauset et al (2009) publish results) are the exponential, stretched exponential (Weibull), and the power-law with exponential cut-off. I continue to have trouble getting the code for fitting the exponential distribution to work (keep getting an error in the nlm() method for minimizing the negative log-likelihood function...it keeps returning non-finite values...which makes me think that it is probably user error on my part). I have also not yet figured out how to get the code for estimating the power-law with exponential cutoff to work. This is at the top of my to-do list as I feel it is a very plausible alternative distribution (perhaps not necessarily for wealth...but definitely for some other data sets that I am working with at the moment).
I can however report results for the stretched exponential (Weibull). I was able to get the code to work after I learned how to suppress warnings in R (i.e., using the suppressWarnings() method). Again I report the Vuong test statistic and the two-sided p-value:
- Vuong Statistic: 8.05
- p-value: 0.00
The real objective of this post was to replicate results from fitting a power-law to empirical data (in a rigorous way that does not abuse linear regression), so that I feel confident in applying these tools in my own research. Thus I purposefully did not stray too far from Clauset et al (2009). I think I succeeded (at least in replicating the work of Clauset et al (2009) for this data set). Hopefully, I have not made any glaring errors...but if I did, they are my own!
No comments:
Post a Comment