## Tuesday, July 12, 2011

### Heteroskedasticity simulations...

Finally getting back into my routine after a very successful summer school in Jerusalem.   This blog post is about part of a graduate econometrics lab on violations of the classical linear regression model that I have been working on lately.  Basic idea of the lab, which I borrowed from Prof. Mark Schaffer, is to expose students to Monte Carlo simulation and violations of the classical linear model simultaneously by having them construct simulations that demonstrate what happens to standard hypothesis tests when the DGP exhibits one of the following issues:
1. Heteroskedasticity
2. Autocorrelation
3. Endogenous regressor
4. Non-stationary regressor
To keep this post reasonably short, I am only going to focus on the section of the lab that compares various techniques for dealing with heteroskedasticity via Monte Carlo simulations.

The DGPs:
I specify the two data generating processes as follows.  For DGP #1 x can only take positive values:
• x = abs(rnorm(n, mean=0, sd=1))
• e = rnorm(n, mean=0, sd=1)
• u = e*x (implies that the variance of u increases with the square of x!)
• y = x + u
The first pane in the plot below is y versus x for n=1000 with the OLS regression line in red and the true regression function in grey.  If the heteroskedasticity isn't obvious enough, the second pane is a plot of the squared OLS residuals against the regressor x.  The true variance of the disturbance increase with the square of x (i.e., the grey curve).  Note that the true variance process looks roughly linear (even though it actually is not!).  The fact that a linear approximation to the true variance process looks like it would work pretty well for DGP #1 plays a role later.
For DGP #2 the regressor, x, is allowed to take both positive and negative values (i.e., x=rnorm(n, mean=0, sd=1)).  Everything else is the same as in DGP #1.  Here is the analogous plot for DGP #2:
Note that the even though the true variance process is non-linear in both DGP #1 and DGP #2, it can be (reasonably) well approximated by a linear function in DGP #1.  However, a linear approximation to the true variance function is an EPIC FAIL for DGP #2!  This plays a huge role in the breakdown of FGLS documented below.

Comparison #1 (OLS, OLS w/ HAC Std. Errors, and WLS): Assume that you know the true form of the variance function (i.e., you have ritually slaughtered the requisite small woodland creature, dried its bones, and left them for 3.5 days to be scattered by the wind at which point you were able to use the pattern created to divine the true variance function...alternatively, if you live in Greece, you could simply consult the Oracle).

After having done all of this, the first simulation sets the sample size to n=100 and regresses y on x (where x is given by DGP # 1) 5000 times while capturing the parameter estimate for the coefficient on x using both OLS and WLS (which incorporates the optimal weights given the true variance function) and the various flavours of the standard errors.  Note the HAC stands for Heteroskedastic and Autocorrelation Consistent standard errors.  These standard errors are used frequently in econometrics to adjust the standard errors in the presence of heteroskedasticity and autocorrelation (HAC standard errors are rumored to be consistent in the presence of arbitrary forms of heteroskedasticity and autocorrelation...although I confess to not having read the original journal articles).

Plotting the kernel densities of the OLS estimator and the WLS estimator shows that while both estimators appear to be unbiased (consistent with theory), only the WLS estimator takes advantage of the known variance function (OLS ignores this information) and thus the variance of the WLS estimator is much smaller.
Next, let's consider the size properties of the standard hypothesis test (i.e., type I error).  How often would you expect to reject the null hypothesis of β=1 if your use the standard test statistic and critical values of -1.96 and +1.96 were valid? (Answer: 5% of 5,000 repetitions = 250 times). How many rejects do we actually observe in this simulation?
• OLS: ~ 1200!
• OLS w/ HAC standard errors: ~ 450
• WLS: ~ 250
It would seem that only the WLS has the expected amount of type I error.  Not surprisingly, OLS is in bad shape.  To get even more information about the size properties, let's look at histograms of the p-values associated with the above tests.  If the test statistics were unbiased, then we should expect to find uniformly distributed p-values...

...and with OLS w/ HAC standard errors, and WLS we do find more or less uniformly distributed p-values.  OLS w/ HAC exhibits some very slight distortions...but these are nothing compared with OLS!

Finally, let's look at the power of the standard hypothesis test (i.e., type II error). Below I plot the power curves using the various flavors of standard errors.  What are we looking for in this plot? Two things.  First, a hypothesis test with good size properties should obtain a minimum value of 0.05 (for a two-tailed test) at the true value of β=1.  Second, the power curve should be relatively narrow.  The wider the power curve, the lower is the probability of correctly rejecting a false null hypothesis (i.e., the higher is the type II error).

Unsurprising conclusion of this comparison: if you are going to invest the 3.5 days necessary to obtain the true variance function, then you should use it when estimating your regression.

Comparison #2 (OLS, OLS w/ HAC Std. Errors, FGLS, and WLS (npreg): Now suppose that you do not know the true form of the variance function but wish to estimate it from the data.  This simulation sets the sample size to n=100 and regresses y on x (where x is given by DGP # 1) 5000 times while capturing the parameter estimate for the coefficient on x using OLS, FGLS, and WLS (npreg) and the various flavours of the standard errors.  Feasible Generalised Least Squares (FGLS) estimates the true variance function from the data by linearly regressing the log of the squared OLS residuals on x and using the fitted values as the estimate of the variance function. WLS (npreg) estimates the variance function using the fitted values from a non-parametric regression of the log of the squared OLS residuals on x (see Cosma Shalizi's lecture 5 notes for details).
While all three estimators appear to be unbiased, the FGLS and WLS (npreg) estimators estimate the unknown variance function (OLS does not) and thus the variance of these estimators is much smaller.

Again, consider the size and power of the standard hypothesis. How many rejections of the null hypothesis of β=1 do we actually observe?
• OLS: ~1200!
• OLS w/ HAC standard errors: ~ 400
• WLS (npreg): ~ 600
• FGLS: ~ 300
FGLS would seem to perform the best (in a relative sense), however all four procedures have excessive type I error.  They all reject a true null hypothesis too frequently.  I was expecting the size properties of the WLS (npreg) estimator to be superior to those of the FGLS estimator.  Could ye old trade-off between bias and variance.  Given that the true variance process is roughly linear for DGP #1 the bias introduced by using FGLS should be pretty small.  I also just used the default bandwidth for npreg() in my R code.  Perhaps I should have spent more time trying to optimize the bandwidth (if default bandwidth too high, then npreg() is overfitting, which should lead to higher variance in the WLS (npreg) estimator).  Also, perhaps if I had implemented in iterative procedure to estimate the variance function using the non-parametric regression technique, WLS (npreg) would outperform FGLS.  Any thoughts/ideas from readers would be welcome.

Histograms of the p-values:
In this case only with FGLS we do find more or less uniformly distributed p-values.  OLS w/ HAC se exhibits some very slight distortions...but these are nothing compared with OLS!  As touched on above, WLS (npreg) also exhibits surprisingly large size distortions.

Plot the power curve:

Again, hypothesis tests using OLS w/ HAC and FGLS have pretty good size properties, but the hypothesis tests using FGLS have superior power properties.  If you have reason to suspect that the true variance function is roughly linear, then it looks like FGLS is the way to go (although the difference between the two procedures might become negligible as sample size increases).

Comparison #3 (OLS, OLS w/ HAC Std. Errors, FGLS, and WLS (npreg) with DGP #2: Now suppose that you do not know the true form of the variance function but wish to estimate it from the data.  This simulation sets the sample size to n=100 and regresses y on x (where x is given by DGP # 2) 5000 times while capturing the parameter estimate for the coefficient on x using OLS, FGLS, and WLS (npreg) and the various flavours of the standard errors.   Recall that the true variance process for DGP #2 is poorly approximated by a linear specification.  Thus my prior was that FGLS should be an EPIC FAIL, but the WLS (npreg) would save the day! I turned out to be half right...
While all three estimators appear to be unbiased, the WLS (npreg) estimator is considerably more precise than either the OLS or FGLS estimators. Why is FGLS doing so poorly? The true variance function for DGP #2 is sufficiently non-linear that the bias introduced by using FGLS to estimate the variance function produces and EPIC FAIL!  Note that the kernel density of FGLS tracks more or less with the density of the OLS estimator!

Consider the size and power of the standard hypothesis. How many rejections of the null hypothesis of β=1 do we actually observe?
• OLS: ~1200!
• OLS w/ HAC standard errors: ~ 350
• WLS (npreg): ~ 900 ?!
• FGLS: ~ 1400!!
FGLS is an epic fail.  OLS with HAC standard error doesn't do too poorly.  What is going on with WLS (npreg)!  I was hoping that it would shine through in this most realistic of cases!  I must be over-fitting and the sample size is pretty small (only n=100).  Extending the sample size might help.  I am really perplexed by this...

Histograms of the p-values:
Only OLS with HAC standard errors has roughly uniform p-values.  The rest exhibit significant distortions.

Finally, the power curve:
OLS with HAC would seem to win again.  Its power properties are not great, but should improve with sample size.

As always R-code is available upon request.  If anyone has thoughts on what is going on with my implementation of WLS (using non-parametric regression to get the optimal weights), definitely feel free to comment!

1. I am puzzled by the behavior in case #2 --- the sampling distributions of beta-hat for using npreg to estimate variances, and for FGLS, are almost identical, but you report a factor of 2 difference in the number of rejections. --- Actually, come to that, how are you calculating the standard error in beta-hat in the WLS/npreg case?

2. Tails seem to be a bit heavier for the distribution of beta using npreg. I too am puzzled by this, so much so that I have been trying to find a bug in my implementation.

Basically, I use npreg() to estimate the variance function, and then use the result of this estimation as the weights for WLS. I then grab the coefficient estimate and the corresponding standard error from regression summary.

Here is the psuedo-code for calculating the standard error for the WLS/npreg case:

For (i in 1:5000) {
# Regress y on x and save the residuals
u.hat = residuals(lm(y ~ x))
# Square the residuals and take logs
log.u.hat.sq = log((u.hat)^2)
# Regress log.u.hat.sq on regressor and save the fitted values
g.hat = fitted(lm(log.u.hat.sq ~ x))
# Exponentiate to obtain h.hat
h.hat = exp(g.hat)

# FGLS using the Woolridge procedure
fgls.beta.hat[i] = coef(summary(lm(y ~ x, weights=1/h.hat)))[2,1]
fgls.std.error[i] = coef(summary(lm(y ~ x, weights=1/h.hat)))[2,2]

# Estimating the variance function using non-parametric techniques
npreg.weights = exp(fitted(npreg(log.u.hat.sq ~ x)))
npreg.beta.hat[i] = coef(summary(lm(y ~ x, weights=1/npreg.weights)))[2,1]
npreg.std.error[i] = coef(summary(lm(y ~ x, weights=1/npreg.weights)))[2,2]

I really do suspect there is a bug somewhere...I spent a day or so looking for it, but then moved back to my ongoing research project of looking for power laws in global stock returns using the improved techniques that you helped develop with Aaron Clauset and Mark Newman (I haven't been finding very many).