- Heteroskedasticity
- Autocorrelation
- Endogenous regressor
- Non-stationary regressor
The DGPs:
I specify the two data generating processes as follows. For DGP #1, x can only take positive values:
- x = abs(rnorm(n, mean=0, sd=1))
- e = rnorm(n, mean=0, sd=1)
- u = e*x (implies that the variance of u increases with the square of x!)
- y = x + u
For DGP #2 the regressor, x, is allowed to take both positive and negative values (i.e., x=rnorm(n, mean=0, sd=1)). Everything else is the same as in DGP #1. Here is the analogous plot for DGP #2:
Note that even though the true variance process is non-linear in both DGP #1 and DGP #2, it can be reasonably well approximated by a linear function in DGP #1. A linear approximation to the true variance function, however, is an EPIC FAIL for DGP #2! This plays a huge role in the breakdown of FGLS documented below.
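For concreteness, the two DGPs can be sketched in base R as follows (the sample size and seed here are my own choices, purely for illustration):

```r
set.seed(42)
n <- 100

# DGP #1: the regressor takes only positive values
x1 <- abs(rnorm(n, mean = 0, sd = 1))
u1 <- rnorm(n, mean = 0, sd = 1) * x1  # Var(u | x) = x^2
y1 <- x1 + u1                          # true slope is 1

# DGP #2: the regressor takes both positive and negative values
x2 <- rnorm(n, mean = 0, sd = 1)
u2 <- rnorm(n, mean = 0, sd = 1) * x2  # same multiplicative error structure
y2 <- x2 + u2
```

Plotting u1^2 against x1 (and u2^2 against x2) shows why a line is a tolerable approximation to the variance function for DGP #1 but not for the U-shaped variance function of DGP #2.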
Having set all of this up, the first simulation fixes the sample size at n=100 and regresses y on x (where x is given by DGP #1) 5,000 times, capturing the parameter estimate for the coefficient on x using both OLS and WLS (which incorporates the optimal weights given the true variance function), along with the various flavours of standard errors. Note that HAC stands for Heteroskedasticity and Autocorrelation Consistent standard errors. These standard errors are used frequently in econometrics to adjust for the presence of heteroskedasticity and autocorrelation (HAC standard errors are reputed to be consistent under arbitrary forms of heteroskedasticity and autocorrelation... although I confess to not having read the original journal articles).
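A stripped-down version of this first simulation, comparing only the OLS and WLS point estimates (I use 2,000 replications here rather than 5,000 just to keep the sketch quick; the weights 1/x^2 are the optimal ones, since Var(u|x) = x^2):

```r
set.seed(1)
reps <- 2000
n <- 100
ols.beta <- numeric(reps)
wls.beta <- numeric(reps)

for (i in 1:reps) {
  x <- abs(rnorm(n))
  y <- x + rnorm(n) * x                                # DGP #1
  ols.beta[i] <- coef(lm(y ~ x))[2]                    # ignores the variance function
  wls.beta[i] <- coef(lm(y ~ x, weights = 1/x^2))[2]   # uses the true optimal weights
}

c(mean(ols.beta), mean(wls.beta))  # both close to the true value of 1
c(sd(ols.beta), sd(wls.beta))      # WLS is far more precise
```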
Plotting the kernel densities of the OLS estimator and the WLS estimator shows that while both estimators appear to be unbiased (consistent with theory), only the WLS estimator takes advantage of the known variance function (OLS ignores this information) and thus the variance of the WLS estimator is much smaller.
Next, let's consider the size properties of the standard hypothesis test (i.e., the type I error rate). How often would you expect to reject the null hypothesis of β=1 if the standard test statistic and the critical values of -1.96 and +1.96 were valid? (Answer: 5% of 5,000 repetitions = 250 times.) How many rejections do we actually observe in this simulation?
- OLS: ~ 1200!
- OLS w/ HAC standard errors: ~ 450
- WLS: ~ 250
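The OLS size distortion is easy to reproduce: compare the usual t-statistic for H0: β=1 against ±1.96 in each replication and count the rejections (again with a smaller replication count than in the post, purely for speed):

```r
set.seed(1)
reps <- 2000
reject <- 0
for (i in 1:reps) {
  x <- abs(rnorm(100))
  y <- x + rnorm(100) * x                 # DGP #1
  est <- coef(summary(lm(y ~ x)))
  t.stat <- (est[2, 1] - 1) / est[2, 2]   # test H0: beta = 1
  if (abs(t.stat) > 1.96) reject <- reject + 1
}
reject / reps  # far above the nominal 0.05
```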
...and with OLS w/ HAC standard errors and WLS we do find more or less uniformly distributed p-values. OLS w/ HAC exhibits some very slight distortions... but these are nothing compared with OLS!
Finally, let's look at the power of the standard hypothesis test (i.e., one minus the type II error rate). Below I plot the power curves using the various flavours of standard errors. What are we looking for in this plot? Two things. First, a hypothesis test with good size properties should obtain a minimum value of 0.05 (for a two-tailed test) at the true value of β=1. Second, the power curve should be relatively narrow. The wider the power curve, the lower the probability of correctly rejecting a false null hypothesis (i.e., the higher the type II error).
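A power curve is just the rejection frequency traced out over a grid of null values for β. A sketch for plain OLS (the other flavours simply substitute their own standard errors; grid, seed, and replication count are my choices):

```r
set.seed(2)
beta.null <- seq(0, 2, by = 0.1)
reps <- 500
power.ols <- sapply(beta.null, function(b0) {
  mean(replicate(reps, {
    x <- abs(rnorm(100))
    y <- x + rnorm(100) * x                    # DGP #1, true beta = 1
    est <- coef(summary(lm(y ~ x)))
    abs((est[2, 1] - b0) / est[2, 2]) > 1.96   # reject H0: beta = b0?
  }))
})
plot(beta.null, power.ols, type = "l",
     xlab = "null value of beta", ylab = "rejection frequency")
```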
Unsurprising conclusion of this comparison: if you are going to invest the 3.5 days necessary to obtain the true variance function, then you should use it when estimating your regression.
Comparison #2 (OLS, OLS w/ HAC Std. Errors, FGLS, and WLS (npreg)): Now suppose that you do not know the true form of the variance function but wish to estimate it from the data. This simulation sets the sample size to n=100 and regresses y on x (where x is given by DGP #1) 5,000 times, capturing the parameter estimate for the coefficient on x using OLS, FGLS, and WLS (npreg), along with the various flavours of standard errors. Feasible Generalised Least Squares (FGLS) estimates the true variance function by linearly regressing the log of the squared OLS residuals on x and using the exponentiated fitted values as the estimate of the variance function. WLS (npreg) instead uses the exponentiated fitted values from a non-parametric regression of the log of the squared OLS residuals on x (see Cosma Shalizi's lecture 5 notes for details).
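A single draw of the FGLS step looks like this in base R (for the WLS (npreg) variant, the linear regression in the middle step is replaced by a non-parametric fit, e.g. npreg() from the np package; the seed and data draw here are my own illustration):

```r
set.seed(3)
x <- abs(rnorm(100))
y <- x + rnorm(100) * x            # DGP #1, true slope = 1

# Step 1: OLS residuals
u.hat <- residuals(lm(y ~ x))

# Step 2: linear model for the log of the squared residuals
g.hat <- fitted(lm(log(u.hat^2) ~ x))

# Step 3: exponentiate to recover the estimated variance function
h.hat <- exp(g.hat)

# Step 4: WLS with the estimated weights, i.e. FGLS
fgls <- lm(y ~ x, weights = 1/h.hat)
coef(fgls)[2]  # slope estimate, close to the true value of 1
```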
While all three estimators appear to be unbiased, the FGLS and WLS (npreg) estimators estimate the unknown variance function (OLS does not) and thus the variance of these estimators is much smaller.
Again, consider the size and power of the standard hypothesis test. How many rejections of the null hypothesis of β=1 do we actually observe?
- OLS: ~1200!
- OLS w/ HAC standard errors: ~ 400
- WLS (npreg): ~ 600
- FGLS: ~ 300
Histograms of the p-values:
In this case, only with FGLS do we find more or less uniformly distributed p-values. OLS w/ HAC standard errors exhibits some very slight distortions... but these are nothing compared with OLS! As touched on above, WLS (npreg) also exhibits surprisingly large size distortions.
Plot the power curve:
Again, hypothesis tests using OLS w/ HAC and FGLS have pretty good size properties, but the hypothesis tests using FGLS have superior power properties. If you have reason to suspect that the true variance function is roughly linear, then it looks like FGLS is the way to go (although the difference between the two procedures might become negligible as sample size increases).
Comparison #3 (OLS, OLS w/ HAC Std. Errors, FGLS, and WLS (npreg)) with DGP #2: Now suppose that you do not know the true form of the variance function but wish to estimate it from the data. This simulation sets the sample size to n=100 and regresses y on x (where x is given by DGP #2) 5,000 times, capturing the parameter estimate for the coefficient on x using OLS, FGLS, and WLS (npreg), along with the various flavours of standard errors. Recall that the true variance process for DGP #2 is poorly approximated by a linear specification. Thus my prior was that FGLS would be an EPIC FAIL, but that WLS (npreg) would save the day! I turned out to be half right...
While all three estimators appear to be unbiased, the WLS (npreg) estimator is considerably more precise than either the OLS or FGLS estimators. Why is FGLS doing so poorly? The true variance function for DGP #2 is sufficiently non-linear that the bias introduced by using FGLS to estimate it produces an EPIC FAIL! Note that the kernel density of FGLS tracks more or less with the density of the OLS estimator!
Consider the size and power of the standard hypothesis test. How many rejections of the null hypothesis of β=1 do we actually observe?
- OLS: ~1200!
- OLS w/ HAC standard errors: ~ 350
- WLS (npreg): ~ 900 ?!
- FGLS: ~ 1400!!
Histograms of the p-values:
Only OLS with HAC standard errors has roughly uniform p-values. The rest exhibit significant distortions.
Finally, the power curve:
OLS with HAC would seem to win again. Its power properties are not great, but should improve with sample size.
As always R-code is available upon request. If anyone has thoughts on what is going on with my implementation of WLS (using non-parametric regression to get the optimal weights), definitely feel free to comment!
I am puzzled by the behavior in case #2 --- the sampling distributions of beta-hat for using npreg to estimate variances, and for FGLS, are almost identical, but you report a factor of 2 difference in the number of rejections. --- Actually, come to that, how are you calculating the standard error in beta-hat in the WLS/npreg case?
Tails seem to be a bit heavier for the distribution of beta using npreg. I too am puzzled by this, so much so that I have been trying to find a bug in my implementation.
Basically, I use npreg() to estimate the variance function, and then use the result of this estimation as the weights for WLS. I then grab the coefficient estimate and the corresponding standard error from the regression summary.
Here is the pseudo-code for calculating the standard error for the WLS/npreg case:
fgls.beta.hat <- fgls.std.error <- numeric(5000)
npreg.beta.hat <- npreg.std.error <- numeric(5000)
for (i in 1:5000) {
  # (re-draw x and y from the DGP here)
  # Regress y on x and save the residuals
  u.hat <- residuals(lm(y ~ x))
  # Square the residuals and take logs
  log.u.hat.sq <- log(u.hat^2)
  # Regress log.u.hat.sq on the regressor and save the fitted values
  g.hat <- fitted(lm(log.u.hat.sq ~ x))
  # Exponentiate to obtain h.hat
  h.hat <- exp(g.hat)
  # FGLS using the Wooldridge procedure (fit once, grab both numbers)
  fgls.fit <- coef(summary(lm(y ~ x, weights = 1/h.hat)))
  fgls.beta.hat[i] <- fgls.fit[2, 1]
  fgls.std.error[i] <- fgls.fit[2, 2]
  # Estimate the variance function non-parametrically via npreg()
  npreg.weights <- exp(fitted(npreg(log.u.hat.sq ~ x)))
  npreg.fit <- coef(summary(lm(y ~ x, weights = 1/npreg.weights)))
  npreg.beta.hat[i] <- npreg.fit[2, 1]
  npreg.std.error[i] <- npreg.fit[2, 2]
}
I really do suspect there is a bug somewhere...I spent a day or so looking for it, but then moved back to my ongoing research project of looking for power laws in global stock returns using the improved techniques that you helped develop with Aaron Clauset and Mark Newman (I haven't been finding very many).