Thursday, October 30, 2025

Heteroskedasticity-robust standard errors: Some practical considerations


Introduction

Some discussions have arisen recently about which standard errors practitioners should use in the presence of heteroskedasticity in linear models. The discussion intrigued me, so I took a second look at the existing literature. I provide an overview of the theoretical and simulation research that helps us answer this question. I also present simulation results that mimic or extend some of the existing simulation studies. I will share the Stata code I used for the simulations in the hope that it might be useful to those who want to explore how the various standard-error estimators perform in situations that are relevant to their research.

From my simulation exercises, I conclude that no single variance–covariance matrix of the estimators (VCE) is preferred to the others across all possible combinations of sample sizes and degrees of heteroskedasticity. For instance, if we extend the simulation design of MacKinnon (2012) to include discrete covariates, the 5% rejection rate for the coefficients of the discrete covariates is in some cases best when we use the Huber/White/sandwich VCE provided by Stata’s vce(robust) option. For continuous covariates, the conclusions are different.

From the literature, two practical considerations arise. First, taking sample size on its own as a criterion is not enough to obtain accurate standard errors in the presence of heteroskedasticity. What matters is the number of observations per regressor. If you have 250 observations and 4 regressors, the performance of heteroskedasticity-consistent standard-error estimators will probably be good. If you have 250 observations and 10 regressors, this may no longer be true. Likewise, having 10,000 observations may not be enough if you have 500 regressors. As the number of parameters grows, so does the information required to consistently estimate them. Also, as the number of observations per regressor becomes smaller, all existing heteroskedasticity-consistent standard errors become inaccurate, as discussed by Cattaneo, Jansson, and Newey (2018).

Second, leverage points far above the average matter. Leverage points are functions of the covariates that measure how much influence an observation has on the ordinary least-squares fit. Leverage points are between 0 and 1. A point with a leverage of 1 wields leverage in the sense that the orientation of the regression plane in its direction is completely determined by that point. The estimated residual for a point with a leverage of 1 is 0. Simulation evidence shows that the performance of heteroskedasticity-consistent standard errors improves when high-leverage points are not present in a design, as discussed in Chesher and Austin (1991).

These two considerations are related. The mean of the leverage points is equal to the number of regressors divided by the number of observations (the inverse of the number of observations per regressor). For a fixed number of regressors, as the sample size increases, the mean of the leverage and the probability of getting leverage points with large values decrease. Thus, having sufficient observations per regressor reduces the problems associated with high-leverage points.

To summarize, when we think about robust standard errors, the relevant metric is the number of observations per regressor. If the number of observations per regressor is small, regardless of the sample size, our inference may be imprecise, even when we use heteroskedasticity-consistent standard errors that correct for bias. There is no silver bullet that will give you reliable inference if there is not enough data for each parameter you want to estimate.

History and intuition

When we think about heteroskedasticity-consistent standard errors in linear models, we think of White (1980). The key result of White’s work is that we can get a consistent estimate of the VCE even when we cannot get a consistent estimate of some of its individual components. This was a groundbreaking insight and led to other important developments in the estimation of standard errors, as discussed by MacKinnon (2012).

White’s results, however, are asymptotic. They are appropriate when you have a large sample size, under certain regularity conditions. There is nothing benign about the phrase “under certain regularity conditions”. This is where things get tricky and where we need to explore what must be satisfied for us to trust the tools we are using.

White’s estimator has bias. Like many asymptotic results, the bias decreases as the number of observations increases. MacKinnon and White (1985) propose three asymptotically equivalent estimators to address the small-sample bias of White’s heteroskedasticity-consistent standard errors.

The intuition behind the estimators is that least-squares residuals tend to underestimate the true errors. The proposed solutions all address this by increasing the weight of the individual estimates of the variance. The first of these estimators, HC1, is a degrees-of-freedom adjustment of the order \(n/(n-k)\), where \(n\) is the sample size and \(k\) is the number of regressors. You get this in Stata when you use vce(robust). The other two estimators are HC2 (vce(hc2)), which corrects for the bias in the variance of the residual that arises under homoskedasticity, and HC3 (vce(hc3)), a jackknife estimator. MacKinnon and White (1985) find that, for small sample sizes, HC2 and HC3 perform better than HC1 in their simulations and that HC3 is the preferred alternative.
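In Stata, these three estimators are options of regress. Below is a minimal sketch on simulated data; the model, coefficient values, and variable names are illustrative and are not the designs used later in this post.

* Simulated data with a heteroskedastic error
clear
set seed 12345
set obs 100
generate x1 = rnormal()
generate x2 = rnormal()
generate y = 1 + 0.5*x1 - 0.3*x2 + exp(x1)*rnormal()

regress y x1 x2, vce(robust)   // HC1: degrees-of-freedom adjustment
regress y x1 x2, vce(hc2)      // HC2: corrects residual-variance bias
regress y x1 x2, vce(hc3)      // HC3: jackknife-type estimator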

HC2 and HC3 are functions of \(h_{ii}\), the diagonal elements of the matrix
\begin{equation*}
X(X'X)^{-1}X'
\end{equation*}
The \(h_{ii}\) are also known as leverage points. A high \(h_{ii}\), relative to the average of the \(h_{ii}\), exerts more leverage on the direction of the regression plane. Points with a leverage of 1, for instance, are on the regression plane. HC2 and HC3 give greater weight to the residuals of observations with higher leverage.
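In Stata, the \(h_{ii}\) are available after regress through predict. A short sketch, continuing from the simulated data above, also verifies that the mean of the leverage points is \(k/n\) (with \(k\) counting the constant):

regress y x1 x2
predict h, leverage                  // h_ii, the diagonal of X(X'X)^{-1}X'
summarize h                          // the mean equals k/n
display "k/n = " (e(df_m) + 1)/e(N)  // number of coefficients over sample size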

Chesher and Jewitt (1987) and Chesher and Austin (1991) study the bias of the estimators proposed by MacKinnon and White (1985). The explicit form of the bias is a function of \(h_{ii}\). One of the interesting results of Chesher and Austin (1991) is that the simulations of MacKinnon and White (1985) “contain a point of moderate leverage”, which, once removed, makes “all the tests that use heteroskedasticity[-]consistent covariance matrix estimators perform well”.

Long and Ervin (2000) provide advice about the use of heteroskedasticity-consistent standard errors. Like MacKinnon and White (1985), they find that HC3 performs better in small samples. They also suggest that the different VCE estimators studied in MacKinnon and White (1985) start to be equivalent after 250 observations. This finding is consistent across their simulation designs. The number 250 is not random. Their designs have 5 regressors. After 250 observations, there are more than 50 observations per regressor. This is an important consideration. With 10 regressors and heteroskedasticity, 250 observations might not be adequate.

Theoretical and simulation results by Cattaneo, Jansson, and Newey (2018) illustrate the importance of having enough observations for each regressor. They look at the asymptotic behavior of \(k/n\). Although their results are for a different framework, they show that the performance of all the estimators discussed above is poor when \(n/k\) is small.

The results of Long and Ervin (2000) and Cattaneo, Jansson, and Newey (2018) are closely related to those of MacKinnon and White (1985), Chesher and Jewitt (1987), and Chesher and Austin (1991). They are related via \(h_{ii}\). The mean of the \(h_{ii}\) is \(k/n\). Thus, the mean of the leverage points is also a way of looking at how much information we have to recover each of the \(k\) parameters in our specification.

Simulation results

Below, I present three sets of simulation results. The first follows the spirit of MacKinnon (2012). The second follows the spirit of Long and Ervin (2000). The third follows Angrist and Pischke (2009). In the simulations, I compare HC1, HC2, HC3, and the wild bootstrap (WB). The WB in the simulations imposes the null hypothesis, uses 999 replications, and uses Rademacher weights.
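Stata’s regress does not offer the WB as a vce() option; my do-files implement it directly, and the appendix has the details. If you want a packaged alternative, the community-contributed boottest command (installable with ssc install boottest) runs wild bootstrap tests that impose the null by default. The call below is only a sketch, and its options reflect my assumptions about that command rather than anything used in this post:

* Wild bootstrap test of a single coefficient: null imposed,
* 999 replications, Rademacher weights
regress y x1 x2
boottest x1, reps(999) weighttype(rademacher) seed(12345)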

The MacKinnon (2012) simulations look at the performance of heteroskedasticity-consistent standard errors in the HCk class (HC1, HC2, and HC3) and of the WB. He considers an error term, \(\varepsilon_i\), of the form
\begin{equation*}
\varepsilon_i = \left(\sum_{j=1}^{k} X_{ij}\beta_j\right)^{\gamma} N(0,1)
\end{equation*}

A value of \(\gamma=0\) implies no heteroskedasticity, and a value of \(\gamma \geq 1\) implies high heteroskedasticity. The covariates are uncorrelated and lognormally distributed. The simulations below that I refer to as MacKinnon-type simulations follow the same structure but also incorporate discrete covariates.
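To make the design concrete, here is a sketch of one draw from this error process in Stata with \(\gamma=1\); the coefficient values and the two lognormal covariates are illustrative assumptions:

* One draw from a MacKinnon (2012)-type design with gamma = 1
clear
set seed 2025
set obs 100
local gamma = 1
generate x1 = exp(rnormal())              // lognormal covariates
generate x2 = exp(rnormal())
generate index = 1 + 0.5*x1 + 0.5*x2      // linear index X*beta
generate e = index^`gamma' * rnormal()    // heteroskedastic error
generate y = index + e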

Long and Ervin (2000) also create the variance by multiplying the error term by a function of the covariates. However, they allow the covariates to be correlated and include discrete covariates. Also, the distributions of the covariates differ. For some designs, the error terms are drawn from a normal distribution and for others from a \(\chi^2\) distribution. The simulations below that I call Long and Ervin-type simulations follow the idea of correlating the covariates, using error terms from a \(\chi^2\) distribution, and having continuous and discrete covariates from different distributions. Unlike Long and Ervin (2000), however, I include all covariates in the expression that multiplies the error term.

The Angrist and Pischke (2009) simulations are for just one binary variable. They introduce heteroskedasticity by allowing the variance to differ depending on the value of the covariate. Also, the proportion of zeros and ones is skewed. I follow the same design but explore behavior for different sample sizes instead of only the sample size of 30 used by Angrist and Pischke (2009).

I chose these three sets of simulations to cover the most representative and well-known results in the literature. I extend them to incorporate features I wanted to think about, such as discrete covariates and a form of heteroskedasticity that involves all covariates. The modifications are minor, but they provide intuition that was not immediate from the original specifications.

The do-files used for the simulations are in the appendix.

MacKinnon-type simulations

I conduct simulations for three sample sizes (100; 1,000; and 5,000) and four levels of heteroskedasticity: low (\(\gamma=0.5\)), medium (\(\gamma=1\)), high (\(\gamma=1.5\)), and very high (\(\gamma=2.0\)). In all cases, there are six parameters we are trying to recover. Two parameters are associated with lognormally distributed continuous variables, as suggested in MacKinnon (2012). The other four parameters come from two categorical variables with three categories each (the base category is excluded). I keep only simulation draws for which the VCE is full rank. In a few of the simulations, I lose one of the 2,000 repetitions because the matrix is not full rank.

When \(N=100\), the number of observations per regressor, \(N/k = 16.66\), is small, making inference challenging for all estimators. For each simulation draw, I compute the maximum of the leverage points. The average maximum value of the leverages is around 0.46 for all levels of heteroskedasticity and reaches 1 for some of the simulation draws.
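Recording the maximum \(h_{ii}\) for each draw is straightforward with simulate. Below is a simplified sketch of that bookkeeping; the data-generating program is a stand-in for the full design in my do-files (for brevity, the categorical covariates have no effect on the outcome):

capture program drop maxlev
program define maxlev, rclass
    clear
    set obs 100
    generate x1 = exp(rnormal())
    generate x2 = exp(rnormal())
    generate d1 = runiformint(1, 3)        // categorical covariates
    generate d2 = runiformint(1, 3)
    generate index = 1 + 0.5*x1 + 0.5*x2
    generate y = index + index*rnormal()   // gamma = 1
    regress y x1 x2 i.d1 i.d2
    predict h, leverage
    summarize h
    return scalar maxh = r(max)
end

simulate maxh = r(maxh), reps(2000) seed(2025): maxlev
summarize maxh                             // distribution of the maximum leverage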

When \(N=1000\), the number of observations per regressor, \(N/k = 166.66\), is a bit larger, and inference starts to become more accurate. The maximum of the leverage across all draws now has a mean of around 0.20 for all levels of heteroskedasticity and, at its largest, is between 0.78 and 0.87, depending on the level of heteroskedasticity. Inference is still challenging, and some of the issues we observe at \(N/k=16.66\) remain.

When \(N=5000\), the number of observations per regressor is \(N/k = 833.33\), and inference becomes more accurate for all estimators. The maximum of the leverage across all draws now has a mean of around 0.10 for all levels of heteroskedasticity and, at its largest, is between 0.6 and 0.82, depending on the level of heteroskedasticity. Even for this sample size, leverage points can be high in some designs.

Below, I present the simulation results. I split the discussion between coefficients associated with continuous covariates and those associated with categorical covariates.

Continuous covariates

For a small sample size, \(N=100\), the 5% rejection rate of the coefficients for the continuous covariates follows what was found by MacKinnon and White (1985). That is, 5% rejection rates are closer to 0.05 for HC3 than for HC2 and HC1. However, 5% rejection rates are above 0.05 for all HCk-type estimators. The WB, on the other hand, tends to be more conservative, with rejection rates that are closer to 0.05 than those of the other VCE estimators.

Table 1 below presents the simulation results for the four VCE estimators for different levels of heteroskedasticity when the sample size is \(N=100\).

Table 1: Continuous covariates: 5% rejection rates for different levels of heteroskedasticity

Simulation results for \(N=100\) and 2,000 replications

Parameter     VCE   \(\gamma=0.5\)   \(\gamma=1.0\)   \(\gamma=1.5\)   \(\gamma=2.0\)
\(\beta_1\)   HC1   0.159            0.208            0.234            0.255
              HC2   0.125            0.156            0.170            0.175
              HC3   0.089            0.096            0.096            0.086
              WB    0.042            0.041            0.043            0.037
\(\beta_2\)   HC1   0.137            0.180            0.214            0.238
              HC2   0.109            0.138            0.151            0.157
              HC3   0.080            0.088            0.089            0.087
              WB    0.034            0.035            0.031            0.028

In tables 2 and 3 below, we see that when the sample sizes are \(N=1000\) and \(N=5000\), the behavior above persists. However, as the sample size increases, all estimators get closer to the 5% rejection rate.

Table 2: Continuous covariates: 5% rejection rates for different levels of heteroskedasticity

Simulation results for \(N=1000\) and 2,000 replications

Parameter     VCE   \(\gamma=0.5\)   \(\gamma=1.0\)   \(\gamma=1.5\)   \(\gamma=2.0\)
\(\beta_1\)   HC1   0.087            0.104            0.108            0.105
              HC2   0.076            0.084            0.083            0.085
              HC3   0.066            0.070            0.065            0.066
              WB    0.052            0.044            0.044            0.036
\(\beta_2\)   HC1   0.087            0.094            0.099            0.097
              HC2   0.078            0.075            0.078            0.072
              HC3   0.064            0.064            0.061            0.052
              WB    0.048            0.045            0.031            0.031

Table 3: Continuous covariates: 5% rejection rates for different levels of heteroskedasticity

Simulation results for \(N=5000\) and 2,000 replications

Parameter     VCE   \(\gamma=0.5\)   \(\gamma=1.0\)   \(\gamma=1.5\)   \(\gamma=2.0\)
\(\beta_1\)   HC1   0.076            0.062            0.065            0.061
              HC2   0.072            0.051            0.057            0.053
              HC3   0.069            0.044            0.053            0.048
              WB    0.061            0.044            0.044            0.039
\(\beta_2\)   HC1   0.073            0.062            0.070            0.061
              HC2   0.070            0.058            0.062            0.056
              HC3   0.066            0.051            0.060            0.050
              WB    0.057            0.044            0.050            0.043

Discrete covariates

For \(\beta_3\) and \(\beta_5\) and \(N=100\), the following is true. HC1 is the closest to the 5% rejection rate. HC2 is close to the 5% rejection rate when heteroskedasticity is not high. When heteroskedasticity is high, HC2 has rejection rates that are below the 5% rate. HC3 and the WB have 5% rejection rates that are smaller than 0.05. The rates become smaller the larger the heteroskedasticity. The rates of HC3 and the wild bootstrap are always below those of HC2.

For \(\beta_4\) and \(\beta_6\) and \(N=100\), the following is true. HC1 and HC2 have 5% rejection rates that are greater than 0.05 for low levels of heteroskedasticity. HC3 is close to the ideal rate in these cases. When heteroskedasticity is high, the behavior of HC1 remains, HC2 gets closer to the ideal rate, and HC3 starts to produce rates below 0.05. The WB always produces rates below those of all other estimators.

When \(N=1000\), all estimators are close to the ideal rejection rate when heteroskedasticity is less than very high. When heteroskedasticity is very high, HC1 is closer to the optimal rejection rate. When \(N=5000\), all estimators are close to the ideal rejection rate except HC3, which has rejection rates below 0.05 for very high levels of heteroskedasticity.

Table 4 below presents the simulation results for the four VCE estimators for different levels of heteroskedasticity when the sample size is \(N=100\). Tables 5 and 6 show results for \(N=1000\) and \(N=5000\).

Table 4: Discrete covariates: 5% rejection rates for different levels of heteroskedasticity

Simulation results for \(N=100\) and 2,000 replications

Parameter     VCE   \(\gamma=0.5\)   \(\gamma=1.0\)   \(\gamma=1.5\)   \(\gamma=2.0\)
\(\beta_3\)   HC1   0.054            0.052            0.051            0.047
              HC2   0.053            0.050            0.044            0.034
              HC3   0.046            0.038            0.026            0.022
              WB    0.032            0.032            0.030            0.027
\(\beta_4\)   HC1   0.084            0.082            0.076            0.068
              HC2   0.072            0.071            0.063            0.049
              HC3   0.058            0.053            0.042            0.025
              WB    0.040            0.039            0.031            0.025
\(\beta_5\)   HC1   0.049            0.050            0.046            0.048
              HC2   0.047            0.045            0.037            0.035
              HC3   0.036            0.035            0.028            0.019
              WB    0.033            0.033            0.027            0.028
\(\beta_6\)   HC1   0.081            0.078            0.068            0.061
              HC2   0.069            0.066            0.059            0.045
              HC3   0.050            0.047            0.037            0.027
              WB    0.037            0.033            0.024            0.020

Table 5: Discrete covariates: 5% rejection rates for different levels of heteroskedasticity

Simulation results for \(N=1000\) and 2,000 replications

Parameter     VCE   \(\gamma=0.5\)   \(\gamma=1.0\)   \(\gamma=1.5\)   \(\gamma=2.0\)
\(\beta_3\)   HC1   0.047            0.053            0.053            0.040
              HC2   0.047            0.051            0.049            0.032
              HC3   0.045            0.050            0.044            0.027
              WB    0.043            0.052            0.049            0.037
\(\beta_4\)   HC1   0.051            0.054            0.056            0.040
              HC2   0.051            0.051            0.049            0.032
              HC3   0.049            0.046            0.045            0.029
              WB    0.050            0.047            0.050            0.036
\(\beta_5\)   HC1   0.044            0.054            0.051            0.054
              HC2   0.044            0.053            0.048            0.046
              HC3   0.042            0.050            0.045            0.039
              WB    0.043            0.053            0.049            0.048
\(\beta_6\)   HC1   0.053            0.057            0.051            0.049
              HC2   0.052            0.054            0.048            0.043
              HC3   0.050            0.052            0.042            0.038
              WB    0.047            0.052            0.046            0.041

Table 6: Discrete covariates: 5% rejection rates for different levels of heteroskedasticity

Simulation results for \(N=5000\) and 2,000 replications

Parameter     VCE   \(\gamma=0.5\)   \(\gamma=1.0\)   \(\gamma=1.5\)   \(\gamma=2.0\)
\(\beta_3\)   HC1   0.046            0.053            0.049            0.045
              HC2   0.046            0.053            0.047            0.043
              HC3   0.046            0.052            0.045            0.040
              WB    0.045            0.052            0.049            0.045
\(\beta_4\)   HC1   0.058            0.054            0.048            0.048
              HC2   0.058            0.054            0.047            0.044
              HC3   0.057            0.053            0.045            0.039
              WB    0.058            0.052            0.047            0.049
\(\beta_5\)   HC1   0.050            0.058            0.047            0.045
              HC2   0.050            0.057            0.044            0.041
              HC3   0.049            0.057            0.042            0.038
              WB    0.048            0.055            0.046            0.043
\(\beta_6\)   HC1   0.055            0.059            0.051            0.045
              HC2   0.055            0.058            0.050            0.041
              HC3   0.055            0.056            0.049            0.039
              WB    0.055            0.059            0.051            0.046

Long and Ervin-type simulations

I again conduct simulations for three sample sizes. As in Long and Ervin (2000), I allow correlation between covariates and include both continuous and categorical covariates. The error term is not normal, and I allow for a high level of heteroskedasticity throughout. Instead of the five parameters of Long and Ervin (2000), I deal with six.
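A sketch of one draw from this kind of design is below; the correlation, the covariate distributions, and the coefficient values are illustrative assumptions:

* One draw from a Long and Ervin (2000)-type design
clear
set seed 2025
set obs 100
matrix C = (1, 0.5 \ 0.5, 1)           // correlation between covariates
drawnorm z1 z2, corr(C)
generate x1 = z1
generate x2 = exp(z2)                  // covariates from different distributions
generate d1 = runiform() < 0.4         // discrete covariate
generate e = (rchi2(1) - 1)/sqrt(2)    // centered and scaled chi-squared error
generate y = 1 + x1 + x2 + d1 + (1 + x1 + x2 + d1)*e   // all covariates scale the error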

When the sample size is \(N=100\), the average value of the maximum leverage is roughly 0.24 and can reach 0.46 for some draws. This is less extreme than in the MacKinnon-type simulations but still generates rejection rates above 0.05 for the HCk estimators. When the sample size is \(N=1000\), the average maximum leverage is roughly 0.042 and is at most around 0.11. When \(N=5000\), the maximum leverage is always below 0.04.

I arrive at a similar conclusion for the Long and Ervin-type simulations as I did for the MacKinnon-type simulations in the previous section. HC3 is best at approximating the ideal rejection rate for the continuous covariates, \(\beta_1\) and \(\beta_2\), but has rejection rates that are low for the discrete covariates. For the discrete covariates, HC1 is closest to the ideal rejection rate but has high rejection rates for the continuous covariates. HC2 is better than HC1 for the continuous covariates but worse for the discrete covariates. The WB tends to have rejection rates below 0.05 and below those of the other estimators.

In table 7 below, I present the rejection rates for all covariates and sample sizes.

Table 7: 5% rejection rates for three sample sizes

Parameter     VCE   \(N=100\)   \(N=1000\)   \(N=5000\)
\(\beta_1\)   HC1   0.099       0.054        0.053
              HC2   0.082       0.051        0.052
              HC3   0.064       0.050        0.052
              WB    0.035       0.047        0.055
\(\beta_2\)   HC1   0.089       0.052        0.042
              HC2   0.073       0.050        0.042
              HC3   0.056       0.048        0.042
              WB    0.043       0.051        0.044
\(\beta_3\)   HC1   0.046       0.046        0.050
              HC2   0.045       0.044        0.049
              HC3   0.033       0.044        0.049
              WB    0.026       0.047        0.052
\(\beta_4\)   HC1   0.031       0.044        0.050
              HC2   0.024       0.044        0.050
              HC3   0.014       0.040        0.049
              WB    0.011       0.046        0.051
\(\beta_5\)   HC1   0.047       0.063        0.057
              HC2   0.038       0.061        0.057
              HC3   0.025       0.060        0.057
              WB    0.013       0.063        0.061
\(\beta_6\)   HC1   0.059       0.060        0.061
              HC2   0.045       0.059        0.060
              HC3   0.030       0.057        0.060
              WB    0.023       0.062        0.060

Angrist and Pischke-type simulations

I mimic the Angrist and Pischke (2009) simulations, but instead of a sample size of 30, I use three different sample sizes, \(N=100\), \(N=300\), and \(N=1000\). All results are in table 8 below. Here I am trying to recover one parameter for a binary regressor. When I have 100 observations, the rejection rate for all estimators is above 0.05, except for the WB, which is below. The mean of the maximum leverage is roughly 0.11 and at its largest is 0.5. When the sample size is \(N=300\) and \(N=1000\), all estimators are close to the 0.05 rejection rate.
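Before the results, here is a sketch of one draw from this design; the share of ones and the two standard deviations are illustrative assumptions in the spirit of Angrist and Pischke (2009):

* One draw from an Angrist and Pischke (2009)-type design
clear
set seed 2025
set obs 100
generate d = runiform() < 0.9                          // skewed share of ones
generate e = cond(d, rnormal(0, 0.5), rnormal(0, 1))   // variance depends on d
generate y = e                                         // coefficient on d is 0 under the null
regress y d, vce(hc3)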

Table 8: 5% rejection rates for three sample sizes

Parameter     VCE   \(N=100\)   \(N=300\)   \(N=1000\)
\(\beta_1\)   HC1   0.099       0.055       0.055
              HC2   0.082       0.052       0.054
              HC3   0.066       0.048       0.053
              WB    0.030       0.040       0.050

Conclusion

From the literature and my simulations, I conclude that the most important consideration when using heteroskedasticity-consistent standard errors is to have many observations for each parameter (regressor) you want to estimate. Also, whenever you are concerned about the validity of your standard errors, you should look at the leverage points implied by the fitted model. Leverage points close to 1 should be cause for concern. Simulations show that very high leverage points yield VCE estimators that are not close to the ideal rejection rates.

References

Angrist, J. D., and J.-S. Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton, NJ: Princeton University Press.


Cattaneo, M. D., M. Jansson, and W. K. Newey. 2018. Inference in linear regression models with many covariates and heteroscedasticity. Journal of the American Statistical Association 113: 1350–1361.
https://doi.org/10.1080/01621459.2017.1328360.


Chesher, A., and I. Jewitt. 1987. The bias of a heteroskedasticity consistent covariance matrix estimator. Econometrica 55: 1217–1222.
https://doi.org/10.2307/1911269.


Chesher, A., and G. Austin. 1991. The finite-sample distributions of heteroskedasticity robust Wald statistics. Journal of Econometrics 47: 153–173.
https://doi.org/10.1016/0304-4076(91)90082-O.


Long, J. S., and L. H. Ervin. 2000. Using heteroscedasticity consistent standard errors in the linear regression model. American Statistician 54: 217–224.
https://doi.org/10.2307/2685594.


MacKinnon, J. G. 2012. Thirty years of heteroscedasticity-robust inference. In Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis, ed. X. Chen and N. R. Swanson, 437–461. New York: Springer.
https://doi.org/10.1007/978-1-4614-1653-1_17.


MacKinnon, J. G., and H. White. 1985. Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics 29: 305–325.
https://doi.org/10.1016/0304-4076(85)90158-7.


White, H. 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48: 817–838.
https://doi.org/10.2307/1912934.

Appendix: Do-files and simulations

For the MacKinnon-type simulations, there is a file for each sample size and level of heteroskedasticity. There are many ways of running the simulations using all of these files. I provide them all so that those who want to use them can decide which way works best.

For the sample size \(N=100\), for example, the files are named

gamma_05_100.do
gamma_1_100.do
gamma_15_100.do
gamma_20_100.do

The number after the first underscore refers to the level of heteroskedasticity. The number after the second underscore refers to the sample size.
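One simple way to run the whole batch is a loop over the two naming components. This is a sketch that assumes the do-files for all three sample sizes follow the convention above and sit in the current working directory:

* Run every MacKinnon-type do-file in sequence
foreach g in 05 1 15 20 {
    foreach n in 100 1000 5000 {
        do gamma_`g'_`n'.do
    }
}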

For the Long and Ervin-type simulations, I have

long_100.do
long_1000.do
long_5000.do

The number after the underscore refers to the sample size.

For the Angrist and Pischke-type simulations, the naming convention is the same as for the Long and Ervin case.

harmless_100.do
harmless_300.do
harmless_1000.do


