Friday, January 16, 2026

Two faces of misspecification in maximum likelihood: Heteroskedasticity and robust standard errors


For a nonlinear model with heteroskedasticity, a maximum likelihood estimator gives misleading inference and inconsistent marginal effect estimates unless I model the variance. Using a robust estimate of the variance–covariance matrix will not help me obtain correct inference.

This differs from the intuition we gain from linear regression. The estimates of the marginal effects in linear regression are consistent under heteroskedasticity, and using robust standard errors yields correct inference.
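Here is a minimal sketch of that linear-regression intuition. The DGP below is assumed for illustration only and is not part of the simulations that follow:

* Illustrative DGP: OLS recovers the slope under heteroskedasticity,
* and vce(robust) gives valid standard errors
clear
set seed 12345
set obs 100000
generate x = rnormal()
generate e = rnormal(0, exp(.5*x))   // error variance depends on x
generate y = 1 + 2*x + e
regress y x, vce(robust)             // _b[x] is close to 2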

If robust standard errors do not solve the problems associated with heteroskedasticity for a nonlinear model estimated using maximum likelihood, what does it mean to use robust standard errors in this context? I answer this question using simulations and illustrate the effect of heteroskedasticity in nonlinear models estimated using maximum likelihood.

What happens when I have heteroskedasticity

Suppose that the true model is a heteroskedastic probit where

\begin{equation*}
y = \left\{
\begin{array}{cl}
1 & \text{if} \quad x\beta + \varepsilon > 0 \\
0 & \text{otherwise}
\end{array}\right.
\end{equation*}

\begin{equation*}
\varepsilon | x \sim N\left(0, \exp\left(x\gamma\right)\right)
\end{equation*}

This model is heteroskedastic because the variance of the unobserved component, \(\varepsilon\), is a function of the covariates. In contrast, the variance of the probit model does not depend on the covariates; it is equal to 1.
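Before turning to the simulations, here is a single draw from this DGP, using the coefficient values from the appendix program, fit with the two estimators compared below:

* One draw from the heteroskedastic-probit DGP (coefficients taken from
* the appendix program) and the two estimators compared in table 1
clear
set seed 222
set obs 10000
generate x1 = rchi2(1) - 1
generate x2 = int(4*rbeta(5,2))
generate x3 = rchi2(1) - 1
generate sg = exp(.3*(x1 - (x2==1) + (x2==2) - (x2==3) + x3))  // skedastic function
generate y  = .5*(1 - x1 - (x2==1) + (x2==2) - (x2==3) + x3) + rnormal(0, sg) > 0
probit y x1 i.x2 x3, vce(robust)           // misspecified: assumes variance of 1
hetprobit y x1 i.x2 x3, het(x1 i.x2 x3)    // models the variance
margins, dydx(*)                           // AMEs and ATEs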

In table 1 below, I show the average of the change in the outcome when a continuous covariate changes, the average marginal effect (AME); the average of the change in the outcome when a discrete covariate varies from a base level, which I refer to as the average treatment effect (ATE); and the 5% rejection rate of a test against the true null hypothesis. I compare two estimators: a probit with a robust variance–covariance matrix and a heteroskedastic probit. In table 1, I also show an approximate true value of the AME and ATE. I obtain the approximate true values by computing the ATE and AME, at the true values of the coefficients, using a sample of 10 million observations. I provide more details about the estimation and the simulation in the appendix.

Table 1. Average marginal and treatment effects: True DGP heteroskedastic probit
Simulation results for N = 10,000 and 2,000 replications

Statistic              Approximate True Value    Probit    Hetprobit
--------------------------------------------------------------------
AME of x1                      -.210              -.099      -.210
  5% Rejection Rate                                1.00       .052
AME of x3                       .166               .061       .166
  5% Rejection Rate                                1.00       .062
ATE of x2 (1 vs 0)             -.190              -.193      -.191
  5% Rejection Rate                                .060       .064
ATE of x2 (2 vs 0)              .082               .077       .081
  5% Rejection Rate                                .061       .065
ATE of x2 (3 vs 0)             -.190              -.192      -.191
  5% Rejection Rate                                .058       .063

As expected, the heteroskedastic probit estimates are close to the true values, and the rejection rate of the true null hypothesis is close to 5%. This is also true for the probit ATE estimates. However, the probit AME estimates are far from the true values, and the rejection rate is 100%, regardless of my use of a robust variance–covariance matrix estimator.

The probit likelihood in this example is misspecified. As White (1996) illustrates, the misspecified probit likelihood estimates converge to a well-defined parameter, and robust standard errors provide correct coverage for this parameter. However, the value obtained from the probit likelihood, as the simulations illustrate, gives an inconsistent estimate of the effects of interest. Sometimes, as we will see below, this misspecified value is of interest.

If a robust variance did not correct for heteroskedasticity, what is it doing?

Although a robust variance–covariance matrix estimator is closely associated with heteroskedasticity in linear regression models, it has a different interpretation in a nonlinear model estimated using maximum likelihood, as I show in the two examples below.

First, let's look at the probit likelihood in another context.

Example 1 (Probit pseudolikelihood). Suppose the true model is given by

\begin{eqnarray*}
y &=& \Phi\left(x\beta + \varepsilon\right) \\
\varepsilon | x &\sim& N\left(0, 1\right)
\end{eqnarray*}

The model above is a fractional response model. The outcome variable in fractional response models takes values that are greater than or equal to 0 and less than or equal to 1. This is not a binary response model, but we may use the probit likelihood to obtain a consistent estimate of the outcome mean,

\begin{equation*}
E\left(y|x\right) = \Phi\left(\frac{x\beta}{\sqrt{2}}\right)
\end{equation*}
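To see where the \(\sqrt{2}\) comes from, let \(z\) be a standard normal random variable independent of \(\varepsilon\), so that \(z - \varepsilon \sim N\left(0, 2\right)\). Then

\begin{equation*}
E\left(y|x\right) = E\left[\Phi\left(x\beta + \varepsilon\right)\right]
= P\left(z \leq x\beta + \varepsilon\right)
= P\left(z - \varepsilon \leq x\beta\right)
= \Phi\left(\frac{x\beta}{\sqrt{2}}\right)
\end{equation*}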

Below I simulate data and obtain estimates for the model above:


// Simulate a fractional outcome with a probit conditional mean
clear
set seed 222
set obs 1000000
generate x1   = rnormal()
generate x2   = int(rbeta(3,2)*3)                // discrete covariate: 0, 1, 2
generate xb   = .5*(1 + x1 - (x2==1) + (x2==2))  // linear index
generate e    = rnormal()
generate yp   = normal(xb + e)                   // outcome between 0 and 1

For these data, the AMEs and ATEs are given by


// Marginal effect of x1
generate m1   = normalden(xb/sqrt(2))*(.5/sqrt(2))
// Treatment effect of x2: 1 vs 0
generate m21  = normal(.5*(x1)/sqrt(2)) - normal(.5*(x1 + 1)/sqrt(2))
// Treatment effect of x2: 2 vs 0
generate m22  = normal(.5*(x1 + 2)/sqrt(2)) - normal(.5*(x1 + 1)/sqrt(2))

To fit the model, I use fracreg, which employs a probit likelihood with a robust variance–covariance matrix by default. fracreg assumes a correct model for the mean and is agnostic about the other moments of the outcome. Because we are modeling a subset of the moments of our outcome, in this example the mean, and do not model the other moments, we use a robust estimator of the variance–covariance matrix to obtain consistent estimates of the unknown standard errors. Given that the probit likelihood is not the true likelihood, we refer to the likelihood as a pseudolikelihood or quasilikelihood.
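The fit itself is not shown in the output below; a minimal call consistent with that output would be the following (fracreg probit reports a robust variance–covariance matrix by default, so no vce() option is needed):

// Fractional probit fit assumed to precede the margins output below
fracreg probit yp x1 i.x2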

The estimates for the AMEs and ATEs after fracreg are given by


. margins, dydx(*)

Average marginal effects                        Number of obs     =  1,000,000
Model VCE    : Robust

Expression   : Conditional mean of yp, predict()
dy/dx w.r.t. : x1 1.x2 2.x2

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .1209728   .0002398   504.55   0.000     .1205028    .1214427
             |
          x2 |
          1  |  -.1304846   .0008839  -147.62   0.000    -.1322171   -.1287521
          2  |   .1175945   .0008696   135.23   0.000     .1158902    .1192988
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

which are close to the sample values


. summarize m*

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          m1 |  1,000,000    .1213898    .0219322   .0083771   .1410474
         m21 |  1,000,000   -.1305564    .0123099  -.1403162   -.025986
         m22 |  1,000,000    .1169432    .0206344   .0128037   .1403162

Let's look at another example, which was discussed in http://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/.

Example 2 (Exponential mean model). Suppose the true model is given by

\begin{eqnarray*}
y &=& \exp\left(x\beta + \varepsilon\right) \\
\varepsilon | x &\sim& N\left(0, 1\right)
\end{eqnarray*}

This is not a Poisson model, but we can use the Poisson likelihood estimates to obtain a consistent estimate of the outcome mean,

\begin{equation*}
E\left(y|x\right) = \exp\left(x\beta + \frac{1}{2}\right)
\end{equation*}
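This follows from the moment-generating function of the normal distribution: for \(\varepsilon \sim N\left(0, 1\right)\), \(E\left[\exp\left(\varepsilon\right)\right] = \exp\left(1/2\right)\), so

\begin{equation*}
E\left(y|x\right) = \exp\left(x\beta\right)E\left[\exp\left(\varepsilon\right)\right]
= \exp\left(x\beta + \frac{1}{2}\right)
\end{equation*}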

Given that we do not have a Poisson model, our estimates should not be used to obtain statistics that are not functions of the outcome mean. For example, it makes no sense to predict counts or the probability that the outcome takes a specific integer value, which would be natural predictions if the true likelihood were a Poisson likelihood.

Below I simulate data for the exponential mean model above:


// Simulate a continuous outcome with an exponential conditional mean
clear
set seed 222
set obs 1000000
generate x1   = rnormal()
generate x2   = int(rbeta(3,2)*3)                // discrete covariate: 0, 1, 2
generate xb   = .5*(1 + x1 - (x2==1) + (x2==2))  // linear index
generate e    = rnormal()
generate ye   = exp(xb + e)

The estimation results are given by


. poisson ye x1 i.x2, vce(robust)
note: you are responsible for interpretation of noncount dep. variable

Iteration 0:   log pseudolikelihood = -2904731.1
Iteration 1:   log pseudolikelihood = -2904726.1
Iteration 2:   log pseudolikelihood = -2904726.1

Poisson regression                              Number of obs     =  1,000,000
                                                Wald chi2(3)      =  142144.11
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -2904726.1               Pseudo R2         =     0.2087

------------------------------------------------------------------------------
             |               Robust
          ye |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .5006891   .0018594   269.27   0.000     .4970447    .5043335
             |
          x2 |
          1  |  -.4953304   .0049604   -99.86   0.000    -.5050527   -.4856081
          2  |   .5086742   .0050554   100.62   0.000     .4987659    .5185825
             |
       _cons |   .9956749   .0044566   223.42   0.000     .9869401     1.00441
------------------------------------------------------------------------------

The output notes that we have a noncount outcome (in our case, a continuous outcome with an exponential mean) and that we are responsible for the interpretation of our results. The iteration log states that we have a pseudolikelihood, which will always be stated when we use a robust variance–covariance matrix with a maximum likelihood estimator.

The AMEs and ATEs for the exponential mean model are given by


// Marginal effect of x1
generate mex1  = exp(.5*(1 + x1 - (x2==1) + (x2==2)) + .5)*.5
// Treatment effect of x2: 1 vs 0
generate te1   = exp(.5*(1 + x1 - 1) + .5) - exp(.5*(1 + x1) + .5)
// Treatment effect of x2: 2 vs 0
generate te2   = exp(.5*(1 + x1 + 1) + .5) - exp(.5*(1 + x1) + .5)

and their estimates are given by


. quietly poisson ye x1 i.x2, vce(robust)

. margins, dydx(*) expression(exp(xb()))

Average marginal effects                        Number of obs     =  1,000,000
Model VCE    : Robust

Expression   : exp(xb())
dy/dx w.r.t. : x1 1.x2 2.x2

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.661569   .0078353   212.06   0.000     1.646212    1.676926
             |
          x2 |
          1  |  -1.198624   .0142905   -83.88   0.000    -1.226633   -1.170615
          2  |   2.034632   .0182593   111.43   0.000     1.998844    2.070419
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

which are close to the sample values


. summarize mex1 te*

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        mex1 |  1,000,000    1.654795    1.229289   .0810196    23.7496
         te1 |  1,000,000   -1.212149    .6463085  -11.33574  -.1051182
         te2 |  1,000,000    1.998497    1.065583   .1733107   18.68948

Because we used a robust variance–covariance matrix, we have consistent estimates of the standard errors of the effects.
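For reference, the robust variance–covariance matrix behind these standard errors is the familiar sandwich estimator; see White (1996). Up to scaling conventions,

\begin{equation*}
\widehat{V} = \widehat{A}^{-1}\widehat{B}\widehat{A}^{-1}
\end{equation*}

where \(\widehat{A}\) is the negative average Hessian of the log pseudolikelihood and \(\widehat{B}\) is the average outer product of the scores. When the likelihood is correctly specified, the information matrix equality gives \(A = B\), and the sandwich collapses to the usual maximum likelihood variance; when it is not, the sandwich remains valid for the pseudo-true parameter.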

Concluding remarks

Using simulations, I showed that heteroskedasticity in nonlinear models estimated using maximum likelihood produces inconsistent estimates of marginal effects. This differs from heteroskedasticity in linear regression models, which does not affect the consistency of marginal effect estimates.

Another difference between linear regression models and nonlinear models estimated using maximum likelihood is the interpretation of the robust variance–covariance matrix. In the latter case, as I illustrated with two examples, it means that we are using a pseudolikelihood to model a set of moments of our outcome and are agnostic about all other moments.

In both cases, we have a misspecified likelihood. In the case of heteroskedasticity, the pseudolikelihood estimates converge to a value that is different from the effects of interest. In the case where we model the mean correctly, the pseudolikelihood estimates converge to the effects of interest. They are two faces of the same problem: misspecified likelihoods in nonlinear models estimated using maximum likelihood.

Reference

White, H. 1996. Estimation, Inference and Specification Analysis. Cambridge: Cambridge University Press.

Appendix

The program used for the simulations of the first example is given by


clear all
local L = 10000000
local R = 2000
local N = 10000
set seed 222

program define mkdata
        syntax, [n(integer 10000)]
        clear
        quietly set obs `n'
        // Data-generating process: heteroskedastic probit
        generate x1    = rchi2(1)-1
        generate x2    = int(4*rbeta(5,2))
        generate x3    = rchi2(1)-1
        generate sg    = exp(.3*(x1 - (x2==1) + (x2==2) - (x2==3) + x3))
        generate e     = rnormal(0, sg)
        generate xb    = .5*(1 - x1 - (x2==1) + (x2==2) - (x2==3) + x3)
        generate y     = xb + e > 0

        // Marginal and treatment effects implied by the DGP
        generate m1  = normalden(xb/sg)*((-.5 - .3*xb)/sg)
        generate m3  = normalden(xb/sg)*((.5 - .3*xb)/sg)
        generate m21 = normal(.5*(-x1 + x3)/exp(.3*(x1 - 1 + x3)))       ///
                       - normal(.5*(1 - x1 + x3)/exp(.3*(x1 + x3)))
        generate m22 = normal(.5*(2 - x1 + x3)/exp(.3*(x1 + 1 + x3)))    ///
                       - normal(.5*(1 - x1 + x3)/exp(.3*(x1 + x3)))
        generate m23 = m21
end

// Approximate true values of the AMEs and ATEs from a large sample
mkdata, n(`L')
summarize m1, meanonly
local m1  = r(mean)
summarize m3, meanonly
local m3  = r(mean)
summarize m21, meanonly
local m21 = r(mean)
summarize m22, meanonly
local m22 = r(mean)
summarize m23, meanonly
local m23 = r(mean)

display `m1'
display `m3'
display `m21'
display `m22'
display `m23'

postfile sims est hm1 hm1_r hm21 hm21_r hm22 hm22_r hm23 hm23_r hm3 hm3_r  ///
         rc cv using hetprobit, replace

forvalues i=1/`R' {
        quietly {
                mkdata, n(`N')
                capture probit y x1 i.x2 x3, vce(robust) iterate(200)
                local rc = _rc
                local cv = e(converged)
                if (`rc' | `cv'==0) {
                        local hm1    = .
                        local hm1_r  = .
                        local hm21   = .
                        local hm21_r = .
                        local hm22   = .
                        local hm22_r = .
                        local hm23   = .
                        local hm23_r = .
                        local hm3    = .
                        local hm3_r  = .
                }
                else {
                        margins, dydx(*) post
                        local hm1 = _b[x1]
                        test _b[x1] = `m1'
                        local hm1_r  = (r(p)<.05)
                        local hm21 = _b[1.x2]
                        test _b[1.x2] = `m21'
                        local hm21_r = (r(p)<.05)
                        local hm22 = _b[2.x2]
                        test _b[2.x2] = `m22'
                        local hm22_r = (r(p)<.05)
                        local hm23 = _b[3.x2]
                        test _b[3.x2] = `m23'
                        local hm23_r = (r(p)<.05)
                        local hm3 = _b[x3]
                        test _b[x3] = `m3'
                        local hm3_r  = (r(p)<.05)
                }
                post sims (1) (`hm1') (`hm1_r') (`hm21') (`hm21_r')       ///
                          (`hm22') (`hm22_r') (`hm23') (`hm23_r') (`hm3') ///
                          (`hm3_r') (`rc') (`cv')

                capture hetprobit y x1 i.x2 x3, het(x1 i.x2 x3) iterate(200)
                local rc = _rc
                local cv = e(converged)
                if (`rc' | `cv'==0) {
                        local hm1    = .
                        local hm1_r  = .
                        local hm21   = .
                        local hm21_r = .
                        local hm22   = .
                        local hm22_r = .
                        local hm23   = .
                        local hm23_r = .
                        local hm3    = .
                        local hm3_r  = .
                }
                else {
                        margins, dydx(*) post
                        local hm1 = _b[x1]
                        test _b[x1] = `m1'
                        local hm1_r  = (r(p)<.05)
                        local hm21 = _b[1.x2]
                        test _b[1.x2] = `m21'
                        local hm21_r = (r(p)<.05)
                        local hm22 = _b[2.x2]
                        test _b[2.x2] = `m22'
                        local hm22_r = (r(p)<.05)
                        local hm23 = _b[3.x2]
                        test _b[3.x2] = `m23'
                        local hm23_r = (r(p)<.05)
                        local hm3 = _b[x3]
                        test _b[x3] = `m3'
                        local hm3_r  = (r(p)<.05)
                }
                post sims (2) (`hm1') (`hm1_r') (`hm21') (`hm21_r')       ///
                          (`hm22') (`hm22_r') (`hm23') (`hm23_r') (`hm3') ///
                          (`hm3_r') (`rc') (`cv')
        }
        if (`i'/50) == int(`i'/50) {
                di ".                 `i'"
        }
        else {
                di _c "."
        }
}
postclose sims
use hetprobit, clear
label define est 1 "probit" 2 "hetprobit"
label values est est
bysort est: summarize

I first create a program, mkdata, that defines the data-generating process, the marginal effects, and the treatment effects. I then draw a sample of 10 million observations and take the average of the marginal effects and treatment effects. Because the sample size is large, I take these means to be a good approximation of the true values of the ATEs and AMEs. The forvalues loop runs the simulations, and the final lines summarize the simulation results.


