Using gmm to solve two-step estimation problems

March 19, 2026


Two-step estimation problems can be solved using the gmm command.

When a two-step estimator produces consistent point estimates but inconsistent standard errors, it is known as the two-step-estimation problem. For instance, inverse-probability weighted (IPW) estimators are a weighted average in which the weights are estimated in the first step. Two-step estimators use first-step estimates to estimate the parameters of interest in a second step. The two-step-estimation problem arises because the second step ignores the estimation error in the first step.

One solution is to convert the two-step estimator into a one-step estimator. My favorite way to do this conversion is to stack the equations solved by each of the two estimators and solve them jointly. This one-step approach produces consistent point estimates and consistent standard errors. There is no two-step problem because all the computations are performed jointly. Newey (1984) derives and justifies this approach.

I’m going to illustrate this approach with the IPW example, but it can be used with any two-step problem as long as each step is continuous.

IPW estimators are frequently used to estimate the mean that would be observed if everyone in a population received a specified treatment, a quantity known as a potential-outcome mean (POM). A difference of POMs is called the average treatment effect (ATE). Aside from all that, it is the mechanics of the two-step IPW estimator that interest me here. IPW estimators are weighted averages of the outcome, and the weights are estimated in a first step. The weights used in the second step are the inverse of the estimated probability of treatment.

Let’s imagine we are analyzing an extract of the birthweight data used by Cattaneo (2010). In this dataset, bweight is the baby’s weight at birth, mbsmoke is 1 if the mother smoked while pregnant (and 0 otherwise), mmarried is 1 if the mother is married, and prenatal1 is 1 if the mother had a prenatal visit in the first trimester.

Let’s imagine we want to estimate the mean when all pregnant women smoked, which is to say, the POM for smoking. If we were doing substantive research, we would also estimate the POM when no pregnant women smoked. The difference between these estimated POMs would then estimate the ATE of smoking.

In the IPW estimator, we begin by estimating the probability weights for smoking. We fit a probit model of mbsmoke as a function of mmarried and prenatal1.


. use cattaneo2
(Excerpt from Cattaneo (2010) Journal of Econometrics 155: 138-154)

. probit mbsmoke mmarried prenatal1, vce(robust)

Iteration 0:   log pseudolikelihood = -2230.7484
Iteration 1:   log pseudolikelihood = -2102.6994
Iteration 2:   log pseudolikelihood = -2102.1437
Iteration 3:   log pseudolikelihood = -2102.1436

Probit regression                                 Number of obs   =       4642
                                                  Wald chi2(2)    =     259.42
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -2102.1436                 Pseudo R2       =     0.0577

------------------------------------------------------------------------------
             |               Robust
     mbsmoke |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    mmarried |  -.6365472   .0478037   -13.32   0.000    -.7302407   -.5428537
   prenatal1 |  -.2144569   .0547583    -3.92   0.000    -.3217811   -.1071327
       _cons |  -.3226297   .0471906    -6.84   0.000    -.4151215   -.2301379
------------------------------------------------------------------------------

The results indicate that both mmarried and prenatal1 significantly predict whether the mother smoked while pregnant.

We want to calculate the inverse probabilities. We begin by getting the probabilities:


. predict double pr, pr

Now, we can obtain the inverse probabilities by typing


. generate double ipw = (mbsmoke==1)/pr

We can now perform the second step: calculate the mean for smokers by using the IPWs.


. mean bweight [pw=ipw]

Mean estimation                     Number of obs    =     864

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
     bweight |   3162.868   21.71397      3120.249    3205.486
--------------------------------------------------------------
. mean bweight [pw=ipw] if mbsmoke

The point estimate reported by mean is consistent; the reported standard error is not. It is not because mean takes the weights as fixed when they were in fact estimated.

The stacked two-step approach, that is, using gmm to solve the two-step-estimation problem, instead creates a one-step estimator that solves both steps simultaneously.

To do that, we have to find and then code the moment conditions.

So what are the moment conditions for the first-step maximum-likelihood probit? Maximum likelihood (ML) estimators obtain their parameter estimates by finding the parameters that set the means of the first derivatives with respect to each parameter to 0. The means of the first derivatives are the moments.

The moment conditions are that the means of the first derivatives equal 0. We can obtain these first derivatives for ourselves, or we can copy them from the Methods and formulas section of [R] probit:

\[
1/N\sum_{i=1}^N\frac{ \phi({\bf x}_i\boldsymbol{\beta}')
\left\{d_i-\Phi\left({\bf x}_i\boldsymbol{\beta}'\right)\right\}}
{\Phi\left({\bf x}_i\boldsymbol{\beta}'\right)
\left\{1-\Phi\left({\bf x}_i\boldsymbol{\beta}'\right)\right\}}{\bf x}_i' = {\bf 0}
\]

where \(\phi()\) is the density function of the standard normal distribution, \(d_i\) is the binary variable that is 1 for treated individuals (and 0 otherwise), and \(\Phi()\) is the cumulative probability function of the standard normal.
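If you would rather obtain the first derivatives yourself than copy them, the calculation can be sketched as follows. This assumes the usual probit log likelihood; \(\Phi_i\) and \(\phi_i\) are shorthand introduced here (not notation from the manual) for the standard normal cumulative distribution function and density evaluated at \({\bf x}_i\boldsymbol{\beta}'\).

```latex
\[
\ln L = \sum_{i=1}^{N} \left\{ d_i \ln \Phi_i + (1-d_i)\ln(1-\Phi_i) \right\}
\]
\[
\frac{\partial \ln L}{\partial \boldsymbol{\beta}}
  = \sum_{i=1}^{N} \left\{ d_i\,\frac{\phi_i}{\Phi_i}
      - (1-d_i)\,\frac{\phi_i}{1-\Phi_i} \right\} {\bf x}_i'
  = \sum_{i=1}^{N} \frac{\phi_i\,(d_i-\Phi_i)}{\Phi_i\,(1-\Phi_i)}\,{\bf x}_i'
\]
```

Dividing the last expression by N and setting it to 0 gives the probit moment conditions.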

What is the point of these moment conditions? We are going to use the generalized method of moments (GMM) to solve for the ML probit estimates. GMM is an estimation framework that defines estimators that solve moment conditions. The GMM estimator that sets the mean of the first derivatives of the ML probit to 0 produces the same point estimates as the ML probit estimator.

Stata’s GMM estimator is the gmm command; see [R] gmm for an introduction.

The structure of these moment conditions greatly simplifies the problem. For each observation, the left-hand side is the product of a scalar subexpression, namely,

\[
\frac{\phi({\bf x}_i\boldsymbol{\beta}')\{d_i-\Phi({\bf x}_i\boldsymbol{\beta}')\}}
{\Phi({\bf x}_i\boldsymbol{\beta}')\{1-\Phi({\bf x}_i\boldsymbol{\beta}')\}}
\]

and the covariates \({\bf x}_i\). In GMM parlance, the variables that multiply the scalar expression are called instruments.
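As a quick sanity check on the scalar expression, it can be compared with the equation-level score that predict produces after probit. This is only a sketch: it assumes the dataset can be loaded with webuse cattaneo2, and xbhat, sc, and sc_hand are hypothetical variable names that do not appear elsewhere in this post.

```stata
* Sketch: compare the hand-coded scalar expression with probit's own score.
* Assumption: the data are available through -webuse cattaneo2-.
webuse cattaneo2, clear
quietly probit mbsmoke mmarried prenatal1, vce(robust)

predict double xbhat, xb      // fitted linear index
predict double sc, score      // first derivative of lnL_i with respect to the index

* Hand-coded version of the scalar subexpression (hypothetical variable name)
generate double sc_hand = normalden(xbhat)*(mbsmoke - normal(xbhat)) / ///
    (normal(xbhat)*(1 - normal(xbhat)))

summarize sc sc_hand
assert reldif(sc, sc_hand) < 1e-6   // the two should agree up to rounding
```

If the two variables agree, the expression passed to gmm is the same per-observation score that the ML probit estimator sets to 0 on average.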

The gmm command that will solve these moment conditions is


. generate double cons = 1

. gmm (normalden({xb:mmarried prenatal1 cons})*(mbsmoke - normal({xb:}))/ ///
>         (normal({xb:})*(1-normal({xb:})) )),                             ///
>         instruments(mmarried prenatal1 )  winitial(identity) onestep

Step 1
Iteration 0:   GMM criterion Q(b) =  .61413428
Iteration 1:   GMM criterion Q(b) =  .00153235
Iteration 2:   GMM criterion Q(b) =  1.652e-06
Iteration 3:   GMM criterion Q(b) =  1.217e-12
Iteration 4:   GMM criterion Q(b) =  7.162e-25

GMM estimation

Number of parameters =   3
Number of moments    =   3
Initial weight matrix: Identity                   Number of obs  =    4642

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
/xb_mmarried |  -.6365472   .0477985   -13.32   0.000    -.7302306   -.5428638
/xb_prenat~1 |  -.2144569   .0547524    -3.92   0.000    -.3217696   -.1071442
    /xb_cons |  -.3226297   .0471855    -6.84   0.000    -.4151115   -.2301479
------------------------------------------------------------------------------
Instruments for equation 1: mmarried prenatal1 _cons

With gmm, we specify in parentheses the scalar expression, and we specify the covariates in the instruments() option. The unknown parameters are the implied coefficients on the variables specified in {xb:mmarried prenatal1 cons}. Note that we subsequently refer to the linear combination as {xb:}.

The winitial(identity) and onestep options help the solution-finding technique.

The point estimates and the standard errors produced by the gmm command match those reported by probit, ignoring numerical issues.

Now that we can use gmm to obtain our first-step estimates, we need to add the moment condition that defines the weighted average of the POM for smokers. The equation for the POM for smokers is

\[
{\rm POM} = \frac{\sum_{i=1}^{N}{\bf mbsmoke}_i\,{\bf bweight}_i/\Phi({\bf x}_i\boldsymbol{\beta}')}
{\sum_{i=1}^{N}{\bf mbsmoke}_i/\Phi({\bf x}_i\boldsymbol{\beta}')}
\]

Recall that the inverse weights are \(1/\Phi({\bf x}_i\boldsymbol{\beta}')\) for smokers. When we solved this problem using a two-step estimator, we performed the second step only for smokers. We typed mean bweight [pw=ipw] if mbsmoke==1. We cannot use if mbsmoke==1 in the gmm command because the first step has to be performed over all the data. Instead, we set the weights to 0 in the second step for the nonsmokers. Multiplying \(1/\Phi({\bf x}_i\boldsymbol{\beta}')\) by \({\bf mbsmoke}_i\) does that.

Rearranging the equation for the POM, the moment condition is therefore

\[
1/N\sum_{i=1}^{N}\frac{{\bf mbsmoke}_i}{\Phi({\bf x}_i\boldsymbol{\beta}')}
\left({\bf bweight}_i - {\rm POM}\right) = 0
\]

In the gmm command below, I call the scalar expression for the probit moment conditions eq1, and I call the scalar expression for the POM weighted-average equation eq2. Both moment conditions have the scalar-expression-times-instrument structure, but the weighted-average moment expression is multiplied by a constant that is included as an instrument by default. In the weighted-average moment condition, parameter pom is the POM we want to estimate.
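Written out, the system that gmm is asked to solve stacks the two sets of sample moments. The following is my sketch of that stacked system, matching the eq1 and eq2 expressions in the command; \(d_i\) and \(y_i\) are shorthand assumed here for mbsmoke and bweight, and \(\Phi_i\) and \(\phi_i\) denote the standard normal cumulative distribution function and density evaluated at \({\bf x}_i\boldsymbol{\beta}'\).

```latex
\[
\frac{1}{N}\sum_{i=1}^{N}
\begin{pmatrix}
  \dfrac{\phi_i\,(d_i-\Phi_i)}{\Phi_i\,(1-\Phi_i)}\,{\bf x}_i' \\[2ex]
  \dfrac{d_i}{\Phi_i}\,\left(y_i - {\rm POM}\right)
\end{pmatrix}
=
\begin{pmatrix} {\bf 0} \\ 0 \end{pmatrix}
\]
```

The first block contributes three moments (one for each of mmarried, prenatal1, and the constant) and the second block contributes one, so four equations determine the four parameters.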


. gmm (eq1: normalden({xb:mmarried prenatal1 cons})*                     ///
>         (mbsmoke - normal({xb:}))/(normal({xb:})*(1-normal({xb:})) ))   ///
>     (eq2: (mbsmoke/normal({xb:}))*(bweight - {pom})),                   ///
>     instruments(eq1:mmarried prenatal1 )                                ///
>     instruments(eq2: )                                                  ///
>     winitial(identity) onestep

Step 1
Iteration 0:   GMM criterion Q(b) =  1364234.7
Iteration 1:   GMM criterion Q(b) =  141803.69
Iteration 2:   GMM criterion Q(b) =  84836.523
Iteration 3:   GMM criterion Q(b) =  1073.6829
Iteration 4:   GMM criterion Q(b) =  .01215102
Iteration 5:   GMM criterion Q(b) =  1.196e-13
Iteration 6:   GMM criterion Q(b) =  2.815e-27

GMM estimation

Number of parameters =   4
Number of moments    =   4
Initial weight matrix: Identity                   Number of obs  =    4642

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
/xb_mmarried |  -.6365472   .0477985   -13.32   0.000    -.7302306   -.5428638
/xb_prenat~1 |  -.2144569   .0547524    -3.92   0.000    -.3217696   -.1071442
    /xb_cons |  -.3226297   .0471855    -6.84   0.000    -.4151115   -.2301479
        /pom |   3162.868   21.65827   146.04   0.000     3120.418    3205.317
------------------------------------------------------------------------------
Instruments for equation 1: mmarried prenatal1 _cons
Instruments for equation 2: _cons

In this output, both the point estimates and the standard errors are consistent!

They are consistent because we converted our two-step estimator into a one-step estimator.

Stata has a teffects command

What we have just done is reimplement Stata’s teffects command in a specific case. Results are identical:


. teffects ipw (bweight) (mbsmoke mmarried prenatal1, probit) , pom

Iteration 0:   EE criterion =  5.387e-22
Iteration 1:   EE criterion =  3.332e-27

Treatment-effects estimation                    Number of obs      =      4642
Estimator      : inverse-probability weights
Outcome model  : weighted mean
Treatment model: probit
------------------------------------------------------------------------------
             |               Robust
     bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
POmeans      |
     mbsmoke |
  nonsmoker  |   3401.441   9.528643   356.97   0.000     3382.765    3420.117
     smoker  |   3162.868   21.65827   146.04   0.000     3120.418    3205.317
------------------------------------------------------------------------------
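The same stacking idea extends to the nonsmoker POM. The following is a sketch rather than a tested recipe: it assumes the nonsmoker weights are (1-mbsmoke) divided by one minus the estimated probability of smoking, and pom1 and pom0 are hypothetical parameter names. It also relies on the cons variable generated earlier. If that assumption matches what teffects ipw does, the pom0 estimate should line up with the nonsmoker row above.

```stata
* Sketch: add a third moment condition for the nonsmoker POM.
* pom1 and pom0 are hypothetical parameter names; cons is the constant
* variable created earlier with -generate double cons = 1-.
gmm (eq1: normalden({xb:mmarried prenatal1 cons})*                        ///
        (mbsmoke - normal({xb:}))/(normal({xb:})*(1-normal({xb:}))))      ///
    (eq2: (mbsmoke/normal({xb:}))*(bweight - {pom1}))                     ///
    (eq3: ((1-mbsmoke)/(1-normal({xb:})))*(bweight - {pom0})),            ///
    instruments(eq1: mmarried prenatal1)                                  ///
    instruments(eq2: )                                                    ///
    instruments(eq3: )                                                    ///
    winitial(identity) onestep
```

Because both POMs are estimated in one gmm call, the reported covariance matrix covers them jointly, so a standard error for their difference (the ATE) would also account for the shared first step.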

Conclusion

To which problems can you apply this stacked two-step approach?

This approach of stacking the moment conditions is designed for two-step problems in which the number of parameters equals the number of sample moment conditions in each step. Such estimators are called exactly identified because the number of parameters is the same as the number of equations that they solve.

For exactly identified estimators, the point estimates produced by the stacked GMM are identical to the point estimates produced by the two-step estimator. The stacked GMM, however, produces consistent standard errors.

For estimators with more conditions than parameters, the stacked GMM also corrects the standard errors, but there are caveats that I’m not going to discuss here.

The stacked GMM requires that the moment conditions be continuously differentiable and satisfy standard regularity conditions. Smooth, regular ML estimators and least-squares estimators meet these requirements; see Newey (1984) for details.

The main practical hurdle is getting the moment conditions for the estimators in the different steps. If the steps involve ML, those first-derivative conditions can be directly translated to moment conditions. The calculus part is worked out in many textbooks, and sometimes even in the Stata manuals.

See [R] gmm for more information on how to use the gmm command.

References

Cattaneo, M. D. 2010. Efficient semiparametric estimation of multi-valued treatment effects under ignorability. Journal of Econometrics 155: 138–154.

Newey, W. K. 1984. A method of moments interpretation of sequential estimators. Economics Letters 14: 201–206.
