We discuss estimating population-averaged parameters when some of the data are missing. Specifically, we show how to use gmm to estimate population-averaged parameters for a probit model when the process that causes some of the data to be missing is a function of observable covariates and a random process that is independent of the outcome. This type of missing data is known as missing at random, selection on observables, or exogenous sample selection.
This is a follow-up to an earlier post in which we estimated the parameters of a probit model under endogenous sample selection (http://blog.stata.com/2015/11/05/using-mlexp-to-estimate-endogenous-treatment-effects-in-a-probit-model/). Under endogenous sample selection, the random process that affects which observations are missing is correlated with an unobservable random process that affects the outcome.
Under exogenous sample selection, probit consistently estimates the regression coefficients, which determine the conditional-on-covariates effects. But estimates of the population-averaged parameters can be inconsistent when the model covariates are correlated with the selection process. To get consistent estimates of the population-averaged parameters in this case, we use inverse-probability weighting to reweight the data so that our estimates reflect both the fully and the partially observed observations.
This estimator uses the same trick as the inverse-probability-weighted (IPW) estimators used in causal inference. For two examples of IPW estimators applied to causal inference, see http://blog.stata.com/2016/09/13/an-ordered-probit-inverse-probability-weighted-ipw-estimator/ and http://blog.stata.com/2014/12/08/using-gmm-to-solve-two-step-estimation-problems/.
Exogenous sample selection
Consider the case in which we draw a simple random sample from a population at some point in time (\(t_{1}\)) and then survey the same respondents again at a later point (\(t_{2}\)). If we only partly observe our variables of interest at \(t_{2}\), for example, because of panel attrition, then population-averaged inference is not consistent if selection into \(t_{2}\) is a function of the covariates used to model the outcome.
To illustrate, let's suppose we have a binary outcome variable \(y_{i}\) that is observed only at \(t_{2}\) and we wish to fit the following probit model,
\[
y_{i} = 1[\beta_{0} + \beta_{1}d_{i} + \beta_{2}z_{1,i} + \beta_{3}z_{2,i} + e_{i} > 0] = 1[{\bf x}_i{\boldsymbol \beta} + e_{i} > 0]
\]
where only \(z_{1,i}\) and \(z_{2,i}\) are observed for the full sample at \(t_{1}\), and \(d_{i}\) and \(y_{i}\) are not. While we can still consistently estimate the conditional parameters in \({\boldsymbol \beta}\) even if \(z_{1,i}\) or \(z_{2,i}\) are correlated with respondents dropping out of the sample, we cannot consistently estimate population-averaged effects from our model using the reduced sample. Let's take a look at a snippet from our (fictitious) dataset:
. li id s y d z1 z2 in 1/5
+-----------------------------------------+
| id s y d z1 z2 |
|-----------------------------------------|
1. | 1 1 0 0 .87519349 .53714872 |
2. | 2 1 0 1 -.02251873 .51122735 |
3. | 3 0 . . .4885629 .07667308 |
4. | 4 0 . . -.95665816 .73366976 |
5. | 5 0 . . -1.2078948 .32982661 |
+-----------------------------------------+
We see that only observations 1 and 2 were observed at both times, while observations 3, 4, and 5 dropped out. Suppose we went ahead and fit our model to the observed part and wanted to estimate, say, marginal means for the binary variable \(d_i\). We use probit to estimate the parameters of the model and then use margins to estimate the marginal means with the over() option.
. probit y i.d z1 z2
Iteration 0: log likelihood = -6387.4076
Iteration 1: log likelihood = -5009.8148
Iteration 2: log likelihood = -5000.3354
Iteration 3: log likelihood = -5000.3285
Iteration 4: log likelihood = -5000.3285
Probit regression Number of obs = 9,234
LR chi2(3) = 2774.16
Prob > chi2 = 0.0000
Log likelihood = -5000.3285 Pseudo R2 = 0.2172
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.d | -.9754357 .0296157 -32.94 0.000 -1.033481 -.9173901
z1 | -.6995319 .0185883 -37.63 0.000 -.7359644 -.6630995
z2 | .998608 .052668 18.96 0.000 .8953806 1.101835
_cons | -.3024672 .0359367 -8.42 0.000 -.3729019 -.2320326
------------------------------------------------------------------------------
. margins, over(d)
Predictive margins Number of obs = 9,234
Model VCE : OIM
Expression : Pr(y), predict()
over : d
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
d |
0 | .6828357 .006175 110.58 0.000 .6707329 .6949384
1 | .3705836 .0063825 58.06 0.000 .3580741 .383093
------------------------------------------------------------------------------
Because we simulated the dataset, we know that the true population parameter is 0.56 for \(d_{i}=0\) and 0.26 for \(d_{i}=1\). Although the conditional parameters of the probit model are estimated consistently, our marginal mean estimates are far off. However, if we have a good model for the sample-selection process, and if we observe the relevant variables for that model, we can use it to create weights that correct for the sample selection. That is, we use a model to estimate each respondent's probability of selecting himself or herself into the follow-up, and then we weight the reduced-sample model with the inverse of that probability. If we can model sample selection based on variables we observe, we speak of exogenous sample selection (in contrast to endogenous sample selection, where unobservables are correlated across the selection and outcome models). For details about IPW sample-selection models, see Wooldridge (2010), section 19.8. Also note that the basic idea here is the same as that for IPW treatment-effects estimators; see http://blog.stata.com/2016/09/13/an-ordered-probit-inverse-probability-weighted-ipw-estimator/ and http://blog.stata.com/2015/07/07/introduction-to-treatment-effects-in-stata-part-1/.
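To see the logic of the correction in isolation, here is a small simulation sketch (in Python with NumPy, not part of the original Stata post; all names are our own) showing how weighting the observed units by the inverse of their known selection probabilities recovers a population mean when selection depends only on an observed covariate:

```python
import numpy as np

rng = np.random.default_rng(123)
n = 100_000

# Covariate that drives both the outcome and selection into the sample.
z = rng.normal(size=n)

# Binary outcome whose mean depends on z.
y = (z + rng.normal(size=n) > 0).astype(float)

# Selection on observables: the probability of being observed is a
# known function of z only (a logistic function, for illustration).
p_sel = 1 / (1 + np.exp(-z))
s = rng.uniform(size=n) < p_sel        # observed-sample indicator

naive = y[s].mean()                    # biased: high-z units are oversampled
ipw = np.sum(y[s] / p_sel[s]) / np.sum(1 / p_sel[s])   # reweighted mean

print(f"population: {y.mean():.3f}  naive: {naive:.3f}  IPW: {ipw:.3f}")
```

The naive mean of the observed units is too high because units with large \(z\) are both more likely to be observed and more likely to have \(y_i=1\); dividing by the selection probability undoes that overrepresentation.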
While this sounds straightforward, proper estimation of our marginal means requires some care because we have a multistage estimation problem: we have a selection model, an outcome model, and we want to estimate marginal means. One way of solving this problem is to estimate everything simultaneously by using a generalized method-of-moments estimator (http://blog.stata.com/2014/12/08/using-gmm-to-solve-two-step-estimation-problems/).
Model and estimator
Modeling the sample selection using a probit model with \(s_i\) being the selection indicator, we have
\[
s_{i} = 1[\gamma_{0} + \gamma_{1}z_{1,i} + \gamma_{2}z_{2,i} + \gamma_{3}z_{3,i} + u_{i} > 0] = 1[{\bf z}_i{\boldsymbol \gamma} + u_i > 0]
\]
The conditional probability of selection is
\[
P(s_i=1 \vert {\bf z}_i) = \Phi({\bf z}_i{\boldsymbol \gamma})
\]
We can use the inverse of this probability as a weight when estimating the model parameters and population-averaged parameters from the fully observed sample. Intuitively, the inverse-probability weight corrects the estimate so that it reflects both the fully and the partially observed observations.
For the expectations of interest, we have
\begin{eqnarray*}
E(y_i \vert d_i) &=& E\left\{s_i{\Phi({\bf z}_i{\boldsymbol \gamma})}^{-1} E(y_i|d_i,{\bf z}_i) \Big\vert d_i\right\} \cr
&=& E\left\{s_i{\Phi({\bf z}_i{\boldsymbol \gamma})}^{-1} \Phi({\bf x}_i{\boldsymbol \beta}) \Big\vert d_i\right\}
\end{eqnarray*}
We will use the inverse-probability weight in the moment conditions as we estimate the model parameters and marginal means by the generalized method of moments. Because we use a probit model for both the outcome and the selection model, we can use the same form of moment conditions for both, except that they involve different samples (the full and the reduced sample). Here we use the first-order derivatives of the probit log-likelihood function so that we retain the maximum likelihood estimates. For the selection model, we have the sample moment conditions
\[
\sum_{i=1}^{N} \Bigg[ \Bigg\{ s_{i} \frac{\phi({\bf z}_{i}{\boldsymbol \gamma})}{\Phi({\bf z}_{i}{\boldsymbol \gamma})} - (1-s_{i}) \frac{\phi({\bf z}_{i}{\boldsymbol \gamma})}{\Phi(-{\bf z}_{i}{\boldsymbol \gamma})} \Bigg\} {\bf z}_{i} \Bigg] = 0
\]
Let \(S\) be the set of indices of the fully observed sample. For the outcome model, we have the sample moment conditions
\[
\sum_{i\in S} \Phi({\bf z}_{i}{\boldsymbol \gamma})^{-1} \Bigg[ \Bigg\{ y_{i} \frac{\phi({\bf x}_{i}{\boldsymbol \beta})}{\Phi({\bf x}_{i}{\boldsymbol \beta})} - (1-y_{i}) \frac{\phi({\bf x}_{i}{\boldsymbol \beta})}{\Phi(-{\bf x}_{i}{\boldsymbol \beta})} \Bigg\} {\bf x}_{i} \Bigg] = 0
\]
Finally, the sample moment conditions for our marginal parameters are
\[
\sum_{i\in S} \Phi({\bf z}_{i}{\boldsymbol \gamma})^{-1} \Bigg[ \Bigg\{ \Phi({\bf x}_{i}{\boldsymbol \beta}) - \mu_{0} \Bigg\} (1-d_{i}) \Bigg] = 0
\]
\[
\sum_{i\in S} \Phi({\bf z}_{i}{\boldsymbol \gamma})^{-1} \Bigg[ \Bigg\{ \Phi({\bf x}_{i}{\boldsymbol \beta}) - \mu_{1} \Bigg\} d_{i} \Bigg] = 0
\]
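As a sanity check on the score-form moment conditions above, the following sketch (in Python with SciPy rather than Stata; not part of the original post) fits an unweighted probit by maximum likelihood and verifies that the probit score summed against the covariates is numerically zero at the MLE, which is why building the gmm moments from these derivatives reproduces the maximum likelihood estimates:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(42)
n = 5_000

# Simulated probit data: y = 1[x'beta + e > 0].
x = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([-0.3, -0.7, 1.0])
y = (x @ beta_true + rng.normal(size=n) > 0).astype(float)

# Probit log likelihood (negated for the minimizer).
def negloglik(b):
    xb = x @ b
    return -np.sum(y * norm.logcdf(xb) + (1 - y) * norm.logcdf(-xb))

b_hat = minimize(negloglik, np.zeros(3), method="BFGS").x

# Per-observation probit score, as in the moment conditions above.
xb = x @ b_hat
score = y * norm.pdf(xb) / norm.cdf(xb) - (1 - y) * norm.pdf(xb) / norm.cdf(-xb)
moments = x.T @ score / n

print(b_hat.round(3))   # close to beta_true
print(moments)          # each component is numerically zero at the MLE
```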
Estimation
Now, we estimate our parameters with gmm, using the interactive version of its syntax. We first fit the plain probit model without the marginal means:
. gmm (eq1: (y*normalden({xb : i.d z1 z2 _cons})/normal({xb:})-
> (1-y)*normalden(-{xb:})/normal(-{xb:}))),
> instruments(eq1: i.d z1 z2)
> winitial(unadjusted, independent) onestep
Step 1
Iteration 0: GMM criterion Q(b) = .1679307
Iteration 1: GMM criterion Q(b) = .00078746
Iteration 2: GMM criterion Q(b) = 4.901e-07
Iteration 3: GMM criterion Q(b) = 3.725e-13
Iteration 4: GMM criterion Q(b) = 2.405e-25
note: model is exactly identified
GMM estimation
Number of parameters = 4
Number of moments = 4
Initial weight matrix: Unadjusted Number of obs = 9,234
------------------------------------------------------------------------------
| Robust
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.d | -.9754357 .0297148 -32.83 0.000 -1.033676 -.9171958
z1 | -.6995319 .0184779 -37.86 0.000 -.7357479 -.6633159
z2 | .998608 .052408 19.05 0.000 .8958901 1.101326
_cons | -.3024672 .0355903 -8.50 0.000 -.3722229 -.2327115
------------------------------------------------------------------------------
Instruments for equation eq1: 0b.d 1.d z1 z2 _cons
The true population parameters in \({\boldsymbol \beta}\) are \(\beta_{0}=-0.3\), \(\beta_{1}=-1\), \(\beta_{2}=-0.7\), and \(\beta_{3}=1\), and we can see that our estimates come very close to them. These results are equivalent to what we estimated before using probit. Now, we fit the same probit model but add the marginal means:
. gmm (eq1: (y*normalden({xb : i.d z1 z2 _cons})/normal({xb:})-
> (1-y)*normalden(-{xb:})/normal(-{xb:})))
> (eq2: (1-d)*(normal({xb:})-{mu0}) )
> (eq3: d*(normal({xb:})-{mu1}) ),
> instruments(eq1: i.d z1 z2)
> winitial(unadjusted, independent) onestep
Step 1
Iteration 0: GMM criterion Q(b) = .29293128
Iteration 1: GMM criterion Q(b) = .00400745
Iteration 2: GMM criterion Q(b) = 2.492e-06
Iteration 3: GMM criterion Q(b) = 1.014e-11
Iteration 4: GMM criterion Q(b) = 1.691e-22
note: model is exactly identified
GMM estimation
Number of parameters = 6
Number of moments = 6
Initial weight matrix: Unadjusted Number of obs = 9,234
------------------------------------------------------------------------------
| Robust
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.d | -.9754357 .0297148 -32.83 0.000 -1.033676 -.9171958
z1 | -.6995319 .0184779 -37.86 0.000 -.7357479 -.6633159
z2 | .998608 .052408 19.05 0.000 .8958901 1.101326
_cons | -.3024672 .0355903 -8.50 0.000 -.3722229 -.2327115
-------------+----------------------------------------------------------------
/mu0 | .6828357 .00683 99.98 0.000 .6694491 .6962222
/mu1 | .3705836 .0071107 52.12 0.000 .3566469 .3845202
------------------------------------------------------------------------------
Instruments for equation eq1: 0b.d 1.d z1 z2 _cons
Instruments for equation eq2: _cons
Instruments for equation eq3: _cons
The parameters labeled /mu0 and /mu1 are estimates of the marginal means for \(d_{i}=0\) and \(d_{i}=1\), respectively, not accounting for sample selection. Again, the marginal means are the same as those we obtained earlier from margins, and these estimates are inconsistent (the true values are 0.56 for \(d_{i}=0\) and 0.26 for \(d_{i}=1\)).
Finally, we fit our sample-selection model. We specify the nocommonesample option because we use different sets of observations across the moment conditions:
. gmm (eq1: s*normalden({zb : z1 z2 z3 _cons})/normal({zb:})-
> (1-s)*normalden(-{zb:})/normal(-{zb:}))
> (eq2: (y*normalden({xb : i.d z1 z2 _cons})/normal({xb:})-
> (1-y)*normalden(-{xb:})/normal(-{xb:}))/(normal({zb:})))
> (eq3: (1-d)*(normal({xb:})-{mu0}) / normal({zb:}))
> (eq4: d*(normal({xb:})-{mu1}) / normal({zb:})),
> instruments(eq1: z1 z2 z3)
> instruments(eq2: i.d z1 z2)
> winitial(unadjusted, independent)
> onestep nocommonesample
Step 1
Iteration 0: GMM criterion Q(b) = .57729426
Iteration 1: GMM criterion Q(b) = .03006396
Iteration 2: GMM criterion Q(b) = .00014618
Iteration 3: GMM criterion Q(b) = 2.016e-08
Iteration 4: GMM criterion Q(b) = 1.477e-16
note: model is exactly identified
GMM estimation
Number of parameters = 10
Number of moments = 10
Initial weight matrix: Unadjusted Number of obs = *
------------------------------------------------------------------------------
| Robust
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
zb |
z1 | -.706703 .0115024 -61.44 0.000 -.7292473 -.6841587
z2 | .9562022 .0347235 27.54 0.000 .8881454 1.024259
z3 | -.9735917 .0345574 -28.17 0.000 -1.041323 -.9058605
_cons | -.112247 .0257628 -4.36 0.000 -.1627412 -.0617528
-------------+----------------------------------------------------------------
xb |
1.d | -.9810332 .038836 -25.26 0.000 -1.05715 -.9049161
z1 | -.6651958 .0435223 -15.28 0.000 -.7504979 -.5798937
z2 | .9877247 .0986749 10.01 0.000 .7943255 1.181124
_cons | -.2775556 .0726005 -3.82 0.000 -.4198501 -.1352612
-------------+----------------------------------------------------------------
/mu0 | .5628197 .011652 48.30 0.000 .5399823 .5856572
/mu1 | .2694232 .007469 36.07 0.000 .2547843 .2840621
------------------------------------------------------------------------------
* Number of observations for equation eq1: 20000
Number of observations for equation eq2: 9234
Number of observations for equation eq3: 9234
Number of observations for equation eq4: 9234
------------------------------------------------------------------------------
Instruments for equation eq1: z1 z2 z3 _cons
Instruments for equation eq2: 0b.d 1.d z1 z2 _cons
Instruments for equation eq3: _cons
Instruments for equation eq4: _cons
We can see that our estimates of the marginal means are now close to the true values.
Conclusion
We demonstrated how to use gmm to estimate population-averaged parameters with an IPW estimator. This solves a missing-data problem that arises from an exogenous sample-selection process.
Appendix
Here is the code that we used to generate the dataset:
drop _all
set seed 123
qui set obs 20000
generate double d = runiform() > .5
generate double z1 = rnormal()
generate double z2 = runiform()
generate double z3 = runiform()
generate double u = rnormal()
generate double e = rnormal()
generate double zb = (-0.1 - 0.7*z1 + z2 - z3)
generate double xb = (-0.3 - d - 0.7*z1 + z2)
generate double s = (zb + u) > 0
generate double y = (xb + e) > 0
qui replace y = . if s == 0
qui replace d = . if s == 0
generate id = _n
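For readers who want to check the mechanics outside Stata, the following sketch (Python/SciPy; our own translation, not part of the original post) replicates this data-generating process and computes the IPW marginal means. For brevity it weights by the true selection probabilities \(\Phi({\bf z}_i{\boldsymbol \gamma})\) instead of estimating them with a first-stage probit, as the gmm command above does:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(123)
n = 20_000

# Same data-generating process as the Stata code above.
d = (rng.uniform(size=n) > 0.5).astype(float)
z1 = rng.normal(size=n)
z2 = rng.uniform(size=n)
z3 = rng.uniform(size=n)
zb = -0.1 - 0.7 * z1 + z2 - z3
xb = -0.3 - d - 0.7 * z1 + z2
s = zb + rng.normal(size=n) > 0                 # selection indicator
y = (xb + rng.normal(size=n) > 0).astype(float)

# Inverse-probability weights from the TRUE selection probabilities
# (the post estimates them with a first-stage probit instead).
w = 1 / norm.cdf(zb[s])
X = np.column_stack([np.ones(s.sum()), d[s], z1[s], z2[s]])
ys = y[s]

# Weighted probit log likelihood on the reduced sample (negated).
def wnegloglik(b):
    lp = X @ b
    return -np.sum(w * (ys * norm.logcdf(lp) + (1 - ys) * norm.logcdf(-lp)))

b_hat = minimize(wnegloglik, np.zeros(4), method="BFGS").x

# IPW marginal means, matching the moment conditions for mu0 and mu1.
p_hat = norm.cdf(X @ b_hat)
mu0 = np.sum(w * p_hat * (1 - d[s])) / np.sum(w * (1 - d[s]))
mu1 = np.sum(w * p_hat * d[s]) / np.sum(w * d[s])
print(f"mu0 = {mu0:.3f} (true ~ 0.56), mu1 = {mu1:.3f} (true ~ 0.26)")
```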
Reference
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.
