We discuss estimating population-averaged parameters when some of the data are missing. Specifically, we show how to use gmm to estimate population-averaged parameters for a probit model when the process that causes some of the data to be missing is a function of observable covariates and a random process that is independent of the outcome. This type of missing data is known as missing at random, selection on observables, or exogenous sample selection.
This is a follow-up to an earlier post in which we estimated the parameters of a probit model under endogenous sample selection (http://blog.stata.com/2015/11/05/using-mlexp-to-estimate-endogenous-treatment-effects-in-a-probit-model/). Under endogenous sample selection, the random process that affects which observations are missing is correlated with an unobservable random process that affects the outcome.
Under exogenous sample selection, probit consistently estimates the regression coefficients, which determine the conditional-on-covariates effects. But estimates of the population-averaged parameters can be inconsistent when the model covariates are correlated with the selection process. To get consistent estimates of the population-averaged parameters in this case, we use inverse-probability weighting to reweight the data so that our estimates reflect both the fully and the partially observed observations.
This estimator uses the same trick as the inverse-probability-weighted (IPW) estimators used in causal inference. For two examples of IPW estimators applied to causal inference, see http://blog.stata.com/2016/09/13/an-ordered-probit-inverse-probability-weighted-ipw-estimator/ and http://blog.stata.com/2014/12/08/using-gmm-to-solve-two-step-estimation-problems/.
Exogenous sample selection
Consider the case in which we draw a simple random sample from a population at some point in time (\(t_{1}\)) and then survey the same respondents again at a later point (\(t_{2}\)). If we only partly observe our variables of interest at \(t_{2}\), for example, because of panel attrition, then population-averaged inference is not consistent if selection into \(t_{2}\) is a function of the covariates used to model the outcome.
To illustrate, let's suppose we have a binary outcome variable \(y_{i}\) that is observed only at \(t_{2}\) and we wish to fit the following probit model,
\[
y_{i} = 1[\beta_{0} + \beta_{1}d_{i} + \beta_{2}z_{1,i} + \beta_{3}z_{2,i} + e_{i} > 0] = 1[{\bf x}_i{\boldsymbol \beta} + e_{i} > 0]
\]
where only \(z_{1,i}\) and \(z_{2,i}\) are observed for the full sample at \(t_{1}\), and \(d_{i}\) and \(y_{i}\) are not. While we can still consistently estimate the conditional parameters in \({\boldsymbol \beta}\) even if \(z_{1,i}\) or \(z_{2,i}\) are correlated with respondents dropping out of the sample, we cannot consistently estimate population-averaged effects from our model using the reduced sample. Let's take a look at a snippet from our (fictitious) dataset:
. li id s y d z1 z2 in 1/5
+-----------------------------------------+
| id s y d z1 z2 |
|-----------------------------------------|
1. | 1 1 0 0 .87519349 .53714872 |
2. | 2 1 0 1 -.02251873 .51122735 |
3. | 3 0 . . .4885629 .07667308 |
4. | 4 0 . . -.95665816 .73366976 |
5. | 5 0 . . -1.2078948 .32982661 |
+-----------------------------------------+
We see that only observations 1 and 2 were observed at both times, while observations 3, 4, and 5 dropped out. Suppose we went ahead and fit our model to the observed part and wanted to estimate, say, marginal means for the binary variable \(d_i\). We use probit to estimate the parameters of the model and then use margins to estimate the marginal means with the over() option.
. probit y i.d z1 z2
Iteration 0: log likelihood = -6387.4076
Iteration 1: log likelihood = -5009.8148
Iteration 2: log likelihood = -5000.3354
Iteration 3: log likelihood = -5000.3285
Iteration 4: log likelihood = -5000.3285
Probit regression Number of obs = 9,234
LR chi2(3) = 2774.16
Prob > chi2 = 0.0000
Log likelihood = -5000.3285 Pseudo R2 = 0.2172
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.d | -.9754357 .0296157 -32.94 0.000 -1.033481 -.9173901
z1 | -.6995319 .0185883 -37.63 0.000 -.7359644 -.6630995
z2 | .998608 .052668 18.96 0.000 .8953806 1.101835
_cons | -.3024672 .0359367 -8.42 0.000 -.3729019 -.2320326
------------------------------------------------------------------------------
. margins, over(d)
Predictive margins Number of obs = 9,234
Model VCE : OIM
Expression : Pr(y), predict()
over : d
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
d |
0 | .6828357 .006175 110.58 0.000 .6707329 .6949384
1 | .3705836 .0063825 58.06 0.000 .3580741 .383093
------------------------------------------------------------------------------
Because we simulated the dataset, we know that the true population parameter is 0.56 for \(d_{i}=0\) and 0.26 for \(d_{i}=1\). Although the conditional parameters of the probit model are estimated consistently, our marginal mean estimates are far off. However, if we have a good model for the sample-selection process, and if we observe the relevant variables for that model, we can use it to create weights that correct for the sample selection. That is, we use a model to estimate each respondent's probability of selecting himself or herself into the follow-up, and then we weight the reduced-sample model with the inverse of that probability. If we can model sample selection based on variables we observe, we speak of exogenous sample selection (in contrast to endogenous sample selection, where unobservables are correlated across the selection and outcome models). For details about IPW sample-selection models, see Wooldridge (2010), section 19.8. Also note that the basic idea here is the same as that for IPW treatment-effects estimators; see http://blog.stata.com/2016/09/13/an-ordered-probit-inverse-probability-weighted-ipw-estimator/ and http://blog.stata.com/2015/07/07/introduction-to-treatment-effects-in-stata-part-1/.
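To see the logic of the correction in isolation, here is a small simulation sketch (in Python with NumPy, not part of the original Stata post; all names are our own) showing how weighting the observed units by the inverse of their known selection probabilities recovers a population mean when selection depends only on an observed covariate:

```python
import numpy as np

rng = np.random.default_rng(123)
n = 100_000

# Covariate that drives both the outcome and selection into the sample.
z = rng.normal(size=n)

# Binary outcome whose mean depends on z.
y = (z + rng.normal(size=n) > 0).astype(float)

# Selection on observables: the probability of being observed is a
# known function of z only (a logistic function, for illustration).
p_sel = 1 / (1 + np.exp(-z))
s = rng.uniform(size=n) < p_sel        # observed-sample indicator

naive = y[s].mean()                    # biased: high-z units are oversampled
ipw = np.sum(y[s] / p_sel[s]) / np.sum(1 / p_sel[s])   # reweighted mean

print(f"population: {y.mean():.3f}  naive: {naive:.3f}  IPW: {ipw:.3f}")
```

The naive mean of the observed units is too high because units with large \(z\) are both more likely to be observed and more likely to have \(y_i=1\); dividing by the selection probability undoes that overrepresentation.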
While this sounds straightforward, proper estimation of our marginal means requires some care because we have a multistage estimation problem: we have a selection model, an outcome model, and we want to estimate marginal means. One way of solving this problem is to estimate everything simultaneously by using a generalized method-of-moments estimator (http://blog.stata.com/2014/12/08/using-gmm-to-solve-two-step-estimation-problems/).
Model and estimator
Modeling the sample selection using a probit model with \(s_i\) being the selection indicator, we have
\[
s_{i} = 1[\gamma_{0} + \gamma_{1}z_{1,i} + \gamma_{2}z_{2,i} + \gamma_{3}z_{3,i} + u_{i} > 0] = 1[{\bf z}_i{\boldsymbol \gamma} + u_i > 0]
\]
The conditional probability of selection is
\[
P(s_i=1 \vert {\bf z}_i) = \Phi({\bf z}_i{\boldsymbol \gamma})
\]
We can use the inverse of this probability as a weight when estimating the model parameters and population-averaged parameters from the fully observed sample. Intuitively, the inverse-probability weight corrects the estimate so that it reflects both the fully and the partially observed observations.
For the expectations of interest, we have
\begin{eqnarray*}
E(y_i \vert d_i) &=& E\left\{s_i{\Phi({\bf z}_i{\boldsymbol \gamma})}^{-1} E(y_i|d_i,{\bf z}_i) \Big\vert d_i\right\} \cr
&=& E\left\{s_i{\Phi({\bf z}_i{\boldsymbol \gamma})}^{-1} \Phi({\bf x}_i{\boldsymbol \beta}) \Big\vert d_i\right\}
\end{eqnarray*}
We will use the inverse-probability weight in the moment conditions as we estimate the model parameters and marginal means by the generalized method of moments. Because we use a probit model for both the outcome and the selection model, we can use the same form of moment conditions for both, except that they involve different samples (the full and the reduced sample). Here we use the first-order derivatives of the probit log-likelihood function so that we retain the maximum likelihood estimates. For the selection model, we have the sample moment conditions
\[
\sum_{i=1}^{N} \Bigg[ \Bigg\{ s_{i} \frac{\phi({\bf z}_{i}{\boldsymbol \gamma})}{\Phi({\bf z}_{i}{\boldsymbol \gamma})} - (1-s_{i}) \frac{\phi({\bf z}_{i}{\boldsymbol \gamma})}{\Phi(-{\bf z}_{i}{\boldsymbol \gamma})} \Bigg\} {\bf z}_{i} \Bigg] = 0
\]
Let \(S\) be the set of indices of the fully observed sample. For the outcome model, we have the sample moment conditions
\[
\sum_{i\in S} \Phi({\bf z}_{i}{\boldsymbol \gamma})^{-1} \Bigg[ \Bigg\{ y_{i} \frac{\phi({\bf x}_{i}{\boldsymbol \beta})}{\Phi({\bf x}_{i}{\boldsymbol \beta})} - (1-y_{i}) \frac{\phi({\bf x}_{i}{\boldsymbol \beta})}{\Phi(-{\bf x}_{i}{\boldsymbol \beta})} \Bigg\} {\bf x}_{i} \Bigg] = 0
\]
Finally, the sample moment conditions for our marginal parameters are
\[
\sum_{i\in S} \Phi({\bf z}_{i}{\boldsymbol \gamma})^{-1} \Bigg[ \Bigg\{ \Phi({\bf x}_{i}{\boldsymbol \beta}) - \mu_{0} \Bigg\} (1-d_{i}) \Bigg] = 0
\]
\[
\sum_{i\in S} \Phi({\bf z}_{i}{\boldsymbol \gamma})^{-1} \Bigg[ \Bigg\{ \Phi({\bf x}_{i}{\boldsymbol \beta}) - \mu_{1} \Bigg\} d_{i} \Bigg] = 0
\]
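As a sanity check on the score-form moment conditions above, the following sketch (in Python with SciPy rather than Stata; not part of the original post) fits an unweighted probit by maximum likelihood and verifies that the probit score summed against the covariates is numerically zero at the MLE, which is why building the gmm moments from these derivatives reproduces the maximum likelihood estimates:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(42)
n = 5_000

# Simulated probit data: y = 1[x'beta + e > 0].
x = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([-0.3, -0.7, 1.0])
y = (x @ beta_true + rng.normal(size=n) > 0).astype(float)

# Probit log likelihood (negated for the minimizer).
def negloglik(b):
    xb = x @ b
    return -np.sum(y * norm.logcdf(xb) + (1 - y) * norm.logcdf(-xb))

b_hat = minimize(negloglik, np.zeros(3), method="BFGS").x

# Per-observation probit score, as in the moment conditions above.
xb = x @ b_hat
score = y * norm.pdf(xb) / norm.cdf(xb) - (1 - y) * norm.pdf(xb) / norm.cdf(-xb)
moments = x.T @ score / n

print(b_hat.round(3))   # close to beta_true
print(moments)          # each component is numerically zero at the MLE
```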
Estimation
Now, we estimate our parameters with gmm, using the interactive version of its syntax. We first fit the plain probit model without the marginal means:
. gmm (eq1: (y*normalden({xb : i.d z1 z2 _cons})/normal({xb:})-
> (1-y)*normalden(-{xb:})/normal(-{xb:}))),
> instruments(eq1: i.d z1 z2)
> winitial(unadjusted, independent) onestep
Step 1
Iteration 0: GMM criterion Q(b) = .1679307
Iteration 1: GMM criterion Q(b) = .00078746
Iteration 2: GMM criterion Q(b) = 4.901e-07
Iteration 3: GMM criterion Q(b) = 3.725e-13
Iteration 4: GMM criterion Q(b) = 2.405e-25
note: model is exactly identified
GMM estimation
Number of parameters = 4
Number of moments = 4
Initial weight matrix: Unadjusted Number of obs = 9,234
------------------------------------------------------------------------------
| Robust
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.d | -.9754357 .0297148 -32.83 0.000 -1.033676 -.9171958
z1 | -.6995319 .0184779 -37.86 0.000 -.7357479 -.6633159
z2 | .998608 .052408 19.05 0.000 .8958901 1.101326
_cons | -.3024672 .0355903 -8.50 0.000 -.3722229 -.2327115
------------------------------------------------------------------------------
Instruments for equation eq1: 0b.d 1.d z1 z2 _cons
The true population parameters in \({\boldsymbol \beta}\) are \(\beta_{0}=-0.3\), \(\beta_{1}=-1\), \(\beta_{2}=-0.7\), and \(\beta_{3}=1\), and we can see that our estimates come very close to them. These results are equivalent to what we estimated before using probit. Now, we fit the same probit model but add the marginal means:
. gmm (eq1: (y*normalden({xb : i.d z1 z2 _cons})/normal({xb:})-
> (1-y)*normalden(-{xb:})/normal(-{xb:})))
> (eq2: (1-d)*(normal({xb:})-{mu0}) )
> (eq3: d*(normal({xb:})-{mu1}) ),
> instruments(eq1: i.d z1 z2)
> winitial(unadjusted, independent) onestep
Step 1
Iteration 0: GMM criterion Q(b) = .29293128
Iteration 1: GMM criterion Q(b) = .00400745
Iteration 2: GMM criterion Q(b) = 2.492e-06
Iteration 3: GMM criterion Q(b) = 1.014e-11
Iteration 4: GMM criterion Q(b) = 1.691e-22
note: model is exactly identified
GMM estimation
Number of parameters = 6
Number of moments = 6
Initial weight matrix: Unadjusted Number of obs = 9,234
------------------------------------------------------------------------------
| Robust
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.d | -.9754357 .0297148 -32.83 0.000 -1.033676 -.9171958
z1 | -.6995319 .0184779 -37.86 0.000 -.7357479 -.6633159
z2 | .998608 .052408 19.05 0.000 .8958901 1.101326
_cons | -.3024672 .0355903 -8.50 0.000 -.3722229 -.2327115
-------------+----------------------------------------------------------------
/mu0 | .6828357 .00683 99.98 0.000 .6694491 .6962222
/mu1 | .3705836 .0071107 52.12 0.000 .3566469 .3845202
------------------------------------------------------------------------------
Instruments for equation eq1: 0b.d 1.d z1 z2 _cons
Instruments for equation eq2: _cons
Instruments for equation eq3: _cons
The parameters labeled /mu0 and /mu1 are estimates of the marginal means for \(d_{i}=0\) and \(d_{i}=1\), respectively, not accounting for sample selection. Again, the marginal means are the same as those we obtained earlier from margins, and these estimates are inconsistent (the true values are 0.56 for \(d_{i}=0\) and 0.26 for \(d_{i}=1\)).
Finally, we fit our sample-selection model. We specify the nocommonesample option because we use different sets of observations across the moment conditions:
. gmm (eq1: s*normalden({zb : z1 z2 z3 _cons})/normal({zb:})-
> (1-s)*normalden(-{zb:})/normal(-{zb:}))
> (eq2: (y*normalden({xb : i.d z1 z2 _cons})/normal({xb:})-
> (1-y)*normalden(-{xb:})/normal(-{xb:}))/(normal({zb:})))
> (eq3: (1-d)*(normal({xb:})-{mu0}) / normal({zb:}))
> (eq4: d*(normal({xb:})-{mu1}) / normal({zb:})),
> instruments(eq1: z1 z2 z3)
> instruments(eq2: i.d z1 z2)
> winitial(unadjusted, independent)
> onestep nocommonesample
Step 1
Iteration 0: GMM criterion Q(b) = .57729426
Iteration 1: GMM criterion Q(b) = .03006396
Iteration 2: GMM criterion Q(b) = .00014618
Iteration 3: GMM criterion Q(b) = 2.016e-08
Iteration 4: GMM criterion Q(b) = 1.477e-16
note: model is exactly identified
GMM estimation
Number of parameters = 10
Number of moments = 10
Initial weight matrix: Unadjusted Number of obs = *
------------------------------------------------------------------------------
| Robust
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
zb |
z1 | -.706703 .0115024 -61.44 0.000 -.7292473 -.6841587
z2 | .9562022 .0347235 27.54 0.000 .8881454 1.024259
z3 | -.9735917 .0345574 -28.17 0.000 -1.041323 -.9058605
_cons | -.112247 .0257628 -4.36 0.000 -.1627412 -.0617528
-------------+----------------------------------------------------------------
xb |
1.d | -.9810332 .038836 -25.26 0.000 -1.05715 -.9049161
z1 | -.6651958 .0435223 -15.28 0.000 -.7504979 -.5798937
z2 | .9877247 .0986749 10.01 0.000 .7943255 1.181124
_cons | -.2775556 .0726005 -3.82 0.000 -.4198501 -.1352612
-------------+----------------------------------------------------------------
/mu0 | .5628197 .011652 48.30 0.000 .5399823 .5856572
/mu1 | .2694232 .007469 36.07 0.000 .2547843 .2840621
------------------------------------------------------------------------------
* Number of observations for equation eq1: 20000
Number of observations for equation eq2: 9234
Number of observations for equation eq3: 9234
Number of observations for equation eq4: 9234
------------------------------------------------------------------------------
Instruments for equation eq1: z1 z2 z3 _cons
Instruments for equation eq2: 0b.d 1.d z1 z2 _cons
Instruments for equation eq3: _cons
Instruments for equation eq4: _cons
We can see that our estimates of the marginal means are now close to the true values.
Conclusion
We demonstrated how to use gmm to estimate population-averaged parameters with an IPW estimator. This solves a missing-data problem that arises from an exogenous sample-selection process.
Appendix
Here is the code that we used to generate the dataset:
drop _all
set seed 123
qui set obs 20000
generate double d = runiform() > .5
generate double z1 = rnormal()
generate double z2 = runiform()
generate double z3 = runiform()
generate double u = rnormal()
generate double e = rnormal()
generate double zb = (-0.1 - 0.7*z1 + z2 - z3)
generate double xb = (-0.3 - d - 0.7*z1 + z2)
generate double s = (zb + u) > 0
generate double y = (xb + e) > 0
qui replace y = . if s == 0
qui replace d = . if s == 0
generate id = _n
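For readers who want to check the mechanics outside Stata, the following sketch (Python/SciPy; our own translation, not part of the original post) replicates this data-generating process and computes the IPW marginal means. For brevity it weights by the true selection probabilities \(\Phi({\bf z}_i{\boldsymbol \gamma})\) instead of estimating them with a first-stage probit, as the gmm command above does:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(123)
n = 20_000

# Same data-generating process as the Stata code above.
d = (rng.uniform(size=n) > 0.5).astype(float)
z1 = rng.normal(size=n)
z2 = rng.uniform(size=n)
z3 = rng.uniform(size=n)
zb = -0.1 - 0.7 * z1 + z2 - z3
xb = -0.3 - d - 0.7 * z1 + z2
s = zb + rng.normal(size=n) > 0                 # selection indicator
y = (xb + rng.normal(size=n) > 0).astype(float)

# Inverse-probability weights from the TRUE selection probabilities
# (the post estimates them with a first-stage probit instead).
w = 1 / norm.cdf(zb[s])
X = np.column_stack([np.ones(s.sum()), d[s], z1[s], z2[s]])
ys = y[s]

# Weighted probit log likelihood on the reduced sample (negated).
def wnegloglik(b):
    lp = X @ b
    return -np.sum(w * (ys * norm.logcdf(lp) + (1 - ys) * norm.logcdf(-lp)))

b_hat = minimize(wnegloglik, np.zeros(4), method="BFGS").x

# IPW marginal means, matching the moment conditions for mu0 and mu1.
p_hat = norm.cdf(X @ b_hat)
mu0 = np.sum(w * p_hat * (1 - d[s])) / np.sum(w * (1 - d[s]))
mu1 = np.sum(w * p_hat * d[s]) / np.sum(w * d[s])
print(f"mu0 = {mu0:.3f} (true ~ 0.56), mu1 = {mu1:.3f} (true ~ 0.26)")
```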
Reference
Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.
