Overview
In a previous post, David Drukker demonstrated how to use mlexp to estimate the degree of freedom parameter in a chi-squared distribution by maximum likelihood (ML). In this post, I am going to use mlexp to estimate the parameters of a probit model with sample selection. I will illustrate how to specify a more complex likelihood in mlexp and provide intuition for the probit model with sample selection. Our results match the heckprobit command; see [R] heckprobit for more details.
Probit model
For binary outcome \(y_i\) and regressors \({\bf x}_i\), the probit model assumes
\[\begin{equation} \label{eq:outcome} y_i = {\bf 1}({\bf x}_i{\boldsymbol \beta} + \epsilon_i > 0) \tag{1} \end{equation}\]
where the error \(\epsilon_i\) is standard normal. The indicator function \({\bf 1}(\cdot)\) outputs 1 when its input is true and outputs 0 otherwise.
The log likelihood of the probit model is
\[\begin{equation}
\ln L = \sum_{i=1}^{N} \left[ y_i \ln \Phi({\bf x}_i{\boldsymbol \beta}) + (1-y_i)\ln\{1-\Phi({\bf x}_i{\boldsymbol \beta})\} \right] \nonumber
\end{equation}\]
where \(\Phi\) is the standard normal cumulative distribution function.
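As a short worked step, the two probabilities in this log likelihood follow from the outcome equation (1) and the symmetry of the standard normal distribution:
\[\begin{eqnarray*}
\Pr(y_i = 1 \mid {\bf x}_i) &=& \Pr(\epsilon_i > -{\bf x}_i{\boldsymbol \beta}) \cr
&=& 1 - \Phi(-{\bf x}_i{\boldsymbol \beta}) \cr
&=& \Phi({\bf x}_i{\boldsymbol \beta})
\end{eqnarray*}\]
so \(\Pr(y_i = 0 \mid {\bf x}_i) = 1 - \Phi({\bf x}_i{\boldsymbol \beta})\). Each observation contributes the log of whichever probability matches its observed \(y_i\).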
The probit model is widely used to model binary outcomes. But there are situations where it is not appropriate. Sometimes we observe a random sample where the outcome is missing on certain observations. If there is a relationship between the unobserved error of the outcome \(\epsilon_i\) and the unobserved error that affects whether the outcome is observed \(\epsilon_{si}\), then estimates made using the probit model will be inconsistent for \({\boldsymbol \beta}\). For instance, this could happen when we model job satisfaction and our sample includes employed and unemployed individuals. The unobserved factors that affect your job satisfaction may be correlated with factors that affect your employment status. Samples like this are said to suffer from “selection on unobservables”.
Probit model with sample selection
Van de Ven and Van Praag (1981) introduced the probit model with sample selection to allow for consistent estimation of \({\boldsymbol \beta}\) in samples that suffer from selection on unobservables. The equation for the outcome (1) remains the same, but we add another equation. The selection process for the outcome is modeled as
\[\begin{equation}
s_i = {\bf 1}({\bf z}_i{\boldsymbol \gamma} + \epsilon_{si} > 0) \nonumber
\end{equation}\]
where \(s_i=1\) if we observed \(y_i\) and \(s_i=0\) otherwise, and \({\bf z}_i\) are regressors that affect the selection process.
The errors \(\epsilon_i\) and \(\epsilon_{si}\) are assumed to be standard normal with
\[\begin{equation}
\mbox{corr}(\epsilon_i,\epsilon_{si}) = \rho \nonumber
\end{equation}\]
Let \(S\) be the set of observations where \(y_i\) is observed. The log likelihood for the probit model with sample selection is
\[\begin{eqnarray*}
\ln L &=& \sum_{i\in S}^{} \left[ y_i\ln\Phi_2({\bf x}_i{\boldsymbol \beta}, {\bf z}_i{\boldsymbol \gamma},\rho) + (1-y_i)\ln\Phi_2(-{\bf x}_i{\boldsymbol \beta}, {\bf z}_i{\boldsymbol \gamma},-\rho) \right] + \cr
& & \sum_{i\not\in S}^{} \ln \{1- \Phi({\bf z}_i{\boldsymbol \gamma})\}
\end{eqnarray*}\]
where \(\Phi_2\) is the bivariate normal cumulative distribution function.
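The sign changes in the second \(\Phi_2\) term can be worked out from the two model equations. This sketch uses only the fact that negating one of two jointly normal errors flips the sign of their correlation, while negating both leaves it unchanged:
\[\begin{eqnarray*}
\Pr(y_i = 1, s_i = 1) &=& \Pr(-\epsilon_i < {\bf x}_i{\boldsymbol \beta},\; -\epsilon_{si} < {\bf z}_i{\boldsymbol \gamma}) = \Phi_2({\bf x}_i{\boldsymbol \beta}, {\bf z}_i{\boldsymbol \gamma}, \rho) \cr
\Pr(y_i = 0, s_i = 1) &=& \Pr(\epsilon_i \le -{\bf x}_i{\boldsymbol \beta},\; -\epsilon_{si} < {\bf z}_i{\boldsymbol \gamma}) = \Phi_2(-{\bf x}_i{\boldsymbol \beta}, {\bf z}_i{\boldsymbol \gamma}, -\rho) \cr
\Pr(s_i = 0) &=& \Pr(\epsilon_{si} \le -{\bf z}_i{\boldsymbol \gamma}) = 1 - \Phi({\bf z}_i{\boldsymbol \gamma})
\end{eqnarray*}\]
In the first line, \((-\epsilon_i, -\epsilon_{si})\) has correlation \(\rho\); in the second, \((\epsilon_i, -\epsilon_{si})\) has correlation \(-\rho\). The three probabilities sum to 1, so every observation falls into exactly one term of the log likelihood.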
The data
We will simulate data from a probit model with sample selection and then estimate the parameters of the model using mlexp. We simulate a random sample of 7,000 observations.
. drop _all
. set seed 441
. set obs 7000
number of observations (_N) was 0, now 7,000
. generate x = .5*rchi2(2)
. generate z = rnormal()
. generate b = rbinomial(2,.5)
First, we generate the regressors. We use a \(\chi^2\) variable with \(2\) degrees of freedom \(x\) scaled by \(0.5\) as a regressor for the outcome. A standard normal variable \(z\) is used as a selection regressor. The variable \(b\) has a binomial\((2,0.5)\) distribution and will also be used as a selection regressor.
. matrix cm = (1,.7 \ .7,1)
. drawnorm ey es, corr(cm)
Next, we draw the unobserved errors. The outcome \(y\) and selection indicator \(s\) will be generated with errors that have correlation \(0.7\). We generate the errors with the drawnorm command.
. generate s = z + 1.3*0.b + 1.b + .5*2.b + es > 0
. generate y = .7*x + ey + .5 > 0
. replace y = . if !s
(1,750 real changes made, 1,750 to missing)
Finally, we generate the outcome and selection indicator. We specify the effect of \(b\) on selection by using factor-variable notation. Every value of \(b\) provides a different intercept for \(s\). We set the outcome to missing for observations where \(s\) is \(0\).
Effect of ignoring sample selection
First, we will use mlexp to estimate the probit model, ignoring the sample selection. We use the cond() function to calculate different values of the likelihood based on the value of \(y\). For cond(a,b,c), b is returned if a is true and c is returned otherwise. We use only the observations for which \(y\) is not missing by specifying \(y\) in the variables() option. The variables in the equation y are specified once, the first time the equation parameters are used in the likelihood. When the equation is used again, it is referred to as \(\{{\bf y}:\}\).
. mlexp (ln(cond(y,normal({y: x _cons}),1-normal({y:})))), variables(y)
initial:       log likelihood = -3639.0227
alternative:   log likelihood = -2342.8722
rescale:       log likelihood = -1746.0961
Iteration 0:   log likelihood = -1746.0961
Iteration 1:   log likelihood = -1503.9519
Iteration 2:   log likelihood = -1485.2935
Iteration 3:   log likelihood = -1485.1677
Iteration 4:   log likelihood = -1485.1677
Maximum likelihood estimation
Log likelihood = -1485.1677                     Number of obs     =      5,250
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    .813723   .0568938    14.30   0.000     .7022132    .9252328
       _cons |   .7623006   .0386929    19.70   0.000     .6864639    .8381372
------------------------------------------------------------------------------
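Both parameters are overestimated relative to the true values used in the simulation (\(0.7\) for the coefficient on \(x\) and \(0.5\) for the intercept), and the true values are not in the estimated confidence intervals.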
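As a sanity check on the mlexp specification, the same naive model can be fit with Stata's built-in probit command. This is a minimal sketch that assumes the simulated variables y and x from above are still in memory. Because probit drops observations with missing y on its own, it should use the same 5,250 observations as the mlexp call with variables(y).

```stata
* Naive probit that ignores the selection process.
* Observations with missing y are excluded automatically.
probit y x
```

Since both commands maximize the same log likelihood on the same observations, the estimates should agree up to numerical tolerance.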
Accounting for sample selection
Now, we use mlexp to estimate the probit model with sample selection. We use the cond() function twice, once for the selection indicator value and once for the outcome value. We no longer need to specify the variables() option because we will use every observation in the data. We use the factor-variable operator ibn in the selection equation so that a separate intercept is used in the equation for each level of \(b\).
. mlexp (ln(cond(s,cond(y,binormal({y: x _cons},{s: z ibn.b}, {rho}), binormal(
> -{y:},{s:}, -{rho})),1-normal({s:}))))
initial:       log likelihood = -8491.053
alternative:   log likelihood = -5898.851
rescale:       log likelihood = -5898.851
rescale eq:    log likelihood = -5654.3504
Iteration 0:   log likelihood = -5654.3504
Iteration 1:   log likelihood = -5473.5319  (not concave)
Iteration 2:   log likelihood = -4401.6027  (not concave)
Iteration 3:   log likelihood = -4340.7398  (not concave)
Iteration 4:   log likelihood = -4333.6402  (not concave)
Iteration 5:   log likelihood = -4326.1744  (not concave)
Iteration 6:   log likelihood = -4316.4936  (not concave)
Iteration 7:   log likelihood = -4261.307
Iteration 8:   log likelihood = -4154.7548
Iteration 9:   log likelihood = -4142.7991
Iteration 10:  log likelihood = -4141.7431
Iteration 11:  log likelihood = -4141.7306
Iteration 12:  log likelihood = -4141.7305
Maximum likelihood estimation
Log likelihood = -4141.7305                     Number of obs     =      7,000
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y            |
           x |   .7643362   .0532342    14.36   0.000      .659999    .8686734
       _cons |   .5259657   .0406914    12.93   0.000      .446212    .6057195
-------------+----------------------------------------------------------------
s            |
           z |   1.028631   .0260977    39.41   0.000      .977481    1.079782
             |
           b |
          0  |   1.365497   .0440301    31.01   0.000       1.2792    1.451794
          1  |   1.034018   .0297178    34.79   0.000     .9757726    1.092264
          2  |    .530342   .0353022    15.02   0.000      .461151    .5995331
-------------+----------------------------------------------------------------
        /rho |   .6854869   .0417266    16.43   0.000     .6037043    .7672696
------------------------------------------------------------------------------
Our estimates of the coefficient on \(x\) and the constant intercept are closer to the true values of \(0.7\) and \(0.5\), and the confidence intervals now include them. The correlation \(\rho\) is estimated to be \(0.69\), and the true value of \(0.7\) is in the confidence interval. Accounting for the selection process removes the bias seen in the simple probit estimates.
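The overview notes that these results match heckprobit. A cross-check along those lines might look like the sketch below. It assumes the simulated variables y, x, s, z, and b are still in memory. The noconstant suboption is an assumption made here so that the selection equation mirrors the ibn.b specification above, which already supplies one intercept per level of b.

```stata
* Probit with sample selection via the built-in command.
* select() names the selection indicator and its regressors;
* noconstant avoids collinearity with the full set of b indicators.
heckprobit y x, select(s = z ibn.b, noconstant)
```

heckprobit estimates the correlation through a transformed parameter (reported as /athrho) and also reports rho itself, so the reported rho is the value to compare with the /rho estimate from mlexp.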
Conclusion
I have demonstrated how to use mlexp to estimate the parameters of a model with a moderately complex likelihood function: the probit model with sample selection. I also illustrated how to generate data from this model and how its results differ from the simple probit model.
See [R] mlexp for more details about mlexp. In a future post, we will show how to make predictions after mlexp and how to estimate population average parameters using mlexp and margins.
Reference
Van de Ven, W. P. M. M., and B. M. S. Van Praag. 1981. The demand for deductibles in private health insurance: A probit model with sample selection. Journal of Econometrics 17: 229–252.
