Monday, March 9, 2026

Probit model with sample selection by mlexp


Overview

In a previous post, David Drukker demonstrated how to use mlexp to estimate the degree-of-freedom parameter of a chi-squared distribution by maximum likelihood (ML). In this post, I am going to use mlexp to estimate the parameters of a probit model with sample selection. I will illustrate how to specify a more complex likelihood in mlexp and provide intuition for the probit model with sample selection. Our results match those of the heckprobit command; see [R] heckprobit for more details.

Probit mannequin

For binary outcome \(y_i\) and regressors \({\bf x}_i\), the probit model assumes

\[\begin{equation} \label{eq:outcome} y_i = {\bf 1}({\bf x}_i{\boldsymbol \beta} + \epsilon_i > 0) \tag{1} \end{equation}\]

where the error \(\epsilon_i\) is standard normal. The indicator function \({\bf 1}(\cdot)\) outputs 1 when its input is true and 0 otherwise.

The log likelihood of the probit model is

\[\begin{equation}
\ln L = \sum_{i=1}^{N} y_i \ln \Phi({\bf x}_i{\boldsymbol \beta}) + (1-y_i)\ln\{1-\Phi({\bf x}_i{\boldsymbol \beta})\} \nonumber
\end{equation}\]

where \(\Phi\) is the standard normal cumulative distribution function.
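Because \(1-\Phi(t)=\Phi(-t)\), the second term can also be written as \((1-y_i)\ln\Phi(-{\bf x}_i{\boldsymbol \beta})\). The short sketch below uses only Stata's built-in normal() and lnnormal() functions to illustrate why that form can matter numerically far in the tail; the value 9 is an arbitrary illustrative choice. The mlexp commands later in this post apply ln() to normal() directly, which works for the data simulated here.

```stata
* 1 - Phi(t) and Phi(-t) agree for moderate t
display 1 - normal(1.2)
display normal(-1.2)

* normal(9) rounds to 1 in double precision, so this is ln(0),
* which Stata returns as missing (.)
display ln(1 - normal(9))

* lnnormal() computes ln(Phi(z)) directly and stays finite (about -43.6)
display lnnormal(-9)
```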

The probit model is widely used to model binary outcomes, but there are situations where it is not appropriate. Sometimes we observe a random sample in which the outcome is missing for some observations. If the unobserved error of the outcome, \(\epsilon_i\), is related to the unobserved error that determines whether the outcome is observed, \(\epsilon_{si}\), then probit estimates of \({\boldsymbol \beta}\) computed from the observed outcomes alone will be inconsistent. For instance, this could happen when we model job satisfaction and our sample includes both employed and unemployed individuals: job satisfaction is observed only for the employed, and the unobserved factors that affect a person's job satisfaction may be correlated with the unobserved factors that affect whether that person is employed. Samples like this are said to suffer from “selection on unobservables”.

Probit model with sample selection

Van de Ven and Van Praag (1981) introduced the probit model with sample selection to allow for consistent estimation of \({\boldsymbol \beta}\) in samples that suffer from selection on unobservables. The outcome equation (1) remains the same, but we add another equation. The selection process for the outcome is modeled as

\[\begin{equation}
s_i = {\bf 1}({\bf z}_i{\boldsymbol \gamma} + \epsilon_{si} > 0) \nonumber
\end{equation}\]

where \(s_i=1\) if we observe \(y_i\) and \(s_i=0\) otherwise, and \({\bf z}_i\) are regressors that affect the selection process.

The errors \(\epsilon_i\) and \(\epsilon_{si}\) are assumed to be bivariate normal, each with mean 0 and variance 1, with

\[\begin{equation}
\mbox{corr}(\epsilon_i,\epsilon_{si}) = \rho \nonumber
\end{equation}\]

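Let \(S\) be the set of observations where \(y_i\) is observed. The log likelihood for the probit model with sample selection is

\[\begin{eqnarray*}
\ln L &=& \sum_{i\in S}^{} y_i\ln\Phi_2({\bf x}_i{\boldsymbol \beta}, {\bf z}_i{\boldsymbol \gamma},\rho) +
(1-y_i)\ln\Phi_2(-{\bf x}_i{\boldsymbol \beta}, {\bf z}_i{\boldsymbol \gamma},-\rho) + \\
& & \sum_{i\notin S}^{} \ln \{1- \Phi({\bf z}_i{\boldsymbol \gamma})\}
\end{eqnarray*}\]

where \(\Phi_2(\cdot,\cdot,\rho)\) is the bivariate standard normal cumulative distribution function with correlation \(\rho\). Observations in \(S\) contribute the joint probability of being selected and having the observed outcome; observations not in \(S\) contribute only the probability of not being selected.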
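For a single observation, the three likelihood contributions are the probabilities of the three possible cases: \(s_i=1, y_i=1\); \(s_i=1, y_i=0\); and \(s_i=0\). They must sum to 1, because the first two add up to \(\Pr(s_i=1)=\Phi({\bf z}_i{\boldsymbol \gamma})\). A quick check with Stata's binormal() and normal() functions, using index values and a \(\rho\) chosen arbitrarily for illustration (idx_y, idx_s, rho0, p11, p10, and p0 are hypothetical scalar names); no data need be in memory:

```stata
* Arbitrary illustrative values for x_i*beta, z_i*gamma, and rho
scalar idx_y = 0.3
scalar idx_s = 0.8
scalar rho0  = 0.7

* Pr(s=1, y=1), Pr(s=1, y=0), and Pr(s=0)
scalar p11 = binormal(scalar(idx_y), scalar(idx_s), scalar(rho0))
scalar p10 = binormal(-scalar(idx_y), scalar(idx_s), -scalar(rho0))
scalar p0  = 1 - normal(scalar(idx_s))

* The first two sum to Phi(z_i*gamma); all three sum to 1
display scalar(p11) + scalar(p10)
display normal(scalar(idx_s))
display scalar(p11) + scalar(p10) + scalar(p0)
```

The sign flips in the \(y_i=0\) term follow from the same logic: \(y_i=0\) means \(\epsilon_i \le -{\bf x}_i{\boldsymbol \beta}\), and \(\epsilon_i\) and \(-\epsilon_{si}\) have correlation \(-\rho\).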

The data

We will simulate data from a probit model with sample selection and then estimate the parameters of the model using mlexp. We simulate a random sample of 7,000 observations.


. drop _all

. set seed 441

. set obs 7000
number of observations (_N) was 0, now 7,000

. generate x = .5*rchi2(2)

. generate z = rnormal()

. generate b = rbinomial(2,.5)

First, we generate the regressors. We use \(x\), a \(\chi^2\) variable with \(2\) degrees of freedom scaled by \(0.5\), as the regressor for the outcome. A standard normal variable \(z\) is used as a selection regressor. The variable \(b\) has a binomial\((2,0.5)\) distribution and will also be used as a selection regressor.
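As a sanity check, a \(\chi^2_2\) variable has mean 2 and variance 4, so \(x\) should have mean and variance close to 1, and \(b\) should take the values 0, 1, and 2 with shares of about 0.25, 0.5, and 0.25. A minimal check, assuming the data generated above are in memory; the tolerance of 0.1 is an arbitrary loose bound for a sample of 7,000, not an exact value:

```stata
* x = .5*chi2(2): mean .5*2 = 1, variance .25*4 = 1
summarize x
assert abs(r(mean) - 1) < .1

* z is standard normal: mean near 0, variance near 1
summarize z
assert abs(r(mean)) < .1

* b ~ binomial(2, .5): expect shares of about .25, .5, .25
tabulate b
```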


. matrix cm = (1,.7  .7,1)

. drawnorm ey es, corr(cm)

Next, we draw the unobserved errors. The outcome \(y\) and the selection indicator \(s\) will be generated with errors that have correlation \(0.7\). We generate the errors with the drawnorm command, passing the correlation matrix cm to its corr() option.
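To confirm that drawnorm produced what we asked for, we can look at the sample moments of the two error draws. This sketch assumes ey and es from the command above are in memory; the 0.05 tolerance is an arbitrary bound, since the sample correlation differs from 0.7 by sampling error:

```stata
* Both errors should have mean near 0 and standard deviation near 1
summarize ey es

* The sample correlation should be close to the .7 in matrix cm
correlate ey es
display r(rho)
assert abs(r(rho) - .7) < .05
```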


. generate s = z + 1.3*0.b + 1.b + .5*2.b + es > 0

. generate y = .7*x + ey  + .5 > 0

. replace y = .  if !s
(1,750 real changes made, 1,750 to missing)

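Finally, we generate the outcome and the selection indicator. We specify the effect of \(b\) on selection by using factor-variable notation: each value of \(b\) gives a different intercept for \(s\) (1.3 when \(b=0\), 1 when \(b=1\), and 0.5 when \(b=2\)). In the outcome equation, the true coefficient on \(x\) is 0.7 and the true intercept is 0.5. We set the outcome to missing for observations where \(s\) is \(0\).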
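The factor-variable terms in the selection equation can be checked against an equivalent expression that uses explicit indicators, and the design also implies how many outcomes should be missing. Because \(z+\epsilon_{s}\) is normal with variance 2, \(\Pr(s=0\mid b)=\Phi(-c_b/\sqrt{2})\), where \(c_b\) is the intercept for level \(b\) (a symbol introduced here for illustration). Averaging over \(\Pr(b=0,1,2)=(0.25,0.5,0.25)\) gives an expected share of about 0.255, or roughly 1,786 of 7,000 observations, in line with the 1,750 missing values reported above. A sketch, assuming the data above are in memory; s_check and p_miss are hypothetical names:

```stata
* Explicit-indicator version of the selection rule: intercept 1.3 if b==0,
* 1 if b==1, and .5 if b==2 (same order of additions as the original)
generate s_check = z + 1.3*(b==0) + 1*(b==1) + .5*(b==2) + es > 0
assert s_check == s
drop s_check

* Expected share with s==0: z + es ~ N(0, 2), averaged over Pr(b)
scalar p_miss = .25*normal(-1.3/sqrt(2)) + .5*normal(-1/sqrt(2)) + .25*normal(-.5/sqrt(2))
display "expected share with s==0: " scalar(p_miss)
display "expected count out of 7,000: " 7000*scalar(p_miss)

* Observed count of unselected observations
count if !s
```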

Effect of ignoring sample selection

First, we use mlexp to estimate the probit model, ignoring the sample selection. We use the cond() function to calculate different values of the likelihood based on the value of \(y\). For cond(a,b,c), b is returned if a is true and c is returned otherwise. Because cond() treats a missing a as true, we must restrict the sample ourselves: specifying \(y\) in the variables() option tells mlexp to use only the observations for which \(y\) is not missing. The variables in the equation y are specified once, the first time the equation parameters are used in the likelihood. When the equation is used again, it is referred to as {y:}.
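The role of variables(y) is easiest to see from how cond() handles a missing condition. Per Stata's documentation for cond(), with three arguments a missing condition is treated as true, and an optional fourth argument supplies the value to return when the condition is missing. These lines need no data in memory:

```stata
* cond(a, b, c): b if a is true (nonzero), c if a is false (zero)
display cond(1, 10, 20)
display cond(0, 10, 20)

* With three arguments, a missing condition is treated as true
display cond(., 10, 20)

* The optional fourth argument is returned when the condition is missing
display cond(., 10, 20, 30)
```

So, without variables(y), an observation with missing \(y\) would receive the \(y=1\) contribution normal({y:}) instead of being excluded.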


. mlexp (ln(cond(y,normal({y: x _cons}),1-normal({y:})))), variables(y)

initial:       log likelihood = -3639.0227
alternative:   log likelihood = -2342.8722
rescale:       log likelihood = -1746.0961
Iteration 0:   log likelihood = -1746.0961  
Iteration 1:   log likelihood = -1503.9519  
Iteration 2:   log likelihood = -1485.2935  
Iteration 3:   log likelihood = -1485.1677  
Iteration 4:   log likelihood = -1485.1677  

Maximum likelihood estimation

Log likelihood = -1485.1677                     Number of obs     =      5,250

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |    .813723   .0568938    14.30   0.000     .7022132    .9252328
       _cons |   .7623006   .0386929    19.70   0.000     .6864639    .8381372
------------------------------------------------------------------------------

Both parameters are overestimated: the true coefficient on \(x\) is 0.7 and the true intercept is 0.5, and neither true value lies in its estimated confidence interval.
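Because this mlexp command maximizes the ordinary probit log likelihood, the built-in probit command should reproduce the same estimates on the same 5,250 observations (probit drops observations with missing \(y\) on its own). A sketch for comparing the two side by side, assuming the simulated data are in memory; ml_naive and probit_naive are hypothetical names for the stored results:

```stata
* Naive probit via mlexp, as above, stored under a hypothetical name
quietly mlexp (ln(cond(y,normal({y: x _cons}),1-normal({y:})))), variables(y)
estimates store ml_naive

* Built-in probit on the same observations (missing y are dropped)
quietly probit y x
estimates store probit_naive

* Compare coefficients and standard errors side by side
estimates table ml_naive probit_naive, b se
```

Any differences should be limited to the last digits, reflecting convergence tolerances.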

Accounting for sample selection

Now, we use mlexp to estimate the probit model with sample selection. We use the cond() function twice, once for the value of the selection indicator and once for the value of the outcome. We no longer need to specify the variables() option because every observation in the data contributes to the likelihood: for observations with \(s=0\), the outer cond() returns 1-normal({s:}) regardless of the missing \(y\). We use the factor-variable operator ibn in the selection equation so that a separate intercept is used for each level of \(b\); for that reason, the s equation does not include _cons.
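One optional variation, shown only as a hedged sketch: the specification above estimates \(\rho\) directly, so nothing in it restricts \(\rho\) to \((-1,1)\). A common alternative, and the scale heckprobit reports as /athrho, is to estimate \(\operatorname{atanh}(\rho)\) and transform back. The parameter name athrho is our own choice; the sketch assumes the simulated data are in memory and that, as in the output below, a scalar parameter is referenced as /name.

```stata
* Estimate athrho = atanh(rho); tanh() maps any real value into (-1, 1)
mlexp (ln(cond(s, cond(y, binormal({y: x _cons}, {s: z ibn.b}, tanh({athrho})), binormal(-{y:}, {s:}, -tanh({athrho}))), 1-normal({s:}))))

* Transform back to the correlation scale, with a delta-method standard error
nlcom (rho: tanh(_b[/athrho]))
```

The point estimates should match the direct parameterization; only the scale on which the optimizer works changes.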


. mlexp (ln(cond(s,cond(y,binormal({y: x _cons},{s: z ibn.b}, {rho}), binormal(
> -{y:},{s:}, -{rho})),1-normal({s:}))))

initial:       log likelihood =  -8491.053
alternative:   log likelihood =  -5898.851
rescale:       log likelihood =  -5898.851
rescale eq:    log likelihood = -5654.3504
Iteration 0:   log likelihood = -5654.3504  
Iteration 1:   log likelihood = -5473.5319  (not concave)
Iteration 2:   log likelihood = -4401.6027  (not concave)
Iteration 3:   log likelihood = -4340.7398  (not concave)
Iteration 4:   log likelihood = -4333.6402  (not concave)
Iteration 5:   log likelihood = -4326.1744  (not concave)
Iteration 6:   log likelihood = -4316.4936  (not concave)
Iteration 7:   log likelihood =  -4261.307  
Iteration 8:   log likelihood = -4154.7548  
Iteration 9:   log likelihood = -4142.7991  
Iteration 10:  log likelihood = -4141.7431  
Iteration 11:  log likelihood = -4141.7306  
Iteration 12:  log likelihood = -4141.7305  

Maximum likelihood estimation

Log likelihood = -4141.7305                     Number of obs     =      7,000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
y            |
           x |   .7643362   .0532342    14.36   0.000      .659999    .8686734
       _cons |   .5259657   .0406914    12.93   0.000      .446212    .6057195
-------------+----------------------------------------------------------------
s            |
           z |   1.028631   .0260977    39.41   0.000      .977481    1.079782
             |
           b |
          0  |   1.365497   .0440301    31.01   0.000       1.2792    1.451794
          1  |   1.034018   .0297178    34.79   0.000     .9757726    1.092264
          2  |    .530342   .0353022    15.02   0.000      .461151    .5995331
-------------+----------------------------------------------------------------
        /rho |   .6854869   .0417266    16.43   0.000     .6037043    .7672696
------------------------------------------------------------------------------

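Our estimates of the coefficient on \(x\) (0.76) and of the intercept (0.53) are closer to the true values of 0.7 and 0.5, and both confidence intervals include the true values. The selection-equation estimates are also close to the truth: 1.03 for the coefficient on \(z\) (true value 1) and 1.37, 1.03, and 0.53 for the intercepts at \(b=0,1,2\) (true values 1.3, 1, and 0.5). The correlation \(\rho\) is estimated to be \(0.69\), and the true value of \(0.7\) is in the confidence interval. In this simulated sample, accounting for selection clearly works better.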
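As noted at the start of this post, these results match those of heckprobit. A sketch of the equivalent heckprobit call, assuming the simulated data are in memory: the noconstant suboption together with ibn.b gives one selection intercept per level of \(b\), mirroring the mlexp specification. heckprobit works with \(\rho\) on the atanh scale (reported as /athrho) and also reports \(\rho\) itself, so we would expect any differences from the mlexp output to reflect only convergence tolerances.

```stata
* Outcome equation: y on x with a constant.
* Selection equation: s on z and a full set of b indicators, no constant.
heckprobit y x, select(s = z ibn.b, noconstant)

* The estimated correlation is stored in e(rho)
display e(rho)
```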

Conclusion

I have demonstrated how to use mlexp to estimate the parameters of a model with a moderately complex likelihood function, the probit model with sample selection. I also illustrated how to generate data from this model and how its results differ from those of the simple probit model.

See [R] mlexp for more details about mlexp. In a future post, we will show how to make predictions after mlexp and how to estimate population-averaged parameters using mlexp and margins.

Reference

Van de Ven, W. P. M. M., and B. M. S. Van Praag. 1981. The demand for deductibles in private health insurance: A probit model with sample selection. Journal of Econometrics 17: 229–252.


