Monday, January 26, 2026

A number of equation fashions: Estimation and marginal results utilizing mlexp


We proceed with the sequence of posts the place we illustrate tips on how to receive appropriate commonplace errors and marginal results for fashions with a number of steps. On this submit, we estimate the marginal results and commonplace errors for a hurdle mannequin with two hurdles and a lognormal final result utilizing mlexp. mlexp permits us to estimate parameters for multiequation fashions utilizing most chance. Within the final submit (A number of equation fashions: Estimation and marginal results utilizing gsem), we used gsem to estimate marginal results and commonplace errors for a hurdle mannequin with two hurdles and an exponential imply final result.

We exploit the truth that the hurdle-model chances are separable and the joint log chances are the sum of the person hurdle and final result log likelihoods. We estimate the parameters of every hurdle and the result individually to get preliminary values. Then, we use mlexp to estimate the parameters of the mannequin and margins to acquire marginal results.

Beginning factors: mannequin and preliminary values

We mannequin the quantity spent on dental care. The primary hurdle is whether or not a person spends or doesn’t spend on dental care. The second hurdle is the person stage of insurance coverage protection. Totally different ranges of protection result in totally different spending on dental care.

We assume probit and ordered probit fashions for the 2 hurdles. In distinction to the earlier submit, we use a lognormal distribution to mannequin the quantity spent. With these distributional assumptions, we use most chance relatively than quasi-likelihood, as within the earlier submit, to estimate the parameters of the mannequin.

We receive preliminary values by mlexp for every hurdle and the lognormal final result and retailer the log-likelihood expression for every step in an area macro. These native macros are summed collectively within the last use of mlexp to get parameter estimates and commonplace errors.

spend is a binary final result for whether or not a person spends cash on dental care. That is the primary hurdle variable. We retailer the log-likelihood expression within the native macro probit. See Appendix for extra data. Then, we use mlexp to estimate the parameters of the hurdle. The purpose estimates are saved within the matrix binit.


. native probit ln(cond(1.spend,regular({spend: x1 x2 x4 _cons}),   
>                 1-normal({spend:})))

. mlexp (`probit')

preliminary:       log chance = -6931.4718
different:   log chance = -5926.3721
rescale:       log chance = -5926.3721
Iteration 0:   log chance = -5926.3721
Iteration 1:   log chance = -4789.4607
Iteration 2:   log chance = -4776.9361
Iteration 3:   log chance = -4776.9332
Iteration 4:   log chance = -4776.9332

Most chance estimation

Log chance = -4776.9332                   Variety of obs     =     10,000

----------------------------------------------------------------------------
           |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------+----------------------------------------------------------------
spend      |
        x1 |   .5138169   .0160374    32.04   0.000     .4823841    .5452498
        x2 |  -.5276993   .0224236   -23.53   0.000    -.5716488   -.4837498
        x4 |   .4822064   .0185374    26.01   0.000     .4458736    .5185391
     _cons |   .5568866     .02899    19.21   0.000     .5000672     .613706
----------------------------------------------------------------------------

. matrix binit = e(b)

insurance coverage is a three-level ordered final result indicating insurance coverage stage. That is the variable for the second hurdle. We retailer the log-likelihood expression within the native macro oprobit and use mlexp as earlier than.


. native oprobit ln(cond(insurance coverage==1,regular(-{insurance coverage: x3 x4}+{cut1}),  
>                 cond(insurance coverage==2,regular({cut2}-{insurance coverage:})-          
>                         regular({cut1}-{insurance coverage:}),                    
>                         1-normal({cut2}-{insurance coverage:}))))

. mlexp (`oprobit')

preliminary:       log chance =     -  (couldn't be evaluated)
possible:      log chance = -23924.936
rescale:       log chance = -19788.939
rescale eq:    log chance = -11884.962
Iteration 0:   log chance = -11884.962
Iteration 1:   log chance = -10261.611
Iteration 2:   log chance = -10227.115
Iteration 3:   log chance = -10226.993
Iteration 4:   log chance = -10226.993

Most chance estimation

Log chance = -10226.993                   Variety of obs     =     10,000

----------------------------------------------------------------------------
           |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------+----------------------------------------------------------------
insurance coverage  |
        x3 |   .2926129   .0100288    29.18   0.000     .2729569     .312269
        x4 |  -.2754986   .0144941   -19.01   0.000    -.3039066   -.2470906
-----------+----------------------------------------------------------------
     /cut1 |  -1.270488   .0255784   -49.67   0.000    -1.320621   -1.220355
     /cut2 |  -.2612825   .0235687   -11.09   0.000    -.3074763   -.2150887
----------------------------------------------------------------------------

. matrix binit = binit, e(b)

Now, we use mlexp to estimate the parameters of the lognormal final result. spent corresponds to the quantity spent on dental care. We use factor-variable notation to make use of a special intercept for every stage of insurance coverage, (beta_{ins,1}ldots,beta_{ins,3}). The covariates are laid out in equation spent, and the fixed intercepts are laid out in spent_int. The log-likelihood expression is saved within the native macro lognormal. We limit estimation to the optimistic pattern and use mlexp to estimate the result parameters.


. native lognormal -.5*((ln(spent)-{spent: x4 x5 x6} -     
>         {spent_int: ibn.insurance coverage})/                    
>         (exp({lnsigma})))^2                             
>         -ln((spent*exp({lnsigma})*sqrt(2*_pi)))

. mlexp (`lognormal') if spend

preliminary:       log chance = -16596.787
different:   log chance = -16544.473
rescale:       log chance = -15515.652
rescale eq:    log chance = -14206.308
Iteration 0:   log chance = -14206.308
Iteration 1:   log chance =  -13818.45
Iteration 2:   log chance = -13520.664
Iteration 3:   log chance = -13519.085
Iteration 4:   log chance = -13519.084

Most chance estimation

Log chance = -13519.084                   Variety of obs     =      7,228

----------------------------------------------------------------------------
           |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------+----------------------------------------------------------------
spent      |
        x4 |   .2953067   .0121253    24.35   0.000     .2715415    .3190718
        x5 |  -.2917276   .0088451   -32.98   0.000    -.3090637   -.2743915
        x6 |   .2891539   .0227041    12.74   0.000     .2446548    .3336531
-----------+----------------------------------------------------------------
spent_int  |
 insurance coverage |
        1  |   .1924971   .0239883     8.02   0.000     .1454809    .2395134
        2  |   .2950143   .0214191    13.77   0.000     .2530336    .3369951
        3  |    .454055   .0202381    22.44   0.000      .414389    .4937209
-----------+----------------------------------------------------------------
  /lnsigma |  -.2953913   .0083172   -35.52   0.000    -.3116926   -.2790899
----------------------------------------------------------------------------

. matrix binit = binit, e(b)

Joint Estimation and marginal results

Now, we use mlexp to estimate the parameters of the joint mannequin. The joint log chances are specified because the sum of the person log likelihoods. We merely add up the native macros that we created within the final part. The matrix binit incorporates the purpose estimates from the person steps. We specify this matrix in from() to present mlexp good beginning values. We specify vce(strong) in order that we are able to use margins to estimate the marginal results over the inhabitants of covariates utilizing margins and vce(unconditional).


 mlexp (`probit' + `oprobit' + cond(spend,(`lognormal'),0)), 
>         vce(strong) from(binit)

Iteration 0:   log pseudolikelihood =  -28523.01
Iteration 1:   log pseudolikelihood =  -28523.01

Most chance estimation

Log pseudolikelihood =  -28523.01             Variety of obs     =     10,000

----------------------------------------------------------------------------
           |               Strong
           |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------+----------------------------------------------------------------
spend      |
        x1 |    .513817   .0159257    32.26   0.000     .4826032    .5450307
        x2 |  -.5276994   .0224874   -23.47   0.000    -.5717739   -.4836249
        x4 |   .4822064   .0185804    25.95   0.000     .4457894    .5186234
     _cons |   .5568866   .0289611    19.23   0.000     .5001239    .6136494
-----------+----------------------------------------------------------------
insurance coverage  |
        x3 |   .2926129   .0100343    29.16   0.000     .2729461    .3122798
        x4 |  -.2754986   .0144074   -19.12   0.000    -.3037366   -.2472605
-----------+----------------------------------------------------------------
cut1       |
     _cons |  -1.270488   .0254071   -50.01   0.000    -1.320285   -1.220691
-----------+----------------------------------------------------------------
cut2       |
     _cons |  -.2612825   .0236404   -11.05   0.000    -.3076167   -.2149482
-----------+----------------------------------------------------------------
spent      |
        x4 |   .2953067       .012    24.61   0.000      .271787    .3188263
        x5 |  -.2917276   .0088412   -33.00   0.000     -.309056   -.2743991
        x6 |   .2891539   .0224396    12.89   0.000     .2451731    .3331347
-----------+----------------------------------------------------------------
spent_int  |
 insurance coverage |
        1  |   .1924972   .0242748     7.93   0.000     .1449194     .240075
        2  |   .2950143    .021299    13.85   0.000      .253269    .3367597
        3  |   .4540549   .0202593    22.41   0.000     .4143473    .4937625
-----------+----------------------------------------------------------------
  /lnsigma |  -.2953912   .0084227   -35.07   0.000    -.3118993    -.278883
----------------------------------------------------------------------------

The purpose estimates from the person steps match the joint estimates. Now, we receive marginal results of the result covariates on the conditional imply of the result.

The conditional imply of expenditure on the impartial variables is given by
start{eqnarray*}
Eleft(textual content{spent}|Xright) = Phi(X_p{boldsymbol beta}_p)
left[begin{matrix}Phi(kappa_1-X_o{boldsymbol beta}_o)exp{beta}_{{ins},1} + cr
left{Phi(kappa_2-X_o{boldsymbol beta}_o) – Phi(kappa_1-X_o{boldsymbol beta}_o)right} exp{ beta}_{{ins},2} + cr left{1-Phi(kappa_2-X_o{boldsymbol beta}_o)right} exp{ beta}_{{ins},3}end{matrix}right] expleft(X_e{boldsymbol beta}_e + .5sigma^2right)cr
finish{eqnarray*}
the place (kappa_{1}) and (kappa_{2}) are the cutoff factors from the insurance coverage mannequin, which represents insurance coverage stage. We use the subscripts p, o, and e to emphasise that the covariates and coefficients associated to the probit, ordered probit, and exponential imply are totally different.

Now, we use margins to estimate the marginal results. We use the expression() choice to put in writing an expression for the anticipated worth of quantity spent.


. margins, expression(regular(xb(spend))*(                 
>         exp(_b[spent_int:1.insurance])*                 
>                 regular(_b[/cut1]-xb(insurance coverage))+        
>         exp(_b[spent_int:2.insurance])*                 
>                 (regular(_b[/cut2]-xb(insurance coverage)) -      
>                 regular(_b[/cut1]-xb(insurance coverage)))+      
>         exp(_b[spent_int:3.insurance])*(                
>                 1-normal(_b[/cut2]-xb(insurance coverage))))*    
>         exp(xb(spent)+.5*exp(_b[/lnsigma])^2))         
>         dydx(x4 x5 x6) vce(unconditional)

Common marginal results                     Variety of obs     =     10,000

Expression   : regular(xb(spend))*( exp(_b[spent_int:1.insurance])*
               regular(_b[/cut1]-xb(insurance coverage))+
               exp(_b[spent_int:2.insurance])*
               (regular(_b[/cut2]-xb(insurance coverage)) -
               regular(_b[/cut1]-xb(insurance coverage)))+
               exp(_b[spent_int:3.insurance])*(
               1-normal(_b[/cut2]-xb(insurance coverage))))*
               exp(xb(spent)+.5*exp(_b[/lnsigma])^2)
dy/dx w.r.t. : x4 x5 x6

----------------------------------------------------------------------------
           |            Unconditional
           |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-----------+----------------------------------------------------------------
        x4 |   .9525996   .0311351    30.60   0.000     .8915759    1.013623
        x5 |  -.6329119   .0224351   -28.21   0.000    -.6768838     -.58894
        x6 |   .6273282   .0499249    12.57   0.000     .5294772    .7251792
----------------------------------------------------------------------------

Remaining issues

We illustrated tips on how to use mlexp to acquire the estimates and commonplace errors for a a number of hurdle mannequin and its marginal results. In subsequent posts, we receive these outcomes for different multistep fashions utilizing different Stata instruments.

Appendix 1

Under is the code used to provide the info.


clear
set seed 11134
set obs 10000
// Producing exogenous variables
generate x1 = rnormal()
generate x2 = int(3*rbeta(2,3))
generate x3 = rchi2(1)-2
generate x4 = ln(rchi2(4))
generate x5 = rnormal()
generate x6 = rbeta(2,3)>.6
// Producing unobservables
generate ep = rnormal() // for probit
generate eo = rnormal() // for ordered probit
generate e  = rnormal()*.75 // for lognormal equation
// Producing linear predictions
generate xbp = .5*(1 + x1 - x2 + x4)
generate xbo = .3*(1 + x3 - x4)
generate yotemp      = xbo + eo
generate insurance coverage   = yotemp
substitute insurance coverage = 1 if yotemp < -1
substitute insurance coverage = 2 if yotemp> -1 & yotemp<0
substitute insurance coverage = 3 if yotemp> 0
generate xbe = .3*(- x5 + x6 + x4 + .5*insurance coverage)
// Producing outcomes
generate spend       = xbp + ep > 0
generate yexp = exp(xbe + e)
generate spent = spend*yexp

Appendix 2

Macros are a programming characteristic supplied by Stata. They can be utilized to make sophisticated expressions straightforward to put in writing. All over the place a punctuated macro title seems in a command, the macro contents are substituted for the macro title. So an advanced expression will be saved in a macro with a brief title, and the expression can be utilized repeatedly by solely typing the punctuated brief title. See Programming an estimation command in Stata: World macros versus native macros for a dialogue of macros.

On this weblog, we use native macros to retailer expressions utilized in calculating the log chance of our mannequin. For the probit mannequin, we saved the log-likelihood expression within the native macro probit.


. native probit 
> ln(cond(1.spend,regular({spend: x1 x2 x4 _cons}),1-normal({spend:})))

Once we kind the punctuated `probit’ later, Stata makes use of the expression saved inside probit. We are able to see the expression by displaying it.


. show "`probit'"
ln(cond(1.spend,regular({spend: x1 x2 x4 _cons}),1-normal({spend:})))



Related Articles

Latest Articles