Tuesday, January 27, 2026

Multiple equation models: Estimation and marginal effects using gsem


Starting point: A hurdle model with multiple hurdles

In a series of posts, we are going to illustrate how to obtain correct standard errors and marginal effects for models with multiple steps.

Our inspiration for this post is an old Statalist inquiry about how to obtain marginal effects for a hurdle model with more than one hurdle (http://www.statalist.org/forums/forum/general-stata-discussion/general/1337504-estimating-marginal-effect-for-triple-hurdle-model). Hurdle models have the appealing property that their likelihood is separable. Each hurdle has its own likelihood and regressors. You can estimate each of these hurdles separately to obtain point estimates. However, estimating them this way does not give you standard errors or marginal effects for quantities that combine all the hurdles.

In this post, we show how to obtain the marginal effects and standard errors for a hurdle model with two hurdles using gsem. gsem is well suited for this purpose because it allows us to estimate likelihood-based models with multiple equations jointly.

The model

Suppose we are interested in the mean spending on dental care, given the characteristics of the individuals. Some people spend zero dollars on dental care in a year, and some people spend more than zero dollars. Only the individuals who cross a hurdle are willing to spend a positive amount on dental care. Hurdle models allow the characteristics of the individuals who spend a positive amount and of those who spend zero to differ.

There could be more than one hurdle. In the dental-care spending example, the second hurdle could be insurance coverage: uninsured, basic insurance, or premium insurance. We model the first hurdle, spending zero or a positive amount, with a probit. We model the second hurdle, the insurance level, with an ordered probit. Finally, we model the positive amount spent with an exponential-mean model.

We are interested in the marginal effects on the mean amount spent by someone with premium insurance, given individual characteristics. The expression for this conditional mean is
\begin{eqnarray*}
E\left(\text{expenditure}|X, {\tt insurance} =\text{premium}\right)
&=& \Phi(X_p\beta_p)\Phi\left(X_o\beta_o - \text{premium}\right)
\exp\left(X_e\beta_e\right)
\end{eqnarray*}
The conditional mean accounts for the probabilities of being in different threshold levels and for the expenditure preferences among those spending a positive amount. Here \(\Phi(\cdot)\) is the standard normal cumulative distribution function. We use the subscripts \(p\), \(o\), and \(e\) to emphasize that the covariates and coefficients related to the probit, ordered probit, and exponential mean are different.
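Each factor comes from one hurdle. As a sketch in notation of our own (the cutpoints \(\kappa_1 < \kappa_2 < \dots\) are our labels, not symbols from the original post), the probit and ordered-probit pieces are
\begin{eqnarray*}
\Pr(\text{spend}=1|X) &=& \Phi(X_p\beta_p)\\
\Pr({\tt insurance} > j|X) &=& 1 - \Phi(\kappa_j - X_o\beta_o) \;=\; \Phi(X_o\beta_o - \kappa_j)
\end{eqnarray*}
where the second line uses \(1-\Phi(z)=\Phi(-z)\). The middle factor of the conditional mean has this form, with \(\text{premium}\) standing for the cutpoint above which an individual has premium insurance, and the last factor, \(\exp(X_e\beta_e)\), is the exponential mean of the positive amount spent.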

Below we use gsem to estimate the model parameters from simulated data. spend is a binary outcome for whether an individual spends money on dental care, insurance is an ordered outcome indicating the insurance level, and expenditure is the amount spent on dental care.
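Because no parameter appears in more than one equation, the objective that gsem maximizes here is a sum of three per-equation pieces. As a sketch in our own notation, with \(\ell_{p,i}\), \(\ell_{o,i}\), and \(\ell_{e,i}\) denoting observation \(i\)'s contribution to the probit, ordered-probit, and Poisson quasilikelihood, and \(\kappa\) the vector of cutpoints,
\begin{eqnarray*}
Q(\beta_p,\beta_o,\kappa,\beta_e) &=& \sum_{i=1}^N \ell_{p,i}(\beta_p) + \sum_{i=1}^N \ell_{o,i}(\beta_o,\kappa) + \sum_{i=1}^N \ell_{e,i}(\beta_e)
\end{eqnarray*}
Each sum depends only on its own parameters, so the joint maximizer coincides with the three separate maximizers. What the joint fit adds, with vce(robust), is an estimate of the covariance between estimates from different equations, which a standard error for any quantity combining the equations requires.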


. gsem  (spend       <- x1 x2 x4, probit)    
>       (insurance   <- x3 x4, oprobit)      
>       (expenditure <- x5 x6 x4, poisson),  
>        vce(robust)
note: expenditure has noncount values;
      you are responsible for the family(poisson) interpretation

Iteration 0:   log pseudolikelihood = -171938.67
Iteration 1:   log pseudolikelihood = -79591.213
Iteration 2:   log pseudolikelihood = -78928.015
Iteration 3:   log pseudolikelihood = -78925.126
Iteration 4:   log pseudolikelihood = -78925.126

Generalized structural equation model           Number of obs     =     10,000

Response       : spend
Family         : Bernoulli
Link           : probit

Response       : insurance
Family         : ordinal
Link           : probit

Response       : expenditure
Family         : Poisson
Link           : log

Log pseudolikelihood = -78925.126

----------------------------------------------------------------------------
               |               Robust
               |      Coef.   Std. Err.     z    P>|z|  [95% Conf. Interval]
---------------+------------------------------------------------------------
spend <-       |
         x1    |   .5189993   .0161283   32.18   0.000   .4873884   .5506102
         x2    |  -.4755281     .02257  -21.07   0.000  -.5197646  -.4312917
         x4    |   .5300193   .0187114   28.33   0.000   .4933455    .566693
      _cons    |   .4849085   .0288667   16.80   0.000   .4283308   .5414862
---------------+------------------------------------------------------------
insurance <-   |
            x3 |    .299793   .0084822   35.34   0.000   .2831681   .3164178
            x4 |  -.2835648   .0135266  -20.96   0.000  -.3100765  -.2570531
---------------+------------------------------------------------------------
expenditure <- |
            x5 |  -.2992792   .0192201  -15.57   0.000  -.3369499  -.2616086
            x6 |    .319377   .0483959    6.60   0.000   .2245229   .4142312
            x4 |    .448041   .0252857   17.72   0.000   .3984819   .4976001
         _cons |   1.088217   .0375369   28.99   0.000   1.014646   1.161788
---------------+------------------------------------------------------------
insurance      |
         /cut1 |   -1.28517   .0236876  -54.26   0.000  -1.331596  -1.238743
         /cut2 |  -.2925979   .0216827  -13.49   0.000  -.3350951  -.2501006
         /cut3 |   .7400875   .0230452   32.11   0.000   .6949198   .7852552
----------------------------------------------------------------------------

The estimated probit parameters are in the spend equation. The estimated ordered-probit parameters are in the insurance equation. The estimated expenditure parameters are in the expenditure equation. We could have obtained these point estimates using probit, oprobit, and poisson. With gsem, we estimate them jointly and obtain correct standard errors when computing marginal effects. In the case of the poisson model, we use gsem to obtain an exponential mean and interpret the results from a quasilikelihood perspective; this is also why gsem notes that expenditure has noncount values. Because of the quasilikelihood nature of the problem, we use the vce(robust) option.
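To check that the point estimates match, a minimal sketch fits each hurdle on its own. It assumes the simulated data from the appendix are in memory. We add vce(robust) to each fit only for comparability and do not claim that the standard errors will match gsem's exactly; the comparison of interest is the coefficient columns.

```stata
* Assumes the simulated data from the appendix are in memory.
* Each hurdle is fitted on its own; compare the coefficients with the
* spend, insurance, and expenditure equations in the gsem table.
probit  spend x1 x2 x4, vce(robust)
oprobit insurance x3 x4, vce(robust)
poisson expenditure x5 x6 x4, vce(robust)
```

Each command replaces the active estimation results, and none of them holds all three equations, so margins cannot combine them into the conditional mean. Rerun gsem (or save its results with estimates store and bring them back with estimates restore) before calling margins.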

The typical of the marginal impact of x4 is
\begin{equation*}
\frac{1}{N}\sum_{i=1}^N \frac{\partial \hat{E}\left(\text{expenditure}_i|X_i, {\tt insurance}_i\right)}{\partial {\tt x4}_i}
\end{equation*}
and we estimate it by


. margins, vce(unconditional) predict(expression(normal(eta(spend))* 
>          normal(eta(insurance)-_b[insurance_cut2:_cons])*          
>          exp(eta(expenditure)))) dydx(x4)

Average marginal effects                        Number of obs     =     10,000

Expression   : Predicted normal(eta(spend))*
               normal(eta(insurance)-_b[insurance_cut2:_cons])* e,
               predict(expression(normal(eta(spend))*
               normal(eta(insurance)-_b[insurance_cut2:_cons])*
               exp(eta(expenditure))))
dy/dx w.r.t. : x4

---------------------------------------------------------------------------
          |            Unconditional
          |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------+----------------------------------------------------------------
       x4 |   .5382276   .0506354    10.63   0.000     .4389841    .6374711
---------------------------------------------------------------------------

We passed the expected value of interest to margins through the expression() option of predict(), using eta() to refer to the linear prediction of each equation and _b[insurance_cut2:_cons] to refer to the cutpoint. We use the vce(unconditional) option to treat the covariates as random instead of fixed. In other words, we estimate a population effect instead of a sample effect.
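Because x4 enters all three equations, its marginal effect combines three terms. Differentiating the conditional mean with the product rule, and writing \(\phi\) for the standard normal density and \(\beta_{p,4}\), \(\beta_{o,4}\), \(\beta_{e,4}\) for the coefficients on x4 (our notation),
\begin{eqnarray*}
\frac{\partial E\left(\text{expenditure}|X, {\tt insurance}=\text{premium}\right)}{\partial {\tt x4}}
&=& \phi(X_p\beta_p)\beta_{p,4}\,\Phi\left(X_o\beta_o - \text{premium}\right)\exp\left(X_e\beta_e\right)\\
&& +\ \Phi(X_p\beta_p)\,\phi\left(X_o\beta_o - \text{premium}\right)\beta_{o,4}\exp\left(X_e\beta_e\right)\\
&& +\ \Phi(X_p\beta_p)\Phi\left(X_o\beta_o - \text{premium}\right)\exp\left(X_e\beta_e\right)\beta_{e,4}
\end{eqnarray*}
The do-file sketch below computes this derivative observation by observation and averages it. Its mean should be close to the dy/dx that margins reports, which margins obtains by numerical differentiation. It assumes the appendix data and the gsem results above are active. The variable names eta_p, eta_o, eta_e, and dmean_dx4 and the scalar cut2 are hypothetical, and the cutpoint name is copied from the margins call above; it may differ across Stata releases, so confirm it with gsem, coeflegend.

```stata
* Assumes the appendix data and the gsem results above are active.
* eta_p, eta_o, eta_e, dmean_dx4, and cut2 are hypothetical names.
* Cutpoint name copied from the margins call; check with -gsem, coeflegend-.
scalar cut2 = _b[insurance_cut2:_cons]

* Linear predictions for each equation (eta for insurance excludes cutpoints)
predict double eta_p, eta outcome(spend)
predict double eta_o, eta outcome(insurance)
predict double eta_e, eta outcome(expenditure)

* Product-rule derivative of the conditional mean with respect to x4
generate double dmean_dx4 =                                               ///
      normalden(eta_p)*_b[spend:x4] * normal(eta_o - scalar(cut2))        ///
        * exp(eta_e)                                                      ///
    + normal(eta_p) * normalden(eta_o - scalar(cut2))*_b[insurance:x4]    ///
        * exp(eta_e)                                                      ///
    + normal(eta_p) * normal(eta_o - scalar(cut2)) * exp(eta_e)           ///
        * _b[expenditure:x4]

* The sample mean is the point estimate of the average marginal effect
summarize dmean_dx4
```

This reproduces only the point estimate; the unconditional standard error still comes from margins.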

Final considerations

We illustrated how to use gsem to obtain the estimates and standard errors for a multiple-hurdle model and its marginal effect. In subsequent posts, we will obtain these results using other Stata tools.

Appendix

Below is the code used to produce the data.


clear
set seed 111
set obs 10000
// Generating exogenous variables
generate x1 = rnormal()
generate x2 = int(3*rbeta(2,3))     // takes the values 0, 1, 2
generate x3 = rchi2(1)-2
generate x4 = ln(rchi2(4))
generate x5 = rnormal()
generate x6 = rbeta(2,3)>.6         // binary indicator
// Generating unobservables
generate ep = rnormal() // for probit
generate eo = rnormal() // for ordered probit
generate e  = rnormal() // for lognormal equation
// Generating linear predictions
generate xbp = .5*(1 + x1 - x2 + x4)
generate xbo = .3*(1 + x3 - x4)
generate xbe = .3*(1 - x5 + x6 + x4)
// Generating outcomes
generate spend       = xbp + ep > 0    // 1 if the latent probit index is positive
generate yotemp      = xbo + eo        // latent ordered-probit index
generate insurance   = yotemp
generate yexp = exp(xbe + e)           // lognormal positive amount
// Cut the latent index at -1, 0, 1 into four ordered insurance levels
replace insurance = 1 if yotemp < -1
replace insurance = 2 if yotemp> -1 & yotemp<0
replace insurance = 3 if yotemp> 0 & yotemp <1
replace insurance = 4 if yotemp>1
// expenditure is zero unless spend == 1
generate expenditure = spend*insurance*yexp
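The code above also shows what the gsem estimates should be close to. The probit index is \(.5(1 + {\tt x1} - {\tt x2} + {\tt x4})\), so the true spend coefficients are .5 on x1, x4, and the constant and -.5 on x2. The ordered probit has no constant, so the .3 intercept in xbo is absorbed by the cutpoints. Writing \(\tau_j\) for the thresholds used to cut yotemp (our notation),
\begin{eqnarray*}
\kappa_j &=& \tau_j - .3,\qquad \tau_j \in \{-1,\ 0,\ 1\}\quad\Rightarrow\quad \kappa_j \in \{-1.3,\ -.3,\ .7\}
\end{eqnarray*}
which is in line with the estimated /cut1, /cut2, and /cut3 of -1.285, -.293, and .740, while the x3 and x4 coefficients in the insurance equation should be near .3 and -.3.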


