Saturday, March 21, 2026

Using gsem to combine estimation results


gsem is a very flexible command that allows us to fit very sophisticated models. However, it is also useful in situations that involve simple models.

For example, when we want to compare parameters among two or more models, we usually use suest, which combines the estimation results under one parameter vector and creates a simultaneous covariance matrix of the robust type. This covariance estimate is described in the Methods and formulas of [R] suest as the robust variance from a “stacked model”. In fact, gsem can estimate these kinds of “stacked models”, even if the estimation samples are not the same and possibly overlap. By using the option vce(robust), we can replicate the results from suest if the models are available for gsem. In addition, gsem allows us to combine results from some estimation commands that are not supported by suest, like models including random effects.

 

Example: Comparing parameters from two models

 

Let’s consider the childweight dataset, described in [ME] mixed. Consider the following models, where the weights of boys and girls are modeled using age and age squared:


. webuse childweight, clear
(Weight data on Asian children)

. regress  weight age c.age#c.age if girl == 0, noheader
------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   7.985022   .6343855    12.59   0.000     6.725942    9.244101
             |
 c.age#c.age |   -1.74346   .2374504    -7.34   0.000    -2.214733   -1.272187
             |
       _cons |   3.684363   .3217223    11.45   0.000     3.045833    4.322893
------------------------------------------------------------------------------

. regress  weight age c.age#c.age if girl == 1, noheader
------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   7.008066   .5164687    13.57   0.000     5.982746    8.033386
             |
 c.age#c.age |  -1.450582   .1930318    -7.51   0.000    -1.833798   -1.067365
             |
       _cons |   3.480933   .2616616    13.30   0.000     2.961469    4.000397
------------------------------------------------------------------------------

To test whether birthweights are the same for the two groups, we need to test whether the intercepts in the two regressions are the same. Using suest, we would proceed as follows:


. quietly regress weight age c.age#c.age if girl == 0, noheader

. estimates store boys

. quietly regress weight age c.age#c.age if girl == 1, noheader

. estimates store girls

. suest boys girls

Simultaneous results for boys, girls

                                                  Number of obs   =        198

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
boys_mean    |
         age |   7.985022   .4678417    17.07   0.000     7.068069    8.901975
             |
 c.age#c.age |   -1.74346   .2034352    -8.57   0.000    -2.142186   -1.344734
             |
       _cons |   3.684363   .1719028    21.43   0.000      3.34744    4.021286
-------------+----------------------------------------------------------------
boys_lnvar   |
       _cons |   .4770289   .1870822     2.55   0.011     .1103546    .8437032
-------------+----------------------------------------------------------------
girls_mean   |
         age |   7.008066   .4166916    16.82   0.000     6.191365    7.824766
             |
 c.age#c.age |  -1.450582   .1695722    -8.55   0.000    -1.782937   -1.118226
             |
       _cons |   3.480933   .1556014    22.37   0.000      3.17596    3.785906
-------------+----------------------------------------------------------------
girls_lnvar  |
       _cons |   .0097127   .1351769     0.07   0.943    -.2552292    .2746545
------------------------------------------------------------------------------

Invoking an estimation command with the option coeflegend will give us a legend we can use to refer to the parameters when we use postestimation commands like test.


. suest, coeflegend

Simultaneous results for boys, girls

                                                  Number of obs   =        198

------------------------------------------------------------------------------
             |      Coef.  Legend
-------------+----------------------------------------------------------------
boys_mean    |
         age |   7.985022  _b[boys_mean:age]
             |
 c.age#c.age |   -1.74346  _b[boys_mean:c.age#c.age]
             |
       _cons |   3.684363  _b[boys_mean:_cons]
-------------+----------------------------------------------------------------
boys_lnvar   |
       _cons |   .4770289  _b[boys_lnvar:_cons]
-------------+----------------------------------------------------------------
girls_mean   |
         age |   7.008066  _b[girls_mean:age]
             |
 c.age#c.age |  -1.450582  _b[girls_mean:c.age#c.age]
             |
       _cons |   3.480933  _b[girls_mean:_cons]
-------------+----------------------------------------------------------------
girls_lnvar  |
       _cons |   .0097127  _b[girls_lnvar:_cons]
------------------------------------------------------------------------------

. test  _b[boys_mean:_cons] = _b[girls_mean:_cons]

 ( 1)  [boys_mean]_cons - [girls_mean]_cons = 0

           chi2(  1) =    0.77
         Prob > chi2 =    0.3803

We find no evidence that the intercepts are different.

Now, let’s replicate those results by using the gsem command. We generate the variable weightboy, a copy of weight for boys and missing otherwise, and the variable weightgirl, a copy of weight for girls and missing otherwise.


. quietly generate weightboy = weight if girl == 0

. quietly generate weightgirl = weight if girl == 1

. gsem (weightboy <- age c.age#c.age) (weightgirl <- age c.age#c.age), ///
>      nolog vce(robust)

Generalized structural equation model             Number of obs   =        198
Log pseudolikelihood =  -302.2308

-------------------------------------------------------------------------------
                 |              Robust
                 |      Coef.  Std. Err.     z   P>|z|     [95% Conf. Interval]
-----------------+-------------------------------------------------------------
weightboy <-     |
             age |   7.985022  .4678417   17.07  0.000     7.068069    8.901975
                 |
     c.age#c.age |   -1.74346  .2034352   -8.57  0.000    -2.142186   -1.344734
                 |
           _cons |   3.684363  .1719028   21.43  0.000      3.34744    4.021286
-----------------+-------------------------------------------------------------
weightgirl <-    |
             age |   7.008066  .4166916   16.82  0.000     6.191365    7.824766
                 |
     c.age#c.age |  -1.450582  .1695722   -8.55  0.000    -1.782937   -1.118226
                 |
           _cons |   3.480933  .1556014   22.37  0.000      3.17596    3.785906
-----------------+-------------------------------------------------------------
 var(e.weightboy)|   1.562942  .3014028                    1.071012    2.280821
var(e.weightgirl)|    .978849  .1364603                    .7448187    1.286414
-------------------------------------------------------------------------------

. gsem, coeflegend

Generalized structural equation model             Number of obs   =        198
Log pseudolikelihood =  -302.2308

-------------------------------------------------------------------------------
                 |      Coef.  Legend
-----------------+-------------------------------------------------------------
weightboy <-     |
             age |   7.985022  _b[weightboy:age]
                 |
     c.age#c.age |   -1.74346  _b[weightboy:c.age#c.age]
                 |
           _cons |   3.684363  _b[weightboy:_cons]
-----------------+-------------------------------------------------------------
weightgirl <-    |
             age |   7.008066  _b[weightgirl:age]
                 |
     c.age#c.age |  -1.450582  _b[weightgirl:c.age#c.age]
                 |
           _cons |   3.480933  _b[weightgirl:_cons]
-----------------+-------------------------------------------------------------
 var(e.weightboy)|   1.562942  _b[var(e.weightboy):_cons]
var(e.weightgirl)|    .978849  _b[var(e.weightgirl):_cons]
-------------------------------------------------------------------------------

. test  _b[weightgirl:_cons]=  _b[weightboy:_cons]

 ( 1)  - [weightboy]_cons + [weightgirl]_cons = 0

           chi2(  1) =    0.77
         Prob > chi2 =    0.3803

gsem allowed us to fit models on different subsets simultaneously. By default, the model is assumed to be a linear regression, but several links and families are available; for example, you can combine two Poisson models or a multinomial logistic model with a regular logistic model. See [SEM] sem and gsem for details.
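As a minimal sketch of the Poisson case mentioned above, the following do-file fragment fits two Poisson regressions on disjoint subsets and tests whether their slopes are equal. The data are simulated, and every variable name (x, group, y, y0, y1) is hypothetical and not part of the childweight dataset; the poisson option is gsem's shorthand for the Poisson family with a log link.

```stata
* Simulated data; all variable names here are illustrative assumptions
clear
set obs 200
set seed 12345
generate x = rnormal()
generate group = runiform() < 0.5
generate y = rpoisson(exp(0.5 + 0.3*x))

* One copy of the outcome per subset, missing otherwise
generate y0 = y if group == 0
generate y1 = y if group == 1

* Two Poisson equations fit jointly, with a suest-style robust variance
gsem (y0 <- x, poisson) (y1 <- x, poisson), vce(robust) nolog

* Compare the slopes across the two equations
test _b[y0:x] = _b[y1:x]
```

The same pattern (one outcome copy per subset, one equation per model) should carry over to other families, but it is worth checking the coefficient names with gsem, coeflegend before writing the test.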

Here, I use the vce(robust) option to replicate the results from suest. However, when estimation samples don’t overlap, results from both estimations are assumed to be independent, and thus the option vce(robust) is not needed. When performing the estimation without the vce(robust) option, the joint covariance matrix will contain two blocks with the covariances from the original models and 0s outside those blocks.
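One way to look at this block structure is to refit the joint model without vce(robust) and list the stored covariance matrix e(V). The fragment below is a sketch that assumes the childweight variables used above; the matrix name V is arbitrary.

```stata
webuse childweight, clear
generate weightboy  = weight if girl == 0
generate weightgirl = weight if girl == 1

* Joint model without vce(robust): conventional (non-robust) variance
gsem (weightboy <- age c.age#c.age) (weightgirl <- age c.age#c.age), nolog

* Inspect the joint covariance matrix; the cross-equation entries are the
* ones expected to be zero when the estimation samples do not overlap
matrix V = e(V)
matrix list V
```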

 

An example with random effects

 

The childweight dataset contains repeated measures, and in the documentation it is analyzed using the mixed command, which allows us to account for the intra-individual correlation via random effects.

Now, let’s use the techniques described above to combine results from two random-effects models. Here are the two separate models:


. mixed weight age c.age#c.age if girl == 0 || id:, nolog

Mixed-effects ML regression                     Number of obs      =       100
Group variable: id                              Number of groups   =        34

                                                Obs per group: min =         1
                                                               avg =       2.9
                                                               max =         5


                                                Wald chi2(2)       =   1070.28
Log likelihood = -149.05479                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   8.328882   .4601093    18.10   0.000     7.427084    9.230679
             |
 c.age#c.age |  -1.859798   .1722784   -10.80   0.000    -2.197458   -1.522139
             |
       _cons |   3.525929   .2723617    12.95   0.000      2.99211    4.059749
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                  var(_cons) |   .7607779   .2439115      .4058409    1.426133
-----------------------------+------------------------------------------------
               var(Residual) |   .7225673   .1236759      .5166365    1.010582
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =    30.34 Prob >= chibar2 = 0.0000

. mixed weight age c.age#c.age if girl == 1 || id:, nolog

Mixed-effects ML regression                     Number of obs      =        98
Group variable: id                              Number of groups   =        34

                                                Obs per group: min =         1
                                                               avg =       2.9
                                                               max =         5


                                                Wald chi2(2)       =   2141.72
Log likelihood =  -114.3008                     Prob > chi2        =    0.0000

------------------------------------------------------------------------------
      weight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   7.273082   .3167266    22.96   0.000     6.652309    7.893854
             |
 c.age#c.age |  -1.538309    .118958   -12.93   0.000    -1.771462   -1.305156
             |
       _cons |   3.354834   .2111793    15.89   0.000      2.94093    3.768738
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                  var(_cons) |   .6925554   .1967582       .396848    1.208606
-----------------------------+------------------------------------------------
               var(Residual) |   .3034231   .0535359      .2147152    .4287799
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =    47.42 Prob >= chibar2 = 0.0000

Random effects can be included in a gsem model by incorporating latent variables at the group level; these are the latent variables M1[id] and M2[id] below. By default, gsem will try to estimate a covariance when it sees two latent variables at the same level. This is easily solved by constraining this covariance term to 0. Option vce(robust) should be used whenever we want to reproduce the mechanism used by suest.


. gsem (weightboy <- age c.age#c.age M1[id])   ///
>      (weightgirl <- age c.age#c.age M2[id]), ///
>      cov(M1[id]*M2[id]@0) vce(robust) nolog

Generalized structural equation model             Number of obs   =        198
Log pseudolikelihood = -263.35559

 ( 1)  [weightboy]M1[id] = 1
 ( 2)  [weightgirl]M2[id] = 1
                                      (Std. Err. adjusted for clustering on id)
-------------------------------------------------------------------------------
                 |              Robust
                 |      Coef.  Std. Err.     z   P>|z|     [95% Conf. Interval]
-----------------+-------------------------------------------------------------
weightboy <-     |
             age |   8.328882  .4211157   19.78  0.000      7.50351    9.154253
                 |
     c.age#c.age |  -1.859798  .1591742  -11.68  0.000    -2.171774   -1.547823
                 |
          M1[id] |          1 (constrained)
                 |
           _cons |   3.525929  .1526964   23.09  0.000      3.22665    3.825209
-----------------+-------------------------------------------------------------
weightgirl <-    |
             age |   7.273082  .3067378   23.71  0.000     6.671887    7.874277
                 |
     c.age#c.age |  -1.538309   .120155  -12.80  0.000    -1.773808    -1.30281
                 |
          M2[id] |          1 (constrained)
                 |
           _cons |   3.354834  .1482248   22.63  0.000     3.064319     3.64535
-----------------+-------------------------------------------------------------
      var(M1[id])|   .7607774  .2255575                     .4254915    1.360268
      var(M2[id])|   .6925553  .1850283                    .4102429    1.169144
-----------------+-------------------------------------------------------------
 var(e.weightboy)|   .7225674  .1645983                     .4623572    1.129221
var(e.weightgirl)|   .3034231  .0667975                    .1970877    .4671298
-------------------------------------------------------------------------------

Above, we have the joint output from the two models, which allows us to perform tests among parameters in both models. Notice that option vce(robust) implies that standard errors will be clustered on the groups determined by id.

gsem, when called with the vce(robust) option, will complain if there are inconsistencies among the groups in the models (for example, if the random effects in both models were crossed).
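For instance, a cross-model test on the random-effects fit could look like the do-file sketch below. It follows the same pattern as the test on the linear regressions earlier in this post; the coefficient names are taken from the equation names in the output above, and running gsem, coeflegend first is a cheap way to confirm them before testing.

```stata
webuse childweight, clear
generate weightboy  = weight if girl == 0
generate weightgirl = weight if girl == 1

* Joint random-intercept model with the latent covariance constrained to 0
gsem (weightboy <- age c.age#c.age M1[id])   ///
     (weightgirl <- age c.age#c.age M2[id]), ///
     cov(M1[id]*M2[id]@0) vce(robust) nolog

* Show the names under which the parameters are stored
gsem, coeflegend

* Are the intercepts the same for boys and girls?
test _b[weightboy:_cons] = _b[weightgirl:_cons]

* Joint test that both age terms coincide across the two equations
test (_b[weightboy:age] = _b[weightgirl:age]) ///
     (_b[weightboy:c.age#c.age] = _b[weightgirl:c.age#c.age])
```

The /// line continuations work in a do-file; when typing interactively, each command would go on a single line.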

 

Checking that you are fitting the same model

 

In the previous model, gsem’s default covariance structure included a term that wasn’t in the original two models, so we needed to include an additional restriction. This can be easy to spot in a simple model, but if you don’t want to rely just on a visual inspection, you can write a small loop to make sure that all the estimates in the joint model are actually also in the original models.

Let’s see an example with random effects, this time with overlapping data.


. *fit first model and save the estimates
. gsem (weightboy <- age c.age#c.age M1[id]), nolog

Generalized structural equation model             Number of obs   =        100
Log likelihood = -149.05479

 ( 1)  [weightboy]M1[id] = 1
-------------------------------------------------------------------------------
                |      Coef.  Std. Err.     z    P>|z|     [95% Conf. Interval]
----------------+--------------------------------------------------------------
weightboy <-    |
            age |   8.328882  .4609841   18.07   0.000     7.425369    9.232394
                |
    c.age#c.age |  -1.859798  .1725233  -10.78   0.000    -2.197938   -1.521659
                |
         M1[id] |          1 (constrained)
                |
          _cons |   3.525929  .2726322   12.93   0.000      2.99158    4.060279
----------------+--------------------------------------------------------------
     var(M1[id])|   .7607774  .2439114                     .4058407    1.426132
----------------+--------------------------------------------------------------
var(e.weightboy)|   .7225674  .1236759                     .5166366    1.010582
-------------------------------------------------------------------------------

. mat b1 = e(b)

. *fit second model and save the estimates
. gsem (weight <- age M2[id]), nolog

Generalized structural equation model             Number of obs   =        198
Log likelihood = -348.32402

 ( 1)  [weight]M2[id] = 1
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
weight <-    |
         age |   3.389281   .1152211    29.42   0.000     3.163452    3.615111
             |
      M2[id] |          1  (constrained)
             |
       _cons |   5.156913   .1803059    28.60   0.000      4.80352    5.510306
-------------+----------------------------------------------------------------
  var(M2[id])|   .6076662   .2040674                      .3146395    1.173591
-------------+----------------------------------------------------------------
var(e.weight)|   1.524052   .1866496                      1.198819    1.937518
------------------------------------------------------------------------------

. mat b2 = e(b)

. *stack estimates from first and second models
. mat stacked = b1, b2

. *estimate joint model and save results
. gsem (weightboy <- age c.age#c.age M1[id]) ///
>      (weight <- age M2[id]), cov(M1[id]*M2[id]@0) vce(robust) nolog

Generalized structural equation model             Number of obs   =        198
Log pseudolikelihood = -497.37881

 ( 1)  [weightboy]M1[id] = 1
 ( 2)  [weight]M2[id] = 1
                                      (Std. Err. adjusted for clustering on id)
-------------------------------------------------------------------------------
                |              Robust
                |      Coef.  Std. Err.     z    P>|z|     [95% Conf. Interval]
----------------+--------------------------------------------------------------
weightboy <-    |
            age |   8.328882  .4211157   19.78   0.000      7.50351    9.154253
                |
    c.age#c.age |  -1.859798  .1591742  -11.68   0.000    -2.171774   -1.547823
                |
         M1[id] |          1 (constrained)
                |
          _cons |   3.525929  .1526964   23.09   0.000      3.22665    3.825209
----------------+--------------------------------------------------------------
weight <-       |
            age |   3.389281  .1157835   29.27   0.000      3.16235    3.616213
                |
         M2[id] |          1 (constrained)
                |
          _cons |   5.156913  .1345701   38.32   0.000      4.89316    5.420665
----------------+--------------------------------------------------------------
     var(M1[id])|   .7607774  .2255575                     .4254915    1.360268
     var(M2[id])|   .6076662     .1974                     .3214791    1.148623
----------------+--------------------------------------------------------------
var(e.weightboy)|   .7225674  .1645983                     .4623572    1.129221
   var(e.weight)|   1.524052  .1705637                     1.223877    1.897849
-------------------------------------------------------------------------------

. mat b = e(b)

. *verify that estimates from the joint model are the same as
. *from models 1 and 2
. local stripes : colfullnames(b)

. foreach l of local stripes{
  2.    matrix  r1 =  b[1,"`l'"]
  3.    matrix r2 = stacked[1,"`l'"]
  4.    assert reldif(el(r1,1,1), el(r2,1,1))<1e-5
  5. }

The loop above verifies that all the labels in the joint model correspond to estimates in the stacked vector from the two separate models and that the estimates are actually the same. If you omit the restriction on the covariance in the joint model, the loop will stop with an error.

 

Technical note

 

As documented in [U] 20.21.2 Correlated errors: Cluster-robust standard errors, the formula for the robust estimator of the variance is

\[
V_{\mathrm{robust}} = \hat V \Bigl(\sum_{j=1}^N u_j' u_j\Bigr) \hat V
\]

where \(N\) is the number of observations, \(\hat V\) is the conventional estimator of the variance, and for each observation \(j\), \(u_j\) is a row vector (with as many columns as parameters), which represents the contribution of this observation to the gradient. (If we stack the rows \(u_j\), the columns of this matrix are the scores.)

When we apply suest, the matrix \(\hat V\) is constructed as the stacked block-diagonal conventional variance estimates from the original submodels; this is the variance you will see if you apply gsem to the joint model without the vce(robust) option. The \(u_j\) values used by suest are now the values from both estimations, so we have as many \(u_j\) values as the sum of observations in the two original models, and each row contains as many columns as the total number of parameters in both models. This is the exact operation that gsem, vce(robust) performs.
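To make the stacked structure concrete, the two-model case can be written out in blocks. The notation below is introduced here only for illustration: \(\hat V_k\) is the conventional variance of model \(k\), and \(u_{kj}\) is the contribution of observation \(j\) to the gradient of model \(k\), taken to be a row of zeros when observation \(j\) is not in the estimation sample of model \(k\) (an assumption consistent with the stacked-model description above).

```latex
\[
\hat V = \begin{pmatrix} \hat V_1 & 0 \\ 0 & \hat V_2 \end{pmatrix},
\qquad
u_j = \begin{pmatrix} u_{1j} & u_{2j} \end{pmatrix},
\qquad
S_{kl} = \sum_{j} u_{kj}' u_{lj}
\]
\[
V_{\mathrm{robust}}
= \hat V \Bigl(\sum_{j} u_j' u_j\Bigr) \hat V
= \begin{pmatrix}
\hat V_1 S_{11} \hat V_1 & \hat V_1 S_{12} \hat V_2 \\
\hat V_2 S_{21} \hat V_1 & \hat V_2 S_{22} \hat V_2
\end{pmatrix}
\]
```

Under that assumption, when the two estimation samples do not overlap, every product \(u_{1j}' u_{2j}\) is zero, so \(S_{12} = S_{21}' = 0\) and the robust variance is itself block diagonal. The off-diagonal blocks matter only when some observations enter both models.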

When random effects are present, standard errors will be clustered on groups. Instead of observation-level contributions to the gradient, we would use cluster-level contributions. This means that observations in the two models would need to be clustered in a consistent manner; observations that are common to the two estimations would need to be in the same cluster in both.


