gsem is a very flexible command that allows us to fit very sophisticated models. However, it is also useful in situations that involve simple models.
For example, when we want to compare parameters among two or more models, we usually use suest, which combines the estimation results under one parameter vector and creates a simultaneous covariance matrix of the robust type. This covariance estimate is described in the Methods and formulas of [R] suest as the robust variance from a “stacked model”. In fact, gsem can estimate these kinds of “stacked models”, even if the estimation samples are not the same and possibly overlap. By using the option vce(robust), we can replicate the results from suest if the models are available for gsem. In addition, gsem allows us to combine results from some estimation commands that are not supported by suest, like models including random effects.
Example: Comparing parameters from two models
Let’s consider the childweight dataset, described in [ME] mixed. Consider the following models, where weights of boys and girls are modeled using age and age-squared:
. webuse childweight, clear
(Weight data on Asian children)
. regress weight age c.age#c.age if girl == 0, noheader
------------------------------------------------------------------------------
weight | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | 7.985022 .6343855 12.59 0.000 6.725942 9.244101
|
c.age#c.age | -1.74346 .2374504 -7.34 0.000 -2.214733 -1.272187
|
_cons | 3.684363 .3217223 11.45 0.000 3.045833 4.322893
------------------------------------------------------------------------------
. regress weight age c.age#c.age if girl == 1, noheader
------------------------------------------------------------------------------
weight | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | 7.008066 .5164687 13.57 0.000 5.982746 8.033386
|
c.age#c.age | -1.450582 .1930318 -7.51 0.000 -1.833798 -1.067365
|
_cons | 3.480933 .2616616 13.30 0.000 2.961469 4.000397
------------------------------------------------------------------------------
To test whether birthweights are the same for the two groups, we need to test whether the intercepts in the two regressions are the same. Using suest, we would proceed as follows:
. quietly regress weight age c.age#c.age if girl == 0, noheader
. estimates store boys
. quietly regress weight age c.age#c.age if girl == 1, noheader
. estimates store girls
. suest boys girls
Simultaneous results for boys, girls
Number of obs = 198
------------------------------------------------------------------------------
| Robust
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
boys_mean |
age | 7.985022 .4678417 17.07 0.000 7.068069 8.901975
|
c.age#c.age | -1.74346 .2034352 -8.57 0.000 -2.142186 -1.344734
|
_cons | 3.684363 .1719028 21.43 0.000 3.34744 4.021286
-------------+----------------------------------------------------------------
boys_lnvar |
_cons | .4770289 .1870822 2.55 0.011 .1103546 .8437032
-------------+----------------------------------------------------------------
girls_mean |
age | 7.008066 .4166916 16.82 0.000 6.191365 7.824766
|
c.age#c.age | -1.450582 .1695722 -8.55 0.000 -1.782937 -1.118226
|
_cons | 3.480933 .1556014 22.37 0.000 3.17596 3.785906
-------------+----------------------------------------------------------------
girls_lnvar |
_cons | .0097127 .1351769 0.07 0.943 -.2552292 .2746545
------------------------------------------------------------------------------
Invoking an estimation command with the option coeflegend will give us a legend we can use to refer to the parameters when we use postestimation commands like test.
. suest, coeflegend
Simultaneous results for boys, girls
Number of obs = 198
------------------------------------------------------------------------------
| Coef. Legend
-------------+----------------------------------------------------------------
boys_mean |
age | 7.985022 _b[boys_mean:age]
|
c.age#c.age | -1.74346 _b[boys_mean:c.age#c.age]
|
_cons | 3.684363 _b[boys_mean:_cons]
-------------+----------------------------------------------------------------
boys_lnvar |
_cons | .4770289 _b[boys_lnvar:_cons]
-------------+----------------------------------------------------------------
girls_mean |
age | 7.008066 _b[girls_mean:age]
|
c.age#c.age | -1.450582 _b[girls_mean:c.age#c.age]
|
_cons | 3.480933 _b[girls_mean:_cons]
-------------+----------------------------------------------------------------
girls_lnvar |
_cons | .0097127 _b[girls_lnvar:_cons]
------------------------------------------------------------------------------
. test _b[boys_mean:_cons] = _b[girls_mean:_cons]
( 1) [boys_mean]_cons - [girls_mean]_cons = 0
chi2( 1) = 0.77
Prob > chi2 = 0.3803
We find no evidence that the intercepts are different.
Now, let’s replicate these results by using the gsem command. We generate the variable weightboy, a copy of weight for boys and missing otherwise, and the variable weightgirl, a copy of weight for girls and missing otherwise.
. quietly generate weightboy = weight if girl == 0
. quietly generate weightgirl = weight if girl == 1
. gsem (weightboy <- age c.age#c.age) (weightgirl <- age c.age#c.age), ///
> nolog vce(robust)
Generalized structural equation model Number of obs = 198
Log pseudolikelihood = -302.2308
-------------------------------------------------------------------------------
| Robust
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------+-------------------------------------------------------------
weightboy <- |
age | 7.985022 .4678417 17.07 0.000 7.068069 8.901975
|
c.age#c.age | -1.74346 .2034352 -8.57 0.000 -2.142186 -1.344734
|
_cons | 3.684363 .1719028 21.43 0.000 3.34744 4.021286
-----------------+-------------------------------------------------------------
weightgirl <- |
age | 7.008066 .4166916 16.82 0.000 6.191365 7.824766
|
c.age#c.age | -1.450582 .1695722 -8.55 0.000 -1.782937 -1.118226
|
_cons | 3.480933 .1556014 22.37 0.000 3.17596 3.785906
-----------------+-------------------------------------------------------------
var(e.weightboy)| 1.562942 .3014028 1.071012 2.280821
var(e.weightgirl)| .978849 .1364603 .7448187 1.286414
-------------------------------------------------------------------------------
. gsem, coeflegend
Generalized structural equation model Number of obs = 198
Log pseudolikelihood = -302.2308
-------------------------------------------------------------------------------
| Coef. Legend
-----------------+-------------------------------------------------------------
weightboy <- |
age | 7.985022 _b[weightboy:age]
|
c.age#c.age | -1.74346 _b[weightboy:c.age#c.age]
|
_cons | 3.684363 _b[weightboy:_cons]
-----------------+-------------------------------------------------------------
weightgirl <- |
age | 7.008066 _b[weightgirl:age]
|
c.age#c.age | -1.450582 _b[weightgirl:c.age#c.age]
|
_cons | 3.480933 _b[weightgirl:_cons]
-----------------+-------------------------------------------------------------
var(e.weightboy)| 1.562942 _b[var(e.weightboy):_cons]
var(e.weightgirl)| .978849 _b[var(e.weightgirl):_cons]
-------------------------------------------------------------------------------
. test _b[weightgirl:_cons]= _b[weightboy:_cons]
( 1) - [weightboy]_cons + [weightgirl]_cons = 0
chi2( 1) = 0.77
Prob > chi2 = 0.3803
gsem allowed us to fit models on different subsets simultaneously. By default, the model is assumed to be a linear regression, but several links and families are available; for example, you can combine two Poisson models or a multinomial logistic model with a regular logistic model. See [SEM] sem and gsem for details.
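As a minimal sketch of the non-Gaussian case, the following do-file fragment stacks two Poisson regressions fit on disjoint subsets and compares their slopes. The data are simulated; the names x, y1, and y2, the coefficient values, and the seed are assumptions made only for illustration and are not part of the childweight example.

```stata
* Simulated data (hypothetical; not from the childweight dataset)
clear
set seed 12345
set obs 200
generate x  = rnormal()
generate y1 = rpoisson(exp(0.5 + 0.3*x))
generate y2 = rpoisson(exp(0.2 + 0.3*x))

* Make the two estimation samples disjoint, as with weightboy/weightgirl
replace y1 = . in 101/200
replace y2 = . in 1/100

* Stack the two Poisson models and request the robust (suest-type) variance
gsem (y1 <- x, poisson) (y2 <- x, poisson), vce(robust) nolog

* Wald test that the slope on x is the same in both models
test _b[y1:x] = _b[y2:x]
```

The same pattern should carry over to other families by changing the option inside each set of parentheses.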
Here, I use the vce(robust) option to replicate the results from suest. However, when estimation samples don’t overlap, results from both estimations are assumed to be independent, and thus the option vce(robust) is not needed. When performing the estimation without the vce(robust) option, the joint covariance matrix will contain two blocks with the covariances from the original models and 0s outside those blocks.
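To inspect that block structure, one could refit the joint model without vce(robust) and list the stored covariance matrix. This is only a sketch: it reloads the data and regenerates weightboy and weightgirl so that it runs on its own, and no output is reproduced here.

```stata
webuse childweight, clear
generate weightboy  = weight if girl == 0
generate weightgirl = weight if girl == 1

* Joint model with the default (non-robust) variance estimator
gsem (weightboy <- age c.age#c.age) (weightgirl <- age c.age#c.age), nolog

* List the joint covariance matrix; the cells pairing a weightboy
* parameter with a weightgirl parameter are the ones to look at
matrix list e(V)
```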
An example with random effects
The childweight dataset contains repeated measures, and in the documentation it is analyzed using the mixed command, which allows us to account for the intra-individual correlation via random effects.
Now, let’s use the techniques described above to combine results from two random-effects models. Here are the two separate models:
. mixed weight age c.age#c.age if girl == 0 || id:, nolog
Mixed-effects ML regression Number of obs = 100
Group variable: id Number of groups = 34
Obs per group: min = 1
avg = 2.9
max = 5
Wald chi2(2) = 1070.28
Log likelihood = -149.05479 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
weight | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | 8.328882 .4601093 18.10 0.000 7.427084 9.230679
|
c.age#c.age | -1.859798 .1722784 -10.80 0.000 -2.197458 -1.522139
|
_cons | 3.525929 .2723617 12.95 0.000 2.99211 4.059749
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Id |
var(_cons) | .7607779 .2439115 .4058409 1.426133
-----------------------------+------------------------------------------------
var(Residual) | .7225673 .1236759 .5166365 1.010582
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 30.34 Prob >= chibar2 = 0.0000
. mixed weight age c.age#c.age if girl == 1 || id:, nolog
Mixed-effects ML regression Number of obs = 98
Group variable: id Number of groups = 34
Obs per group: min = 1
avg = 2.9
max = 5
Wald chi2(2) = 2141.72
Log likelihood = -114.3008 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
weight | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | 7.273082 .3167266 22.96 0.000 6.652309 7.893854
|
c.age#c.age | -1.538309 .118958 -12.93 0.000 -1.771462 -1.305156
|
_cons | 3.354834 .2111793 15.89 0.000 2.94093 3.768738
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Id |
var(_cons) | .6925554 .1967582 .396848 1.208606
-----------------------------+------------------------------------------------
var(Residual) | .3034231 .0535359 .2147152 .4287799
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) = 47.42 Prob >= chibar2 = 0.0000
Random effects can be included in a gsem model by incorporating latent variables at the group level; these are the latent variables M1[id] and M2[id] below. By default, gsem will try to estimate a covariance when it sees two latent variables at the same level. This is easily solved by restricting this covariance term to 0. The option vce(robust) should be used whenever we want to reproduce the mechanism used by suest.
. gsem (weightboy <- age c.age#c.age M1[id]) ///
> (weightgirl <- age c.age#c.age M2[id]), ///
> cov(M1[id]*M2[id]@0) vce(robust) nolog
Generalized structural equation model Number of obs = 198
Log pseudolikelihood = -263.35559
( 1) [weightboy]M1[id] = 1
( 2) [weightgirl]M2[id] = 1
(Std. Err. adjusted for clustering on id)
-------------------------------------------------------------------------------
| Robust
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-----------------+-------------------------------------------------------------
weightboy <- |
age | 8.328882 .4211157 19.78 0.000 7.50351 9.154253
|
c.age#c.age | -1.859798 .1591742 -11.68 0.000 -2.171774 -1.547823
|
M1[id] | 1 (constrained)
|
_cons | 3.525929 .1526964 23.09 0.000 3.22665 3.825209
-----------------+-------------------------------------------------------------
weightgirl <- |
age | 7.273082 .3067378 23.71 0.000 6.671887 7.874277
|
c.age#c.age | -1.538309 .120155 -12.80 0.000 -1.773808 -1.30281
|
M2[id] | 1 (constrained)
|
_cons | 3.354834 .1482248 22.63 0.000 3.064319 3.64535
-----------------+-------------------------------------------------------------
var(M1[id])| .7607774 .2255575 .4254915 1.360268
var(M2[id])| .6925553 .1850283 .4102429 1.169144
-----------------+-------------------------------------------------------------
var(e.weightboy)| .7225674 .1645983 .4623572 1.129221
var(e.weightgirl)| .3034231 .0667975 .1970877 .4671298
-------------------------------------------------------------------------------
Above, we have the joint output from the two models, which allows us to perform tests among parameters in both models. Notice that the option vce(robust) implies that standard errors will be clustered on the groups determined by id.
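For instance, the intercepts and the age terms of the two random-effects models can be compared with test after the joint fit. The sketch below reloads the data and refits the joint model so that it runs on its own; the parameter names follow the _b[equation:name] pattern that coeflegend displayed earlier, and the choice of which parameters to compare is only an illustration.

```stata
webuse childweight, clear
generate weightboy  = weight if girl == 0
generate weightgirl = weight if girl == 1

gsem (weightboy  <- age c.age#c.age M1[id]) ///
     (weightgirl <- age c.age#c.age M2[id]), ///
     cov(M1[id]*M2[id]@0) vce(robust) nolog

* Are the intercepts the same for boys and girls?
test _b[weightboy:_cons] = _b[weightgirl:_cons]

* Joint test of the two age terms
test (_b[weightboy:age] = _b[weightgirl:age]) ///
     (_b[weightboy:c.age#c.age] = _b[weightgirl:c.age#c.age])
```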
gsem, when called with the vce(robust) option, will complain if there are inconsistencies among the groups in the models (for example, if the random effects in both models were crossed).
Checking that you are fitting the same model
In the previous model, gsem’s default covariance structure included a term that wasn’t in the original two models, so we needed to include an additional restriction. This can be easy to spot in a simple model, but if you don’t want to rely just on a visual inspection, you can write a small loop to make sure that all the estimates in the joint model are actually also in the original models.
Let’s see an example with random effects, this time with overlapping data.
. *fit first model and save the estimates
. gsem (weightboy <- age c.age#c.age M1[id]), nolog
Generalized structural equation model Number of obs = 100
Log likelihood = -149.05479
( 1) [weightboy]M1[id] = 1
-------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------+--------------------------------------------------------------
weightboy <- |
age | 8.328882 .4609841 18.07 0.000 7.425369 9.232394
|
c.age#c.age | -1.859798 .1725233 -10.78 0.000 -2.197938 -1.521659
|
M1[id] | 1 (constrained)
|
_cons | 3.525929 .2726322 12.93 0.000 2.99158 4.060279
----------------+--------------------------------------------------------------
var(M1[id])| .7607774 .2439114 .4058407 1.426132
----------------+--------------------------------------------------------------
var(e.weightboy)| .7225674 .1236759 .5166366 1.010582
-------------------------------------------------------------------------------
. mat b1 = e(b)
. *fit second model and save the estimates
. gsem (weight <- age M2[id]), nolog
Generalized structural equation model Number of obs = 198
Log likelihood = -348.32402
( 1) [weight]M2[id] = 1
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
weight <- |
age | 3.389281 .1152211 29.42 0.000 3.163452 3.615111
|
M2[id] | 1 (constrained)
|
_cons | 5.156913 .1803059 28.60 0.000 4.80352 5.510306
-------------+----------------------------------------------------------------
var(M2[id])| .6076662 .2040674 .3146395 1.173591
-------------+----------------------------------------------------------------
var(e.weight)| 1.524052 .1866496 1.198819 1.937518
------------------------------------------------------------------------------
. mat b2 = e(b)
. *stack estimates from first and second models
. mat stacked = b1, b2
. *estimate joint model and save results
. gsem (weightboy <- age c.age#c.age M1[id]) ///
> (weight <- age M2[id]), cov(M1[id]*M2[id]@0) vce(robust) nolog
Generalized structural equation model Number of obs = 198
Log pseudolikelihood = -497.37881
( 1) [weightboy]M1[id] = 1
( 2) [weight]M2[id] = 1
(Std. Err. adjusted for clustering on id)
-------------------------------------------------------------------------------
| Robust
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
----------------+--------------------------------------------------------------
weightboy <- |
age | 8.328882 .4211157 19.78 0.000 7.50351 9.154253
|
c.age#c.age | -1.859798 .1591742 -11.68 0.000 -2.171774 -1.547823
|
M1[id] | 1 (constrained)
|
_cons | 3.525929 .1526964 23.09 0.000 3.22665 3.825209
----------------+--------------------------------------------------------------
weight <- |
age | 3.389281 .1157835 29.27 0.000 3.16235 3.616213
|
M2[id] | 1 (constrained)
|
_cons | 5.156913 .1345701 38.32 0.000 4.89316 5.420665
----------------+--------------------------------------------------------------
var(M1[id])| .7607774 .2255575 .4254915 1.360268
var(M2[id])| .6076662 .1974 .3214791 1.148623
----------------+--------------------------------------------------------------
var(e.weightboy)| .7225674 .1645983 .4623572 1.129221
var(e.weight)| 1.524052 .1705637 1.223877 1.897849
-------------------------------------------------------------------------------
. mat b = e(b)
. *verify that estimates from the joint model are the same as
. *from models 1 and 2
. local stripes : colfullnames(b)
. foreach l of local stripes {
2. matrix r1 = b[1,"`l'"]
3. matrix r2 = stacked[1,"`l'"]
4. assert reldif(el(r1,1,1), el(r2,1,1))<1e-5
5. }
The loop above verifies that every label in the joint model (b) corresponds to an estimate in the stacked vector from the two separate models and that the estimates are actually the same. If you omit the covariance restriction in the joint model, the assert command will produce an error.
Technical note
As documented in [U] 20.21.2 Correlated errors: Cluster-robust standard errors, the formula for the robust estimator of the variance is
\[
V_{robust} = \hat V \Bigl(\sum_{j=1}^N u_j' u_j\Bigr) \hat V
\]
where \(N\) is the number of observations, \(\hat V\) is the conventional estimator of the variance, and for each observation \(j\), \(u_j\) is a row vector (with as many columns as parameters), which represents the contribution of this observation to the gradient. (If we stack the rows \(u_j\), the columns of this matrix are the scores.)
When we apply suest, the matrix \(\hat V\) is constructed as the stacked block-diagonal conventional variance estimates from the original submodels; this is the variance you will see if you apply gsem to the joint model without the vce(robust) option. The \(u_j\) values used by suest are now the values from both estimations, so we have as many \(u_j\) values as the sum of observations in the two original models, and each row contains as many columns as the total number of parameters in both models. This is the exact operation that gsem, vce(robust) performs.
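Written out for two submodels, the description above corresponds roughly to the following. The subscripts 1 and 2 and the symbols \(u_{1j}\) and \(u_{2j}\) are notation introduced here; the convention that a submodel's score block is zero for observations outside its estimation sample is an assumption based on the stacked-model description in [R] suest.

```latex
\[
\hat V = \begin{pmatrix} \hat V_1 & 0 \\ 0 & \hat V_2 \end{pmatrix},
\qquad
u_j = \begin{pmatrix} u_{1j} & u_{2j} \end{pmatrix},
\qquad
V_{robust} = \hat V \Bigl( \sum_{j} u_j' u_j \Bigr) \hat V
\]
```

Under this assumption, the off-diagonal block of \(V_{robust}\) is \(\hat V_1 \bigl(\sum_j u_{1j}' u_{2j}\bigr) \hat V_2\), which is zero whenever no observation contributes scores to both submodels.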
When random effects are present, standard errors will be clustered on groups. Instead of observation-level contributions to the gradient, we would use cluster-level contributions. This means that observations in the two models would need to be clustered in a consistent manner; observations that are common to the two estimations would need to be in the same cluster in the two estimations.
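In formula form, the cluster version replaces the observation-level rows by their within-cluster sums. The symbols \(M\) (number of clusters), \(G_k\) (the set of observations in cluster \(k\)), and \(u_k^{(G)}\) are notation introduced here, following the general cluster-robust form; any small-sample multiplier is omitted.

```latex
\[
u_k^{(G)} = \sum_{j \in G_k} u_j,
\qquad
V_{robust} = \hat V \Bigl( \sum_{k=1}^{M} u_k^{(G)\prime} u_k^{(G)} \Bigr) \hat V
\]
```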
