The brand new command gsem permits us to suit all kinds of fashions; among the many many potentialities, we are able to account for endogeneity on totally different fashions. For example, I’ll match an ordinal mannequin with endogenous covariates.
Parameterizations for an ordinal probit mannequin
The ordinal probit mannequin is used to mannequin ordinal dependent variables. Within the regular parameterization, we assume that there’s an underlying linear regression, which relates an unobserved steady variable (y^*) to the covariates (x).
[y^*_{i} = x_{i}gamma + u_i]
The noticed dependent variable (y) pertains to (y^*) by a sequence of cut-points (-infty =kappa_0<kappa_1<dots< kappa_m=+infty) , as follows:
[y_{i} = j {mbox{ if }} kappa_{j-1} < y^*_{i} leq kappa_j]
Offered that the variance of (u_i) can’t be recognized from the noticed knowledge, it’s assumed to be equal to 1. Nevertheless, we are able to think about a re-scaled parameterization for a similar mannequin; an easy method of seeing this, is by noting that, for any optimistic quantity (M):
[kappa_{j-1} < y^*_{i} leq kappa_j iff
Mkappa_{j-1} < M y^*_{i} leq Mkappa_j
]
that’s,
[kappa_{j-1} < x_igamma + u_i leq kappa_j iff
Mkappa_{j-1}< x_i(Mgamma) + Mu_i leq Mkappa_j
]
In different phrases, if the mannequin is recognized, it may be represented by multiplying the unobserved variable (y) by a optimistic quantity, and it will imply that the usual error of the residual part, the coefficients, and the cut-points might be multiplied by this quantity.
Let me present you an instance; I’ll first match a regular ordinal probit mannequin, each with oprobit and with gsem. Then, I’ll use gsem to suit an ordinal probit mannequin the place the residual time period for the underlying linear regression has a regular deviation equal to 2. I’ll do that by introducing a latent variable (L), with variance 1, and coefficient (sqrt 3). This might be added to the underlying latent residual, with variance 1; then, the ‘new’ residual time period may have variance equal to (1+((sqrt 3)^2times Var(L))= 4), so the usual deviation might be 2. We’ll see that in consequence, the coefficients, in addition to the cut-points, might be multiplied by 2.
. sysuse auto, clear
(1978 Car Information)
. oprobit rep mpg disp , nolog
Ordered probit regression Variety of obs = 69
LR chi2(2) = 14.68
Prob > chi2 = 0.0006
Log probability = -86.352646 Pseudo R2 = 0.0783
------------------------------------------------------------------------------
rep78 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
mpg | .0497185 .0355452 1.40 0.162 -.0199487 .1193858
displacement | -.0029884 .0021498 -1.39 0.165 -.007202 .0012252
-------------+----------------------------------------------------------------
/cut1 | -1.570496 1.146391 -3.81738 .6763888
/cut2 | -.7295982 1.122361 -2.929386 1.47019
/cut3 | .6580529 1.107838 -1.513269 2.829375
/cut4 | 1.60884 1.117905 -.5822132 3.799892
------------------------------------------------------------------------------
. gsem (rep <- mpg disp, oprobit), nolog
Generalized structural equation mannequin Variety of obs = 69
Log probability = -86.352646
--------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------------+----------------------------------------------------------------
rep78 <- |
mpg | .0497185 .0355452 1.40 0.162 -.0199487 .1193858
displacement | -.0029884 .0021498 -1.39 0.165 -.007202 .0012252
---------------+----------------------------------------------------------------
rep78 |
/cut1 | -1.570496 1.146391 -1.37 0.171 -3.81738 .6763888
/cut2 | -.7295982 1.122361 -0.65 0.516 -2.929386 1.47019
/cut3 | .6580529 1.107838 0.59 0.553 -1.513269 2.829375
/cut4 | 1.60884 1.117905 1.44 0.150 -.5822132 3.799892
--------------------------------------------------------------------------------
. native a = sqrt(3)
. gsem (rep <- mpg disp L@`a'), oprobit var(L@1) nolog
Generalized structural equation mannequin Variety of obs = 69
Log probability = -86.353008
( 1) [rep78]L = 1.732051
( 2) [var(L)]_cons = 1
--------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------------+----------------------------------------------------------------
rep78 <- |
mpg | .099532 .07113 1.40 0.162 -.0398802 .2389442
displacement | -.0059739 .0043002 -1.39 0.165 -.0144022 .0024544
L | 1.732051 (constrained)
---------------+----------------------------------------------------------------
rep78 |
/cut1 | -3.138491 2.293613 -1.37 0.171 -7.63389 1.356907
/cut2 | -1.456712 2.245565 -0.65 0.517 -5.857938 2.944513
/cut3 | 1.318568 2.21653 0.59 0.552 -3.02575 5.662887
/cut4 | 3.220004 2.236599 1.44 0.150 -1.16365 7.603657
---------------+----------------------------------------------------------------
var(L)| 1 (constrained)
--------------------------------------------------------------------------------
Ordinal probit mannequin with endogenous covariates
This mannequin is outlined analogously to the mannequin fitted by -ivprobit- for probit fashions with endogenous covariates; we assume an underlying mannequin with two equations,
[
begin{eqnarray}
y^*_{1i} =& y_{2i} beta + x_{1i} gamma + u_i &
y_{2i} =& x_{1i} pi_1 + x_{2i} pi_2 + v_i & ,,,,,, (1)
end{eqnarray}
]
the place (u_i sim N(0, 1) ), (v_isim N(0,s^2) ), and (corr(u_i, v_i) = rho).
We don’t observe (y^*_{1i}); as a substitute, we observe a discrete variable (y_{1i}), corresponding to, for a set of cut-points (to be estimated) (kappa_0 = -infty < kappa_1 < kappa_2 dots < kappa_m = +infty ),
[y_{1i} = j {mbox{ if }} kappa_{j-1} < y^*_{1i} leq kappa_j ]
The parameterization we’ll use
I’ll re-scale the primary equation, preserving the correlation. That’s, I’ll think about the next system:
[
begin{eqnarray}
z^*_{1i} =&
y_{2i}b +x_{1i}c + t_i + alpha L_i &
y_{2i} = &x_{1i}pi_1 + x_{2i}pi_2 + w_i + alpha L_i & ,,,,,, (2)
end{eqnarray}
]
the place (t_i, w_i, L_i) are impartial, (t_i sim N(0, 1)) , (w_i sim N(0,sigma^2)), (L_i sim N(0, 1))
[y_{1i} = j {mbox{ if }} lambda_{j-1} < z^*_{1i} leq lambda_j ]
By introducing a latent variable in each equations, I’m modeling a correlation between the error phrases. The fist equation is a re-scaled model of the unique equation, that’s, (z^*_1 = My^*_1),
[ y_{2i}b +x_{1i}c + t_i + alpha_i L_i
= M(y_{2i}beta) +M x_{1i}gamma + M u_i ]
This suggests that
[M u_i = t_i + alpha_i L_i, ]
the place (Var(u_i) = 1) and (Var(t_i + alpha L_i) = 1 + alpha^2), so the size is (M = sqrt{1+alpha^2} ).
The second equation stays the identical, we simply categorical (v_i) as (w_i + alpha L_i). Now, after estimating the system (2), we are able to recuperate the parameters in (1) as follows:
[beta = frac{1}{sqrt{1+ alpha^2}} b]
[gamma = frac{1}{sqrt{1+ alpha^2}} c]
[kappa_j = frac{1}{sqrt{1+ alpha^2}} lambda_j ]
[V(v_i) = V(w_i + alpha L_i) =V(w_i) + alpha^2].
[rho = Cov(t_i + alpha L_i, w_i + alpha L_i) =
frac{alpha^2}{(sqrt{1+alpha^2}sqrt{V(w_i)+alpha^2)}}]
Be aware: This parameterization assumes that the correlation is optimistic; for destructive values of the correlation, (L) must be included within the second equation with a destructive signal (that’s, L@(-a) as a substitute of L@a). When attempting to carry out the estimation with the unsuitable signal, the mannequin most probably gained’t obtain convergence. In any other case, you will notice a coefficient for L that’s just about zero. In Stata 13.1 now we have included options that help you match the mannequin with out this restriction. Nevertheless, this time we’ll use the older parameterization, which can help you visualize the totally different elements extra simply.
Simulating knowledge, and performing the estimation
clear
set seed 1357
set obs 10000
forvalues i = 1(1)5 {
gen x`i' =2* rnormal() + _n/1000
}
mat C = [1,.5 .5, 1]
drawnorm z1 z2, cov(C)
gen y2 = 0
forvalues i = 1(1)5 {
exchange y2 = y2 + x`i'
}
exchange y2 = y2 + z2
gen y1star = y2 + x1 + x2 + z1
gen xb1 = y2 + x1 + x2
gen y1 = 4
exchange y1 = 3 if xb1 + z1 <=.8
exchange y1 = 2 if xb1 + z1 <=.3
exchange y1 = 1 if xb1 + z1 <=-.3
exchange y1 = 0 if xb1 + z1 <=-.8
gsem (y1 <- y2 x1 x2 L@a, oprobit) (y2 <- x1 x2 x3 x4 x5 L@a), var(L@1)
native y1 y1
native y2 y2
native xaux x1 x2 x3 x4 x5
native xmain y2 x1 x2
native s2 sqrt(1+_b[`y1':L]^2)
foreach v in `xmain'{
native trans `trans' (`y1'_`v': _b[`y1':`v']/`s2')
}
foreach v in `xaux' _cons {
native trans `trans' (`y2'_`v': _b[`y2':`v'])
}
qui tab `y1' if e(pattern)
native ncuts = r(r)-1
forvalues i = 1(1) `ncuts'{
native trans `trans' (cut_`i': _b[`y1'_cut`i':_cons]/`s2')
}
native s1 sqrt( _b[var(e.`y2'):_cons] +_b[`y1':L]^2)
native trans `trans' (sig_2: `s1')
native trans `trans' (rho_12: _b[`y1':L]^2/(`s1'*`s2'))
nlcom `trans'
Outcomes
That is the output from gsem:
Generalized structural equation mannequin Variety of obs = 10000
Log probability = -14451.117
( 1) [y1]L - [y2]L = 0
( 2) [var(L)]_cons = 1
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1 <- |
y2 | 1.379511 .0775028 17.80 0.000 1.227608 1.531414
x1 | 1.355687 .0851558 15.92 0.000 1.188785 1.522589
x2 | 1.346323 .0833242 16.16 0.000 1.18301 1.509635
L | .7786594 .0479403 16.24 0.000 .6846982 .8726206
-------------+----------------------------------------------------------------
y2 <- |
x1 | .9901353 .0044941 220.32 0.000 .981327 .9989435
x2 | 1.006836 .0044795 224.76 0.000 .998056 1.015615
x3 | 1.004249 .0044657 224.88 0.000 .9954963 1.013002
x4 | .9976541 .0044783 222.77 0.000 .9888767 1.006431
x5 | .9987587 .0044736 223.26 0.000 .9899907 1.007527
L | .7786594 .0479403 16.24 0.000 .6846982 .8726206
_cons | .0002758 .0192417 0.01 0.989 -.0374372 .0379887
-------------+----------------------------------------------------------------
y1 |
/cut1 | -1.131155 .1157771 -9.77 0.000 -1.358074 -.9042358
/cut2 | -.5330973 .1079414 -4.94 0.000 -.7446585 -.321536
/cut3 | .2722794 .1061315 2.57 0.010 .0642654 .4802933
/cut4 | .89394 .1123013 7.96 0.000 .6738334 1.114047
-------------+----------------------------------------------------------------
var(L)| 1 (constrained)
-------------+----------------------------------------------------------------
var(e.y2)| .3823751 .074215 .2613848 .5593696
------------------------------------------------------------------------------
These are the outcomes we receive after we remodel the values reported by gsem to the unique parameterization:
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
y1_y2 | 1.088455 .0608501 17.89 0.000 .9691909 1.207719
y1_x1 | 1.069657 .0642069 16.66 0.000 .943814 1.195501
y1_x2 | 1.062269 .0619939 17.14 0.000 .940763 1.183774
y2_x1 | .9901353 .0044941 220.32 0.000 .981327 .9989435
y2_x2 | 1.006836 .0044795 224.76 0.000 .998056 1.015615
y2_x3 | 1.004249 .0044657 224.88 0.000 .9954963 1.013002
y2_x4 | .9976541 .0044783 222.77 0.000 .9888767 1.006431
y2_x5 | .9987587 .0044736 223.26 0.000 .9899907 1.007527
y2__cons | .0002758 .0192417 0.01 0.989 -.0374372 .0379887
cut_1 | -.892498 .0895971 -9.96 0.000 -1.068105 -.7168909
cut_2 | -.4206217 .0841852 -5.00 0.000 -.5856218 -.2556217
cut_3 | .2148325 .0843737 2.55 0.011 .0494632 .3802018
cut_4 | .705332 .0905974 7.79 0.000 .5277644 .8828997
sig_2 | .9943267 .007031 141.42 0.000 .9805462 1.008107
rho_12 | .4811176 .0477552 10.07 0.000 .3875191 .574716
------------------------------------------------------------------------------
The estimates are fairly near the values used for the simulation. When you attempt to carry out the estimation with the unsuitable signal for the coefficient for L, you’ll get a quantity that’s just about zero (when you get convergence in any respect). On this case, the evaluator is telling us that the most effective worth it could actually discover, offered the restrictions now we have imposed, is zero. When you see such outcomes, you might wish to strive the other signal. If each give a zero coefficient, it signifies that that is the answer, and there may be not endogeneity in any respect. If one in all them is just not zero, it signifies that the non-zero worth is the answer. As acknowledged earlier than, in Stata 13.1, the mannequin may be fitted with out this restriction.
