Tuesday, February 10, 2026

Testing model specification and using the program version of gmm


This post was written jointly with Joerg Luedicke, Senior Social Scientist and Statistician, StataCorp.

The command gmm is used to estimate the parameters of a model by the generalized method of moments (GMM). GMM can be used to estimate the parameters of models that have more identification conditions than parameters; these are overidentified models. The specification of these models can be evaluated using Hansen's J statistic (Hansen 1982).

We use gmm to estimate the parameters of a Poisson model with an endogenous regressor. More instruments than regressors are available, so the model is overidentified. We then use estat overid to calculate Hansen's J statistic and test the validity of the overidentifying restrictions.

In previous posts (see Estimating parameters by maximum likelihood and method of moments using mlexp and gmm and Understanding the generalized method of moments (GMM): A simple example), the interactive version of gmm was used to estimate simple single-equation models. For more complicated models, it can be easier to use the moment-evaluator program version of gmm. We demonstrate how to use this version of gmm.
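For comparison, the overidentified model that we fit below could also be specified with the interactive version of gmm as a single substitutable expression. The line below is only a sketch, written in terms of the variables y, x, y2, z, and w that we simulate later in this post; {xb:} denotes a linear combination that includes a constant.

gmm (y - exp({xb: x y2 _cons})), instruments(x z w)

The moment-evaluator program approach becomes more convenient than such one-liners as the number of parameters and equations grows, which is why we pursue it here.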

Poisson model with endogenous regressors

In this post, the Poisson regression of \(y_i\) on exogenous \({\bf x}_i\) and endogenous \({\bf y}_{2,i}\) has the form
\begin{equation*}
E(y_i \vert {\bf x}_i,{\bf y}_{2,i},\epsilon_i) = \exp({\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i}) + \epsilon_i
\end{equation*}
where \(\epsilon_i\) is a zero-mean error term. The endogenous regressors \({\bf y}_{2,i}\) may be correlated with \(\epsilon_i\). This is the same formulation used by ivpoisson with additive errors; see [R] ivpoisson for more details. For more information on Poisson models with endogenous regressors, see Mullahy (1997), Cameron and Trivedi (2013), Windmeijer and Santos Silva (1997), and Wooldridge (2010).

Moment conditions are expected values that specify the model parameters in terms of the true moments. GMM finds the parameter values that come closest to solving the sample equivalents of the moment conditions. For this model, we define the moment conditions using the error function
\begin{equation*}
u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2) = y_i - \exp({\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i})
\end{equation*}

Let \({\bf x}_{2,i}\) be additional exogenous variables. These are not correlated with \(\epsilon_i\) but are correlated with \({\bf y}_{2,i}\). Combining them with \({\bf x}_i\), we have the instruments \({\bf z}_i = (\begin{matrix} {\bf x}_{i} & {\bf x}_{2,i}\end{matrix})\). The moment conditions are
\begin{equation*}
E\left\{{\bf z}_i u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)\right\} = {\bf 0}
\end{equation*}
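The sample analogue of these moment conditions replaces the expectation with a sample average,
\begin{equation*}
\frac{1}{N}\sum_{i=1}^{N} {\bf z}_i u_i(\widehat{\boldsymbol \beta}_1,\widehat{\boldsymbol \beta}_2) = {\bf 0}
\end{equation*}
When the number of instruments equals the number of parameters, these sample conditions can be solved exactly. With more instruments than parameters, they generally cannot all be satisfied at once, which motivates the weighted criterion below.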

Suppose there are \(k\) parameters in \({\boldsymbol \beta}_1\) and \({\boldsymbol \beta}_2\) and \(q\) instruments. When \(q>k\), there are more moment conditions than parameters, and the model is overidentified. In this case, GMM finds the parameter estimates that come as close as possible to solving the weighted moment conditions; that is, GMM minimizes
\[
Q({\boldsymbol \beta}_1,{\boldsymbol \beta}_2) = \left\{\frac{1}{N}\sum\nolimits_i {\bf z}_i
u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)\right\}
{\bf W}
\left\{\frac{1}{N}\sum\nolimits_i {\bf z}_i u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)\right\}'
\]
for a \(q\times q\) weight matrix \({\bf W}\).

Overidentification test

When the model is correctly specified,
\begin{equation*}
E\left\{{\bf z}_i u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)\right\} = {\bf 0}
\end{equation*}

In this case, if \({\bf W}\) is an optimal weight matrix, it equals the inverse of the covariance matrix of the moment conditions. Here we have
\[
{\bf W}^{-1} = E\left\{{\bf z}_i' u_{i}({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)
u_{i}({\boldsymbol \beta}_1,{\boldsymbol \beta}_2) {\bf z}_i\right\}
\]

Hansen's test evaluates the null hypothesis that an overidentified model is correctly specified. It uses the test statistic \(J = N\,Q(\hat{\boldsymbol \beta}_1, \hat{\boldsymbol \beta}_2)\). If \({\bf W}\) is an optimal weight matrix, then under the null hypothesis, Hansen's J statistic has a \(\chi^2(q-k)\) distribution.

The two-step and iterated estimators used by gmm provide estimates of the optimal \({\bf W}\). For overidentified models, the estat overid command calculates Hansen's J statistic after these estimators are used.

Moment-evaluator program

We define a program that gmm calls to calculate the moment conditions for Poisson models with endogenous regressors. See Programming an estimation command in Stata: A map to posted entries for more information about programming in Stata. The program calculates the error function \(u_i\), and gmm generates the moment conditions by multiplying it by the instruments \({\bf z}_i\).

To solve the weighted moment conditions, gmm must take the derivative of the moment conditions with respect to the parameters. By the chain rule, these are the derivatives of the error functions multiplied by the instruments. Users may specify these derivatives themselves, or gmm will calculate them numerically. Specifying the derivatives correctly yields gains in speed and numerical stability.

When linear forms of the parameters are estimated, users may specify the derivatives to gmm in terms of the linear form (prediction). gmm then applies the chain rule to obtain the derivatives of the error function \(u_i\) with respect to the parameters. Our error function \(u_i\) is a function of the linear prediction \({\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i}\).
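To make the chain rule concrete, write \(\eta_i = {\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i}\) for the linear prediction (the symbol \(\eta_i\) is introduced here only for notation). The derivative that our program must supply is
\begin{equation*}
\frac{\partial u_i}{\partial \eta_i} = \frac{\partial}{\partial \eta_i}\left\{y_i - \exp(\eta_i)\right\} = -\exp(\eta_i)
\end{equation*}
and gmm obtains the parameter-level derivatives by multiplying this by \(\partial \eta_i/\partial {\boldsymbol \beta}\), which it knows from the linear form.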

The program gmm_ivpois calculates the error function \(u_i\) and the derivative of \(u_i\) with respect to the linear prediction \({\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i}\).


program gmm_ivpois
    version 14.1
    syntax varlist [if], at(name) depvar(varlist) rhs(varlist) ///
           [derivatives(varlist)]
    tempvar m
    quietly gen double `m' = 0 `if'
    local i = 1
    foreach var of varlist `rhs' {
        quietly replace `m' = `m' + `var'*`at'[1,`i'] `if'
        local i = `i' + 1
    }
    quietly replace `m' = `m' + `at'[1,`i'] `if'
    quietly replace `varlist' = `depvar' - exp(`m') `if'
    if "`derivatives'" == "" {
         exit
    }
    replace `derivatives' = -exp(`m')
end

Lines 3–4 of gmm_ivpois contain the syntax statement that parses the arguments passed to the program. All moment-evaluator programs must accept a varlist, the if condition, and the at() option. The varlist holds the variables in which the values of the error functions are stored; gmm_ivpois will calculate the error function and store it in the specified varlist. The at() option receives the name of a matrix that contains the model parameters. The if condition marks the observations over which estimation is performed.

The program also requires the options depvar() and rhs(). The name of the dependent variable is specified in depvar(), and the regressors are specified in rhs().

On line 4, derivatives() is optional. The variable name specified there receives the derivative of the error function with respect to the linear prediction.

The linear prediction of the regressors is stored in the temporary variable m on lines 6–12. On line 13, we store the value of the error function in the specified varlist. Lines 14–16 let the program exit if derivatives() is not specified. Otherwise, on line 17, we store the derivative of the error function with respect to the linear prediction in the variable specified in derivatives().

The data

We simulate data from a Poisson regression with an endogenous covariate, and then we use gmm with the gmm_ivpois program to estimate the parameters of the regression. We then use estat overid to check the specification of the model. We simulate a random sample of 3,000 observations.


. set seed  45

. set obs 3000
number of observations (_N) was 0, now 3,000

. generate x = rnormal()*.8 + .5

. generate z = rchi2(1)

. generate w = rnormal()*.5

. matrix cm = (1, .9 \ .9, 1)

. matrix sd = (.5,.8)

. drawnorm e u, corr(cm) sd(sd)

We generate the exogenous covariates \(x\), \(z\), and \(w\). The variable \(x\) will be a regressor, while \(z\) and \(w\) will be additional instruments. Then we use drawnorm to draw the errors \(e\) and \(u\). The errors are positively correlated.


. generate y2 = exp(.2*x + .1*z + .3*w -1 + u)

. generate y = exp(.5*x + .2*y2+1) + e

We generate the endogenous regressor \(y2\) as a lognormal regression on the instruments. The outcome of interest \(y\) has an exponential mean in \(x\) and \(y2\), with \(e\) entering as an additive error. Because \(e\) is correlated with \(u\), \(y2\) is correlated with \(e\).
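Before handing the program to gmm, we can sanity-check it by calling it directly at the true parameter values. This is only an illustration; the matrix b and the variable resid are names introduced here for that purpose.

* check the evaluator by hand at the true parameter values (illustration only)
matrix b = (.5, .2, 1)
generate double resid = .
gmm_ivpois resid, at(b) depvar(y) rhs(x y2)
summarize resid

At the true parameter values, resid simply recovers the additive error, so its sample mean should be close to zero.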

Estimating the model parameters

Now we use gmm to estimate the parameters of the Poisson regression with endogenous covariates. The name of our moment-evaluator program is listed directly after gmm. The instruments that gmm will use to form the moment conditions are listed in instruments(). We specify the options depvar() and rhs() with the appropriate variables; they are passed on to gmm_ivpois.

The parameters are specified as the linear form y in the parameters() option, and we specify haslfderivatives to inform gmm that gmm_ivpois provides derivatives with respect to this linear form. The option nequations() tells gmm how many error functions to expect.


. gmm gmm_ivpois, depvar(y) rhs(x y2)             ///
>         haslfderivatives instruments(x z w)     ///
>         parameters({y: x y2 _cons}) nequations(1)

Step 1
Iteration 0:   GMM criterion Q(b) =  14.960972
Iteration 1:   GMM criterion Q(b) =  3.3038486
Iteration 2:   GMM criterion Q(b) =  .59045217
Iteration 3:   GMM criterion Q(b) =  .00079862
Iteration 4:   GMM criterion Q(b) =  .00001419
Iteration 5:   GMM criterion Q(b) =  .00001418

Step 2
Iteration 0:   GMM criterion Q(b) =   .0000567
Iteration 1:   GMM criterion Q(b) =  .00005648
Iteration 2:   GMM criterion Q(b) =  .00005648

GMM estimation

Number of parameters =   3
Number of moments    =   4
Initial weight matrix: Unadjusted                 Number of obs   =      3,000
GMM weight matrix:     Robust

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5006366   .0033273   150.46   0.000     .4941151     .507158
          y2 |   .2007893   .0075153    26.72   0.000     .1860597    .2155189
       _cons |   1.000717   .0063414   157.81   0.000      .988288    1.013146
------------------------------------------------------------------------------
Instruments for equation 1: x z w _cons

The coefficient estimates are close to their true values and are statistically significant. However, the model could still be misspecified.
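Because this is the additive-error formulation used by ivpoisson, one way to cross-check the moment-evaluator results is to fit the same model with ivpoisson. The line below is a sketch (output omitted); the estimates should be very close to those above, with exact agreement depending on matching weight-matrix choices.

ivpoisson gmm y x (y2 = z w), additive

See [R] ivpoisson for its options and defaults.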

Overidentification test

We use estat overid to calculate Hansen's J statistic.


. estat overid

  Test of overidentifying restriction:

  Hansen's J chi2(1) = .169449 (p = 0.6806)

The J statistic equals 0.17. In computing Hansen's J, estat overid provides a test of model misspecification. In this case, we have one more instrument than parameters, so the J statistic has a \(\chi^2(1)\) distribution. The probability of obtaining a \(\chi^2(1)\) value greater than 0.17 is given in parentheses. This probability, the p-value of the test, is large, so we fail to reject the null hypothesis that the model is correctly specified.
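estat overid also leaves its results in r(), so the statistic, its degrees of freedom, and its p-value can be retrieved programmatically. The sketch below additionally shows how the iterated GMM estimator could be requested in place of the default two-step estimator by adding the igmm option to the gmm call; the stored-result names and the igmm option are as documented in [R] gmm postestimation and [R] gmm.

estat overid
display "J = " r(J) "  df = " r(J_df) "  p = " r(J_p)

* refit with the iterated GMM estimator, then test again
gmm gmm_ivpois, depvar(y) rhs(x y2)              ///
        haslfderivatives instruments(x z w)      ///
        parameters({y: x y2 _cons}) nequations(1) igmm
estat overid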

Conclusion

We have demonstrated how to estimate the parameters of a Poisson regression with an endogenous regressor using the moment-evaluator program version of gmm. We have also demonstrated how to use estat overid to test for model misspecification after estimating an overidentified model with gmm. See [R] gmm and [R] gmm postestimation for more information.

References

Cameron, A. C., and P. K. Trivedi. 2013. Regression Analysis of Count Data. 2nd ed. New York: Cambridge University Press.

Hansen, L. P. 1982. Large sample properties of generalized method of moments estimators. Econometrica 50: 1029–1054.

Mullahy, J. 1997. Instrumental-variable estimation of count data models: Applications to models of cigarette smoking behavior. Review of Economics and Statistics 79: 586–593.

Windmeijer, F., and J. M. C. Santos Silva. 1997. Endogeneity in count data models: An application to demand for health care. Journal of Applied Econometrics 12: 281–294.

Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.


