Programming an estimation command in Stata: Utilizing optimize() to estimate Poisson parameters

February 14, 2026

5

(
newcommand{xb}{{bf x}}
newcommand{betab}{boldsymbol{beta}})I present methods to use optimize() in Mata to maximise a Poisson log-likelihood operate and to acquire estimators of the variance–covariance of the estimator (VCE) primarily based on impartial and identically distributed (IID) observations or on strong strategies.

That is the eighteenth publish within the collection Programming an estimation command in Stata. I like to recommend that you simply begin initially. See Programming an estimation command in Stata: A map to posted entries for a map to all of the posts on this collection.

Utilizing optimize()

There are various non-compulsory decisions that one could make when fixing a nonlinear optimization drawback, however there are only a few that one should make. The optimize*() capabilities in Mata deal with this drawback by making a set of default decisions for you, requiring that you simply specify just a few issues, and permitting you to alter any of the default decisions.

Once I use optimize() to unravel a nonlinear optimization drawback, I do 4 steps.

I create an optimize() object

: S = optimize_init()

which comprises all of the default decisions
I take advantage of among the optimize_init_*(S) capabilities to place details about my optimization drawback into S.
I take advantage of

: betahat = optimize(S)

to carry out the optimization.
I take advantage of among the optimize_result_*(S) capabilities to get the outcomes, which optimize(S) saved in S.

Think about maximizing the log-likelihood operate of a Poisson mannequin. The contribution of the (i)th remark to the log-likelihood is

[
f_i(betab) = y_i{bf x_i}betab – exp({bf x}_ibetab’) – ln( y_i !)
]

the place (y_i) is the dependent variable, ({bf x}_i) is the vector of covariates, and (betab) is the row vector of parameters that we choose to maximise the log-likelihood operate given by (F(betab) =sum_i f_i(betab)). I may drop (ln(y_i!)), as a result of it doesn’t depend upon the parameters. I embrace it to make the worth of the log-likelihood operate the identical as that reported by Stata. Stata contains these phrases in order that the values of the log-likelihood capabilities are comparable throughout fashions.

The code block 1 copies the info from Stata to Mata and computes the Poisson log-likelihood operate on the vector of parameter values b, which has been set to the arbitrary beginning values of .01 for every parameter.

Code block 1: An evaluator operate for the Poisson log-likelihood


clear all
use accident3
mata:
     y = st_data(., "accidents")
     X = st_data(., "cvalue youngsters visitors")
     X = X,J(rows(X), 1, 1)
     b = J(1, cols(X), .01)
    xb = X*b'
    f  = sum(-exp(xb) + y:*xb - lnfactorial(y))
finish

The Mata operate plleval() in code block 2 places the worth of the Poisson log-likelihood operate on the vector of parameter values b into val.

Code block 2: An evaluator operate for the Poisson log-likelihood


mata:
void plleval(actual scalar todo, actual vector b, val, grad, hess)
{
    actual vector  y, xb
    actual matrix  X

     y = st_data(., "accidents")
     X = st_data(., "cvalue youngsters visitors")
     X = X,J(rows(X), 1, 1)
    xb = X*b'
   val = sum(-exp(xb) + y:*xb - lnfactorial(y))
}
finish

plleval() has the default syntax of an evaluator operate that optimize() can name. Evaluator capabilities have a default syntax in order that optimize() can name them, which it should do to seek out the utmost. After describing the default syntax, I’ll present methods to use evaluators with additional arguments.

plleval() is void, it returns nothing. The actual scalar todo permits optimize() to inform the evaluator operate what it should compute. The actual vector b is the present worth of the parameter vector. val is just not typed as a result of it doesn’t matter what it comprises on enter, it’ll comprise the worth of the target operate on output. grad is just not typed as a result of it’ll optionally comprise the vector of first derivatives of the target operate on the present worth of b on output. hess is just not typed as a result of it’ll optionally comprise the matrix of second derivatives of the target operate on the present worth of b on output. As plleval() illustrates, the target operate should put the worth of the target operate into the third argument, nevertheless it needn’t compute both the vector of first derivatives or the matrix of second derivatives.

In instance 1, I take advantage of optimize() to maximise the Poisson log-likelihood operate computed in plleval().

Instance 1: Utilizing optimize() to estimate Poisson parameters

(Makes use of accident3.dta)


. clear all

. use accident3

. mata:
------------------------------------------------- mata (sort finish to exit) ------
: void plleval(actual scalar todo, actual vector b, val, grad, hess)
> {
>     actual vector  y, xb
>     actual matrix  X
> 
>      y = st_data(., "accidents")
>      X = st_data(., "cvalue youngsters visitors")
>      X = X,J(rows(X), 1, 1)
>     xb = X*b'
>    val = sum(-exp(xb) + y:*xb - lnfactorial(y))
> }
observe: argument todo unused
observe: argument grad unused
observe: argument hess unused

: 
: S  = optimize_init()

: optimize_init_evaluator(S, &plleval())

: optimize_init_params(S, J(1, 4, .01))

: bh = optimize(S)
Iteration 0:   f(p) = -851.18669  
Iteration 1:   f(p) = -556.66874  
Iteration 2:   f(p) = -555.81708  
Iteration 3:   f(p) = -555.81538  
Iteration 4:   f(p) = -555.81538  

: bh
                  1              2              3              4
    +-------------------------------------------------------------+
  1 |  -.6558871399   -1.009017051    .1467114648    .5743542793  |
    +-------------------------------------------------------------+

: optimize_result_params(S)
                  1              2              3              4
    +-------------------------------------------------------------+
  1 |  -.6558871399   -1.009017051    .1467114648    .5743542793  |
    +-------------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_oim(S)))'
                 1             2             3             4
    +---------------------------------------------------------+
  1 |  .0706483931   .0807960852   .0313761961   .2839519366  |
    +---------------------------------------------------------+

: finish
--------------------------------------------------------------------------------

After defining plleval(), I take advantage of optimize_init() to create the optimize() object S. I need to put details about methods to name plleval() and the vector of beginning values into S. Typing

optimize_init_evaluator(S, &plleval())

places the deal with of the evaluator operate plleval() into S. By previous the identify of the evaluator operate plleval() with an ampersand (&), I put the deal with of the evaluator operate into S. optimize() requires that you simply put the operate deal with as a substitute of the operate identify as a result of having the deal with hastens discovering the operate. Typing

optimize_init_params(S, J(1, 4, .01))

places the vector of beginning values, J(1, 4, .01), into S.

Typing

bh = optimize(S)

causes optimize() to unravel the optimization drawback described in S, and it causes optimize() to place the vector of optimum parameters in bh. optimize() produces the default iteration log, as a result of we didn’t change the default specification in S.

When optimize() has accomplished, the outcomes are in S. For instance, I show the bh returned by optimize() and use optimize_result_params(S) to show the consequence saved in S. I additional illustrate by displaying the usual errors; optimize_result_V_oim() retrieves the observed-information-matrix (OIM) estimator of the variance–covariance of the estimator (VCE). Many different outcomes are saved in S; sort assist mf optimize and have a look at the optimize_result*() capabilities for particulars.

Evaluating the ends in examples 1 and a couple of exhibits that they’re right.

Instance 2: Outcomes from poisson


. poisson accidents cvalue youngsters visitors

Iteration 0:   log chance = -555.86605
Iteration 1:   log chance =  -555.8154
Iteration 2:   log chance = -555.81538

Poisson regression                              Variety of obs     =        505
                                                LR chi2(3)        =     340.20
                                                Prob > chi2       =     0.0000
Log chance = -555.81538                     Pseudo R2         =     0.2343

------------------------------------------------------------------------------
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
        youngsters |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506594
     visitors |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
       _cons |    .574354   .2839515     2.02   0.043     .0178193    1.130889
------------------------------------------------------------------------------

plleval() is gradual as a result of it copies the info from Stata to Mata each time optimize() calls it. I might a lot fairly go the info to the evaluator operate, however this requires placing details about the syntax of the brand new evaluator operate in S. For instance, I wish to use the evaluator operate plleval2(). In instance 3, I take advantage of optimize_init_argument() to place data into S in regards to the additional arguments accepted by the brand new evaluator operate plleval2().

Code block 3: Passing information to the Poisson evaluator operate


mata:
void plleval2(actual scalar todo, actual vector b,     ///
              actual vector y,    actual matrix X,     ///
              val, grad, hess)
{
    actual vector  xb

    xb = X*b'
   val = sum(-exp(xb) + y:*xb - lnfactorial(y))
}
finish

Line 3 declares the additional arguments, the actual vector y, and the actual vector X. The additional arguments come between the inputs that should at all times be current, the actual scalar todo and the actual vector b, and the always-present outputs; val, grad, and hess.

Instance 3 makes use of optimize() to maximise the Poisson goal operate coded in plleval2().

Instance 3: Utilizing non-compulsory arguments to go information


. mata:
------------------------------------------------- mata (sort finish to exit) ------
: void plleval2(actual scalar todo, actual vector b,     ///
>               actual vector y,    actual matrix X,     ///
>               val, grad, hess)
> {
>     actual vector  xb
>
>     xb = X*b'
>    val = sum(-exp(xb) + y:*xb - lnfactorial(y))
> }
observe: argument todo unused
observe: argument grad unused
observe: argument hess unused

:
: y = st_data(., "accidents")

: X = st_data(., "cvalue youngsters visitors")

: X = X,J(rows(X), 1, 1)

:
: S  = optimize_init()

: optimize_init_argument(S, 1, y)

: optimize_init_argument(S, 2, X)

: optimize_init_evaluator(S, &plleval2())

: optimize_init_params(S, J(1, 4, .01))

:
: bh = optimize(S)
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66874
Iteration 2:   f(p) = -555.81708
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538

: optimize_result_params(S)
                  1              2              3              4
    +-------------------------------------------------------------+
  1 |  -.6558871399   -1.009017051    .1467114648    .5743542793  |
    +-------------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_oim(S)))'
                 1             2             3             4
    +---------------------------------------------------------+
  1 |  .0706483931   .0807960852   .0313761961   .2839519366  |
    +---------------------------------------------------------+

: finish
--------------------------------------------------------------------------------

After defining plleval2(), I copy the info from Stata to Mata, and I take advantage of optimize_init() to place the default decisions into the optimize() object S. Once I typed

optimize_init_argument(S, 1, y)

I put data into S specifying that optimize() ought to go y as the primary additional argument to the evaluator operate. Once I typed

optimize_init_argument(S, 2, X)

I put data into S specifying that optimize() ought to go X because the second additional argument to the evaluator operate.

Analogous to instance 1, typing

optimize_init_evaluator(S, &plleval2())

places the deal with of plleval2() into S, and typing

optimize_init_params(S, J(1, 4, .01))

places the vector of beginning values, J(1, 4, .01), in S.

The outcomes are the identical as these in instance 1.

Vector of remark–stage contributions and strong VCE estimation

Sturdy estimators for the VCE of an estimator use the construction of observation-level contributions; see Wooldridge (2010, chapters 12 and 13) or Cameron and Trivedi (2005, chapter 5). When the evaluator operate offers optimize() a vector of observation-level contributions, as a substitute of a scalar summation, optimize() can use this construction to compute strong or cluster–strong estimators of the VCE.

Think about plleval3(), which places the vector of observation-level contributions into val.

Code block 4: A vector of observation-level contributions


mata:
void plleval3(actual scalar todo, actual vector b,     ///
              actual vector y,    actual matrix X,     ///
              val, grad, hess)
{
    actual vector  xb

    xb = X*b'
   val = -exp(xb) + y:*xb - lnfactorial(y)
}
finish

To make use of plleval3(), I need to put data within the optimize() object stating that the evaluator operate computes a vector of observation-level contributions. In instance 4, I take advantage of optimize_init_evaluatortype() to place this data into the optimize() object S.

Instance 4: Sturdy VCE estimation


. mata:
------------------------------------------------- mata (sort finish to exit) ------
: void plleval3(actual scalar todo, actual vector b,     ///
>               actual vector y,    actual matrix X,     ///
>               val, grad, hess)
> {
>     actual vector  xb
>
>     xb = X*b'
>    val = -exp(xb) + y:*xb - lnfactorial(y)
> }
observe: argument todo unused
observe: argument grad unused
observe: argument hess unused

:
:
: y = st_data(., "accidents")

: X = st_data(., "cvalue youngsters visitors")

: X = X,J(rows(X), 1, 1)

:
: S  = optimize_init()

: optimize_init_argument(S, 1, y)

: optimize_init_argument(S, 2, X)

: optimize_init_evaluator(S, &plleval3())

: optimize_init_evaluatortype(S, "gf0")

: optimize_init_params(S, J(1, 4, .01))

:
: bh = optimize(S)
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66874
Iteration 2:   f(p) = -555.81731
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538

: optimize_result_params(S)
                  1              2              3              4
    +-------------------------------------------------------------+
  1 |  -.6558871527   -1.009017051    .1467114658    .5743542978  |
    +-------------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_oim(S)))'
                 1             2             3             4
    +---------------------------------------------------------+
  1 |  .0706483832   .0807960809    .031376176   .2839517337  |
    +---------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_robust(S)))'
                 1             2             3             4
    +---------------------------------------------------------+
  1 |  .1096020124    .188666044    .092431746   .6045057623  |
    +---------------------------------------------------------+

: finish
--------------------------------------------------------------------------------

After defining plleval3(), I copy the info, create the optimize() object S, put the specs for the additional arguments y and X in S, and put the deal with of plleval3() into S. Typing

optimize_init_evaluatortype(S, “gf0”)

places in S the knowledge that the evaluator operate returns a vector of observation-level contribution and that it computes zero derivatives, that’s the evaluator operate is sort “gf0”. Given the vector construction, I can sort

optimize_result_V_robust(S)

to compute a strong estimator of the VCE.

sqrt(diagonal(optimize_result_V_robust(S)))’

returns the strong customary errors, that are the identical as these reported by poisson in instance 5.

Instance 5: Sturdy VCE estimation by poisson


. poisson accidents cvalue youngsters visitors, vce(strong)

Iteration 0:   log pseudolikelihood = -555.86605
Iteration 1:   log pseudolikelihood =  -555.8154
Iteration 2:   log pseudolikelihood = -555.81538

Poisson regression                              Variety of obs     =        505
                                                Wald chi2(3)      =      99.76
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -555.81538               Pseudo R2         =     0.2343

------------------------------------------------------------------------------
             |               Sturdy
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .1096019    -5.98   0.000    -.8707029   -.4410712
        youngsters |  -1.009017    .188666    -5.35   0.000    -1.378795   -.6392382
     visitors |   .1467115   .0924316     1.59   0.112    -.0344512    .3278741
       _cons |    .574354   .6045047     0.95   0.342    -.6104535    1.759162
------------------------------------------------------------------------------

Carried out and undone

I confirmed methods to use optimize() to maximise a Poisson log-likelihood operate. I additionally confirmed methods to get hold of a strong estimator of the VCE by coding the evaluator operate to compute a vector of observation-level contributions. In my subsequent publish, I present methods to write a Stata command that makes use of Mata to estimate the parameters of a Poisson regression mannequin.

References

Cameron, A. C., and P. Ok. Trivedi. 2005. Microeconometrics: Strategies and Purposes. Cambridge: Cambridge College Press.

Wooldridge, J. M. 2010. Econometric Evaluation of Cross Part and Panel Information. 2nd ed. Cambridge, Massachusetts: MIT Press.

Programming an estimation command in Stata: Utilizing optimize() to estimate Poisson parameters

Related Articles

ALS stole this musician’s voice. AI let him sing once more.

One menace actor chargeable for 83% of current Ivanti RCE assaults

Greatest Apple Watch (2026): Sequence 11, SE 3, and Extremely 3

Latest Articles

ALS stole this musician’s voice. AI let him sing once more.

One menace actor chargeable for 83% of current Ivanti RCE assaults

Greatest Apple Watch (2026): Sequence 11, SE 3, and Extremely 3

Closing out my tabs: Valentines Day version

Accomplished Hyperparameter Switch throughout Modules, Width, Depth, Batch and Period