Tuesday, March 10, 2026

Programming estimators in Stata: Why you should


Distributing a Stata command that implements a statistical method will get that method used by lots of people. They will thank you. And, they will cite you!

This post is the first in the series #StataProgramming about programming an estimation command in Stata that uses Mata to do the numerical work. In the process of showing you how to program an estimation command in Stata, I will discuss do-file programming, ado-file programming, and Mata programming. When the series ends, you will be able to write Stata commands.

Stata users like its predictable syntax and its estimation-postestimation structure that facilitates hypothesis testing, specification tests, and parameter interpretation. To help you write Stata commands that people want to use, I illustrate how Stata syntax is predictable and give an overview of the estimation-postestimation structure that you will want to emulate in your programs.

Stata structure by example

I use and describe some simulated data about the number of traffic accidents observed on 948 people.

Example 1: Accident data


. use http://www.stata.com/data/accident2.dta

. describe

Contains data from http://www.stata.com/data/accident2.dta
  obs:           948                          
 vars:             6                          23 Sep 2015 13:04
 size:        22,752                          
--------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------
kids            float   %9.0g                 number of children
cvalue          float   %9.0g                 car value index
tickets         float   %9.0g                 number of tickets in last 2 years
traffic         float   %9.0g                 local traffic index, larger=>worse
male            float   %9.0g                 1=>man, 0=>woman
accidents       float   %9.0g                 number of traffic in last 5 years
--------------------------------------------------------------------------------
Sorted by: 

Stata’s predictable syntax

I estimate the parameters of a Poisson regression model for accidents as a function of traffic conditions (traffic), an indicator for being a male driver (male), and the number of tickets received in the last two years (tickets).

Example 2: A Poisson model for accidents


. poisson accidents traffic male tickets , vce(robust)

Iteration 0:   log pseudolikelihood = -377.98594  
Iteration 1:   log pseudolikelihood = -370.68001  
Iteration 2:   log pseudolikelihood = -370.66527  
Iteration 3:   log pseudolikelihood = -370.66527  

Poisson regression                              Number of obs     =        948
                                                Wald chi2(3)      =    1798.65
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -370.66527               Pseudo R2         =     0.8191

------------------------------------------------------------------------------
             |               Robust
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     traffic |   .0764399   .0165119     4.63   0.000     .0440772    .1088027
        male |   3.228004   .1232081    26.20   0.000     2.986521    3.469488
     tickets |   1.366614   .0328218    41.64   0.000     1.302284    1.430943
       _cons |  -7.434478   .2413188   -30.81   0.000    -7.907454   -6.961502
------------------------------------------------------------------------------

I want to focus on the structure in this example so that you can use it to make your commands easier to use. In particular, I want to discuss the structure of the command syntax and to point out that the output is easy to read and interpret because it is a standard Stata output table. For estimators, that table almost always reports estimates (often coefficients), standard errors, tests against zero and their $p$-values, and confidence intervals.

Stata syntax is predictable, which makes it easy to use. Stata users “speak Stata” and do not even notice the details. I highlight some of these details so that we can make the syntax of the commands we write predictable. Here are some of the standard syntax elements illustrated in example 2.

  1. The command has four syntactical elements:
    1. command name (poisson),
    2. list of variable names (accidents traffic male tickets),
    3. a comma,
    4. an option (vce(robust)).
  2. In the list of variable names, the name of the dependent variable is first and it is followed by the names of the independent variables.
  3. The job of the comma is to separate the command name and variable list from the option or options.
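
As a rough sketch of how a user-written command can accept these same elements, the toy program below uses Stata's syntax command to parse a variable list, optional if and in qualifiers, and a vce() option. The program name myparse and the names it returns are hypothetical, and the auto dataset shipped with Stata stands in for real data; an actual estimation command would do much more with the parsed pieces.

```stata
capture program drop myparse
program define myparse, rclass
    version 14
    // variable list before the comma, one string-valued option after it
    syntax varlist(numeric min=2) [if] [in] [, vce(string)]
    // the first variable is the dependent variable; the rest are regressors
    gettoken depvar indepvars : varlist
    display "dependent variable:    `depvar'"
    display "independent variables: `indepvars'"
    display "vce option:            `vce'"
    // hand the parsed pieces back to the caller in r()
    return local depvar    "`depvar'"
    return local indepvars "`indepvars'"
    return local vce       "`vce'"
end

sysuse auto, clear
myparse price mpg weight, vce(robust)
```

Typing the command without the comma, or with an option that syntax was not told about, produces a standard Stata error message, which is part of what makes the syntax feel predictable to users.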

The output is also structured; it is composed of an iteration log, a header, and a standard output table.

Estimation-postestimation framework

As a Stata user, I could now use the estimation-postestimation framework. For example, I could perform a Wald test of the hypothesis that the coefficient on male is 3.

Example 3: A Wald test of a linear restriction


. test male = 3

 ( 1)  [accidents]male = 3

           chi2(  1) =    3.42
         Prob > chi2 =    0.0642

or I could perform a Wald test of the nonlinear hypothesis that the ratio of the coefficient on male to the coefficient on tickets is 2.

Example 4: A Wald test of a nonlinear restriction


. testnl _b[male]/_b[tickets] = 2

  (1)  _b[male]/_b[tickets] = 2

               chi2(1) =       19.65
           Prob > chi2 =        0.0000

I could also predict the mean of accidents for each observation and summarize the results.

Example 5: Summarizing the predicted conditional means


. predict nhat
(option n assumed; predicted number of events)

. summarize nhat

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        nhat |        948    .8512658    2.971087   .0006086    29.0763
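
The statistic that predict reports here is the fitted conditional mean of the Poisson model, exp(x'b). As an illustrative check, assuming the accident data and the model from example 2, the predicted counts can be rebuilt from the linear prediction. The variable names xb, nhat_byhand, and nhat_check are made up for this sketch.

```stata
use http://www.stata.com/data/accident2.dta, clear
quietly poisson accidents traffic male tickets, vce(robust)

predict double xb, xb                     // linear prediction x'b
generate double nhat_byhand = exp(xb)     // Poisson conditional mean built by hand
predict double nhat_check, n              // predicted number of events from predict

* the two versions should agree up to rounding
assert reldif(nhat_check, nhat_byhand) < 1e-6
summarize nhat_byhand nhat_check
```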

Finally, I could use margins to estimate conditional or population-averaged parameters that are functions of the parameters in the original model. I use margins to estimate the average number of accidents that would be observed if each individual received 0 tickets, or 1 ticket, or 2 tickets, …, or 7 tickets. See [R] margins, Long and Freese (2006, sec. 4.4.2-4.4.3), and Cameron and Trivedi (2010, sec. 10.5.6-10.6.9) for introductions to estimating functions of the model parameters by margins.

Example 6: Estimating functions of model parameters


. margins, at(tickets=(0 1 2 3 4 5 6 7))

Predictive margins                              Number of obs     =        948
Model VCE    : Robust

Expression   : Predicted number of events, predict()

1._at        : tickets         =           0

2._at        : tickets         =           1

3._at        : tickets         =           2

4._at        : tickets         =           3

5._at        : tickets         =           4

6._at        : tickets         =           5

7._at        : tickets         =           6

8._at        : tickets         =           7

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
          1  |   .0097252   .0015387     6.32   0.000     .0067094     .012741
          2  |   .0381426   .0048762     7.82   0.000     .0285854    .0476998
          3  |   .1495971   .0148157    10.10   0.000      .120559    .1786353
          4  |   .5867272   .0432256    13.57   0.000     .5020066    .6714478
          5  |   2.301172   .1302033    17.67   0.000     2.045978    2.556366
          6  |   9.025308   .5049176    17.87   0.000     8.035688    10.01493
          7  |   35.39769   2.555679    13.85   0.000     30.38865    40.40673
          8  |   138.8315   13.49606    10.29   0.000     112.3797    165.2832
------------------------------------------------------------------------------

The glue

The estimation results stored in e() are the glue that holds together the estimation-postestimation framework. The poisson command stores lots of stuff in e(). I could use ereturn list to list all this stuff, but there are many stored objects that do not interest you yet.

Most of the estimation-postestimation features that I discussed were implemented using e(b), e(V), and e(predict), which are the vector of point estimates, the estimated VCE, and the name of the command that implements predict after poisson.

I will show how to store what you need in e() in the #StataProgramming series.
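
To make the role of e(b) and e(V) concrete, here is a small sketch that rebuilds the Wald test of example 3 by hand from those two matrices. It assumes the accident data and the model from example 2; the matrix and scalar names are arbitrary, and the position of male (second column) is read off the order of the regressors in the output table.

```stata
use http://www.stata.com/data/accident2.dta, clear
quietly poisson accidents traffic male tickets, vce(robust)

matrix b = e(b)          // 1 x 4 row vector: traffic, male, tickets, _cons
matrix V = e(V)          // 4 x 4 estimated VCE (robust in this example)

* Wald statistic for H0: coefficient on male = 3 (male is column 2)
scalar w = (b[1,2] - 3)^2 / V[2,2]
display "chi2(1)     = " %8.2f w
display "Prob > chi2 = " %8.4f chi2tail(1, w)

* name of the program that predict calls after poisson
display "`e(predict)'"
```

If the sketch is right, the displayed statistic and p-value should agree with the 3.42 and 0.0642 that test reports in example 3.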

Structure of Stata commands

Here is an outline of the tasks performed by a Stata estimation command.

  1. Parse the enter to the command.
  2. Compute outcomes.
  3. Store results in e().
  4. Display output.
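
The four steps can be sketched with a deliberately tiny estimator, the sample mean of one variable. This is only an illustration under simplifying assumptions: the command name meansketch is hypothetical, the numerical work is done with summarize rather than Mata, and the auto dataset shipped with Stata is used as stand-in data.

```stata
capture program drop meansketch
program define meansketch, eclass
    version 14
    // 1. Parse the input to the command
    syntax varname(numeric) [if] [in]
    marksample touse

    // 2. Compute results: the sample mean and the variance of the mean
    quietly summarize `varlist' if `touse'
    local N = r(N)
    tempname b V
    matrix `b' = J(1, 1, r(mean))
    matrix `V' = J(1, 1, r(Var)/r(N))
    matrix colnames `b' = `varlist'
    matrix colnames `V' = `varlist'
    matrix rownames `V' = `varlist'

    // 3. Store results in e()
    ereturn post `b' `V', obs(`N') esample(`touse')
    ereturn local cmd "meansketch"

    // 4. Display output as a standard coefficient table
    ereturn display
end

sysuse auto, clear
meansketch mpg
```

Because the results are posted to e(), postestimation commands such as test can be used afterward, for example test mpg = 20.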

You need to write a predict command to complete the estimation-postestimation framework. After you have stored the estimation results and written the predict command, margins works.

I will explain each of these steps in the #StataProgramming series of posts.

Use this structure to your advantage. To make your command easy to use, design it to have the predictable syntax implemented in other commands and make it work in the estimation-postestimation framework. This task is far easier than it sounds. In fact, it is just plain easy. The Stata language steers you in this direction.

Done and undone

I will teach you how to program an estimation command in Stata in the #StataProgramming series. I will also show you how to do the numerical work in Mata. I discussed the following points in this first post.

  1. The predictable structure of Stata syntax makes Stata easy to use. You should emulate this structure so that your commands are easy to use.
  2. The estimation-postestimation framework makes inference and advanced estimation simple. It is easy for you to make your command work with this framework.
  3. The estimation results stored in e(), and the predict command, are the glue that holds the estimation-postestimation framework together.

In the next post, I discuss do-file programming tools that I will subsequently use to parse the input to the command.

References

Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Revised ed. College Station, Texas: Stata Press.

Long, J. S., and J. Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd ed. College Station, Texas: Stata Press.


