Monday, March 9, 2026

Programming an estimation command in Stata: Where to store your stuff


If you tell me “I program in Stata”, it makes me happy, but I do not know what you mean. Do you write scripts to make your research reproducible, or do you write Stata commands that anyone can use and reuse? In the series #StataProgramming, I will show you how to write your own commands, but I start at the beginning. Discussing the difference between scripts and commands here introduces some essential programming concepts and constructions that I use to write scripts and commands.

This is the second post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

Scripts versus commands

A script is a program that always performs the same tasks on the same inputs and produces exactly the same results. Scripts in Stata are known as do-files, and the files containing them end in .do. For example, I could write a do-file to

  1. read in the National Longitudinal Study of Youth (NLSY) dataset,
  2. clean the data,
  3. form a sample for some population, and
  4. run a bunch of regressions on the sample.

This structure is at the heart of reproducible research: produce the same results from the same inputs every time. Do-files have a one-off structure. For example, I could not somehow tell this do-file that I want it to perform the analogous tasks on the Panel Study of Income Dynamics (PSID). Commands are reusable programs that take arguments to perform a task on any data of a certain type. For example, regress performs ordinary least squares on the specified variables regardless of whether they come from the NLSY, the PSID, or any other dataset. Stata commands are written in the automatic do-file (ado) language; the files containing them end in .ado. Stata commands written in the ado language are known as ado-commands.
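To make the contrast concrete, here is a minimal sketch of a reusable command. The name mysummary is hypothetical, and the example assumes the auto dataset that ships with Stata instead of the NLSY or the PSID. A real command would normally live in its own file, mysummary.ado; defining it inside a do-file is only a convenient way to try it out.

```stata
// sketch only: a tiny command that works on whatever variables it is given
version 14
capture program drop mysummary      // hypothetical name, for illustration
program define mysummary
    version 14
    syntax varlist(numeric)         // parse the variable names the user typed
    summarize `varlist'
end

sysuse auto, clear                  // assumption: any dataset would do
mysummary price mpg
```

Unlike a do-file, mysummary does not care which dataset is in memory; it only requires that the variables named by the user exist and are numeric.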

An example do-file

The commands in code block 1 are contained in the file doex.do in the current working directory of my computer.

Code block 1: doex.do


// version 1.0.0  04Oct2015 (This line is a comment)
version 14                     // version #.# fixes the version of Stata
use http://www.stata.com/data/accident2.dta
summarize accidents tickets

We execute the commands by typing do doex, which produces

Example 1: Output from do doex

. do doex

. // version 1.0.0  04Oct2015 (This line is a comment)
. version 14                     // version #.# fixes the version of Stata

. use http://www.stata.com/data/accident2.dta

. summarize accidents tickets

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
   accidents |        948    .8512658    2.851856          0         20
     tickets |        948    1.436709    1.849456          0          7

. 
. 
end of do-file

  1. Line 1 in doex.do is a comment that helps to document the code but is not executed by Stata. The // initiates a comment. Anything following the // on that line is ignored by Stata.
  2. In the comment on line 1, I put a version number and the date that I last changed this file. The date and the version help me keep track of the changes that I make as I work on the project. This information also helps me answer questions from others with whom I have shared a version of this file.
  3. Line 2 specifies the definition of the Stata language that I use. Stata changes over time. Setting the version ensures that the do-file continues to run and that the results do not change as the Stata language evolves.
  4. Line 3 reads in the accident2.dta dataset.
  5. Line 4 summarizes the variables accidents and tickets.

Storing stuff in Stata

Programming in Stata is like putting stuff into boxes, making Stata change the stuff in the boxes, and getting the changed stuff out of the boxes. For example, code block 2 contains the code for doex2.do, whose output I display in example 2.

Code block 2: doex2.do


// version 1.0.0  04Oct2015 (This line is a comment)
version 14                     // version #.# fixes the version of Stata
use http://www.stata.com/data/accident2.dta
generate ln_traffic = ln(traffic)
summarize ln_traffic

Example 2: Output from do doex2


. do doex2

. // version 1.0.0  04Oct2015 (This line is a comment)
. version 14                     // version #.# fixes the version of Stata

. use http://www.stata.com/data/accident2.dta

. generate ln_traffic = ln(traffic)

. summarize ln_traffic

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
  ln_traffic |        948    1.346907    1.004952  -5.261297   2.302408

. 
. 
end of do-file

In line 4 of code block 2, I generate the new variable ln_traffic, which I summarize on line 5. doex2.do uses generate to change what is in the box ln_traffic and uses summarize to get a function of the changed stuff out of the box. Stata variables are the most frequently used box type in Stata, but when you are programming, you will also rely on Stata matrices.

There can only be one variable named traffic in a Stata dataset, and its contents can be viewed or changed interactively, by a do-file, or by an ado-file command. Similarly, there can only be one Stata matrix named beta in a Stata session, and its contents can be viewed or changed interactively, by a do-file, or by an ado-file command. Stata variables and Stata matrices are global boxes because there can only be one Stata variable or Stata matrix of a given name in a Stata session and its contents can be viewed or changed anywhere in a Stata session.
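As a small illustration of a matrix as a box, the sketch below puts values into a matrix named beta, changes one element, and gets it back out. The numbers are made up for the example.

```stata
// sketch only: a Stata matrix used as a box
version 14
matrix beta = (1, 2, 3)      // put stuff into the box
matrix beta[1,2] = 20        // change the stuff in the box
matrix list beta             // look at what is in the box
display beta[1,2]            // get one element back out
```

Because beta is a global box, any do-file or command run later in the same session could view or overwrite it.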

The opposite of global is local. If a box is local in Stata, its contents can only be accessed or changed in the interactive session, in a particular do-file, or in a particular ado-file.

Although I am discussing do-files at the moment, remember that we are learning techniques to write commands. It is essential to understand the differences between global boxes and local boxes to program commands in Stata. Global boxes, like variables, could contain data that the users of your command do not want changed. For example, a command you write should never change a user’s variable in a way that was not requested.

Levels of Stata

The notion that there are levels of Stata can help explain the difference between global boxes and local boxes. Suppose that I run 2 do-files or ado-files. Think of the interactive Stata session as level 0 of Stata, and think of each do-file or ado-file as being Stata levels 1 and 2. Global boxes like variables and matrices live in global memory that can be accessed or changed from a Stata command executed in level 0, 1, or 2. Local boxes can only be accessed or changed by a Stata command within a particular level of Stata. (This description is not exactly how Stata works, but the details about how Stata really handles levels are not important here.)

Figure 1 depicts this structure.

Figure 1: Memory by Stata level

Figure 1 clarifies

  • that commands executed at all Stata levels can access and change the objects in global memory,
  • that only commands executed at Stata level 0 can access and change the objects local to Stata level 0,
  • that only commands executed at Stata level 1 can access and change the objects local to Stata level 1, and
  • that only commands executed at Stata level 2 can access and change the objects local to Stata level 2.

Global and local macros: Storing and extracting

Macros are Stata boxes that hold information as characters, also known as strings. Stata has both global macros and local macros. Global macros are global and local macros are local. Global macros can be accessed and changed by a command executed at any Stata level. Local macros can be accessed and changed only by a command executed at a specific Stata level.

The easiest way to begin to understand global macros is to put something into a global macro and then to get it back out. Code block 3 contains the code for global1.do, which stores and then retrieves information from a global macro.

Code block 3: global1.do


// version 1.0.0  04Oct2015
version 14
global vlist "y x1 x2"
display "vlist contains $vlist"

Example 3: Output from do global1


. do global1

. // version 1.0.0  04Oct2015
. version 14

. global vlist "y x1 x2"

. display "vlist contains $vlist"
vlist contains y x1 x2

. 
end of do-file

Line 3 of code block 3 puts the string y x1 x2 into the global macro named vlist. To extract what I put into a global macro, I prefix the name of the global macro with a $. Line 4 of the code block and its output in example 3 illustrate this usage by extracting and displaying the contents of vlist.

Code block 4 contains the code for local1.do, and its output is given in example 4. They illustrate how to put something into a local macro and how to extract something from it.

Code block 4: local1.do


// version 1.0.0  04Oct2015
version 14
local vlist "y x1 x2"
display "vlist contains `vlist'"

Example 4: Output from do local1


. do local1

. // version 1.0.0  04Oct2015
. version 14

. local vlist "y x1 x2"

. display "vlist contains `vlist'"
vlist contains y x1 x2

. 
end of do-file

Line 3 of code block 4 puts the string y x1 x2 into the local macro named vlist. To extract what I put into a local macro, I enclose the name of the local macro between a single left quote (`) and a single right quote ('). Line 4 of code block 4 displays what is contained in the local macro vlist, and its output in example 4 illustrates this usage.
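A short experiment shows how the two kinds of macros behave across Stata levels. This is only a sketch: the program name showboxes and the macro names gbox and lbox are hypothetical. The program body runs one level below the do-file that calls it, so it can see the global macro but not the caller's local macro.

```stata
// sketch only: global macros cross levels, local macros do not
version 14
global gbox "set in the calling do-file"
local  lbox "set in the calling do-file"

capture program drop showboxes           // hypothetical name
program define showboxes, rclass
    // this body runs one level below the caller
    display "global seen inside: $gbox"
    display "local seen inside:  [`lbox']"   // lbox is empty at this level
    return local g "$gbox"
    return local l "`lbox'"
end

showboxes
display "local seen by caller: `lbox'"
macro drop gbox                          // tidy up the global box
```

Inside showboxes, the reference to `lbox' expands to nothing, while $gbox expands to the string stored by the caller. This is why commands normally keep their working values in local macros: nothing they store there can collide with what the user has stored.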

Getting stuff from Stata commands

Now that we have boxes, I will show you how to store stuff computed by Stata in these boxes. Analysis commands, like summarize, store their results in r(). Estimation commands, like regress, store their results in e(). Somewhat tautologically, commands that store their results in r() are also known as r-class commands, and commands that store their results in e() are also known as e-class commands.

I can use return list to see results stored by an r-class command. Below, I list out what summarize has stored in r() and compute the mean from the stored results.

Example 5: Getting results from an r-class command


. use http://www.stata.com/data/accident2.dta, clear

. summarize accidents

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
   accidents |        948    .8512658    2.851856          0         20

. return list

scalars:
                  r(N) =  948
              r(sum_w) =  948
               r(mean) =  .8512658227848101
                r(Var) =  8.133081817331211
                 r(sd) =  2.851855854935732
                r(min) =  0
                r(max) =  20
                r(sum) =  807

. local sum = r(sum)

. local N   = r(N)

. display "The mean is " `sum'/`N'
The mean is .85126582
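The same pattern works for any value in r(). The sketch below assumes the auto dataset that ships with Stata as a stand-in for accident2.dta, and it computes a coefficient of variation, which is my choice of illustration. The results are copied out of r() right away because the next r-class command replaces them.

```stata
// sketch only: copy r() results into boxes before they are replaced
version 14
sysuse auto, clear            // assumption: stand-in for accident2.dta
summarize price
local mean = r(mean)          // local macros hold the values as strings
local sd   = r(sd)
scalar cv  = r(sd)/r(mean)    // a scalar keeps the value as a number
display "coefficient of variation: " cv
display "from the local macros:    " `sd'/`mean'
```

The two displayed values agree to the printed precision; the scalar avoids the round trip through a string that the local macros make.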

Estimation commands are more formal than analysis commands, so they save more stuff.

Official Stata estimation commands save lots of stuff because they follow lots of rules that make postestimation easy for users. Do not be alarmed by the number of things stored by poisson. Below, I list out the results stored by poisson and create a Stata matrix that contains the coefficient estimates.

Example 6: Getting results from an e-class command


. poisson accidents traffic tickets male

Iteration 0:   log likelihood = -377.98594  
Iteration 1:   log likelihood = -370.68001  
Iteration 2:   log likelihood = -370.66527  
Iteration 3:   log likelihood = -370.66527  

Poisson regression                              Number of obs     =        948
                                                LR chi2(3)        =    3357.64
                                                Prob > chi2       =     0.0000
Log likelihood = -370.66527                     Pseudo R2         =     0.8191

------------------------------------------------------------------------------
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     traffic |   .0764399   .0129856     5.89   0.000     .0509887    .1018912
     tickets |   1.366614   .0380641    35.90   0.000      1.29201    1.441218
        male |   3.228004   .1145458    28.18   0.000     3.003499     3.45251
       _cons |  -7.434478   .2590086   -28.70   0.000    -7.942126    -6.92683
------------------------------------------------------------------------------

. ereturn list

scalars:
               e(rank) =  4
                  e(N) =  948
                 e(ic) =  3
                  e(k) =  4
               e(k_eq) =  1
               e(k_dv) =  1
          e(converged) =  1
                 e(rc) =  0
                 e(ll) =  -370.6652697757637
         e(k_eq_model) =  1
               e(ll_0) =  -2049.485325326086
               e(df_m) =  3
               e(chi2) =  3357.640111100644
                  e(p) =  0
               e(r2_p) =  .8191422669899876

macros:
            e(cmdline) : "poisson accidents traffic tickets male"
                e(cmd) : "poisson"
            e(predict) : "poisso_p"
          e(estat_cmd) : "poisson_estat"
           e(chi2type) : "LR"
                e(opt) : "moptimize"
                e(vce) : "oim"
              e(title) : "Poisson regression"
               e(user) : "poiss_lf"
          e(ml_method) : "e2"
          e(technique) : "nr"
              e(which) : "max"
             e(depvar) : "accidents"
         e(properties) : "b V"

matrices:
                  e(b) :  1 x 4
                  e(V) :  4 x 4
               e(ilog) :  1 x 20
           e(gradient) :  1 x 4

functions:
             e(sample)

. matrix b = e(b)

. matrix list b

b[1,4]
     accidents:  accidents:  accidents:  accidents:
        traffic     tickets        male       _cons
y1   .07643992    1.366614   3.2280044   -7.434478
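Once the results are in Stata matrices, individual elements can be pulled out of them. The sketch below is an illustration under stated assumptions: it uses regress, another e-class command, on the auto dataset that ships with Stata instead of poisson on accident2.dta, and it checks the values taken from e(b) and e(V) against the _b[] and _se[] shortcuts.

```stata
// sketch only: pull single numbers out of e(b) and e(V)
version 14
sysuse auto, clear                 // assumption: stand-in for accident2.dta
regress price mpg weight
matrix b = e(b)                    // 1 x 3 row vector of coefficients
matrix V = e(V)                    // 3 x 3 variance-covariance matrix
local j = colnumb(b, "mpg")        // column that holds the mpg coefficient
display "coef on mpg: " b[1,`j'] "   std. err.: " sqrt(V[`j',`j'])
display "check:       " _b[mpg] "   std. err.: " _se[mpg]
display "observations used: " e(N)
```

Looking up the column by name with colnumb() instead of hard-coding its position keeps the code working if the order of the regressors changes.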

Done and Undone

In this second post in the series #StataProgramming, I discussed the difference between scripts and commands, I provided an introduction to the concepts of global and local memory objects, I discussed global macros and local macros, and I showed how to access results stored by other commands.

In the next post in the series #StataProgramming, I discuss an example that further illustrates the differences between global macros and local macros.


