Thursday, March 5, 2026

Programming an estimation command in Stata: A first ado-command


I discuss the code for a simple estimation command to focus on the details of how to implement an estimation command. The command that I discuss estimates the mean by the sample average. I begin by reviewing the formulas and a do-file that implements them. I subsequently introduce ado-file programming and discuss several versions of the command. Along the way, I illustrate some of the postestimation features that work after the command.

This is the fourth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

The formulas for our estimator

The formulas for the sample average and its estimated sampling variance, assuming an independently and identically distributed process, are

\[
\widehat{\mu} = \frac{1}{N} \sum_{i=1}^N y_i
\]

\[
\widehat{Var}(\widehat{\mu}) = \frac{1}{N(N-1)} \sum_{i=1}^N (y_i-\widehat{\mu})^2
\]

The code mean1.do performs these computations on price from the auto dataset.

Code block 1: mean1.do


// version 1.0.0 20Oct2015
version 14
sysuse auto
quietly summarize price
local sum          = r(sum)
local N            = r(N)
local mu           = (1/`N')*`sum'
generate double e2 = (price - `mu')^2
quietly summarize e2
local V            = (1/((`N')*(`N'-1)))*r(sum)
display "muhat = " `mu'
display "   V  = " `V'

mean1.do uses summarize to compute the summations. Lines 5, 6, and 10 use results stored by summarize in r() to define local macros that are subsequently used to compute the formulas. I recommend that you use double, instead of the default float, to compute all variables used in your formulas because it is almost always worth taking up the extra memory to gain the extra precision offered by double over float. (Essentially, each variable takes up twice as much space, but you get calculations that are correct to about \(10^{-16}\) instead of \(10^{-8}\).)
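The difference between the two storage types can be seen in a minimal sketch that is not part of mean1.do. The variable names xf and xd are made up for this illustration, and the snippet begins with clear, so run it only when you have no unsaved data in memory.

```stata
clear
set obs 1
generate float  xf = 0.1    // stored with about 7 significant digits
generate double xd = 0.1    // stored with about 16 significant digits
display %21.18f xf[1]
display %21.18f xd[1]
display xf[1] == 0.1        // 0: the float copy differs from the double-precision 0.1
display xd[1] == 0.1        // 1: the double copy matches
```

Stata evaluates expressions in double precision, so the rounding introduced by storing a value as float shows up as soon as the variable is used in a computation.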

These calculations yield

Example 1: Computing the average and its sampling variance


. do mean1

. // version 1.0.0 20Oct2015
. version 14

. sysuse auto
(1978 Automobile Data)

. quietly summarize price

. local sum          = r(sum)

. local N            = r(N)

. local mu           = (1/`N')*`sum'

. generate double e2 = (price - `mu')^2

. quietly summarize e2

. local V            = (1/((`N')*(`N'-1)))*r(sum)

. display "muhat = " `mu'
muhat = 6165.2568

. display "   V  = " `V'
   V  = 117561.16

. 
end of do-file
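As an extra cross-check that is not part of mean1.do, note that the variance formula above equals the sample variance divided by N. A minimal sketch, assuming only that the auto dataset can be loaded, uses the r(mean), r(Var), and r(N) results that summarize stores; it should reproduce the two numbers displayed above.

```stata
sysuse auto, clear
quietly summarize price
display "muhat = " r(mean)          // sample average
display "   V  = " r(Var)/r(N)      // sample variance divided by N
```

The do-file computes the sums explicitly because the point of the exercise is to show how the formulas map into code, not because summarize lacks these results.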

Now I verify that mean produces the same results.

Example 2: Results from mean


. mean price

Mean estimation                   Number of obs   =         74

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       price |   6165.257   342.8719      5481.914      6848.6
--------------------------------------------------------------

. matrix list e(b)

symmetric e(b)[1,1]
        price
y1  6165.2568

. matrix list e(V)

symmetric e(V)[1,1]
           price
price  117561.16

A first ado-file

The code in mymean1.ado performs the same calculations as mean1.do. (The file mymean1.ado is in my current working directory.)

Code block 2: mymean1.ado


*! version 1.0.0 20Oct2015
program define mymean1
	version 14

	quietly summarize price
	local sum          = r(sum)
	local N            = r(N)
	local mu           = (1/`N')*`sum'
	capture drop e2				// Drop e2 if it exists
	generate double e2 = (price - `mu')^2
	quietly summarize e2
	local V            = (1/((`N')*(`N'-1)))*r(sum)
	display "muhat = " `mu'
	display "   V  = " `V'
end

Line 2 of mymean1.ado specifies that the file defines the command mymean1. The command name must be the same as the part of the file name that precedes the suffix .ado. The mymean1 command performs the same computations as the do-file mean1.do.

Example 3: Results from mymean1


. mymean1
muhat = 6165.2568
   V  = 117561.16

A slightly better command

We want our command to be reusable; we want it to estimate the mean for any variable in memory, instead of only for price as performed by mymean1.ado. On line 5 of mymean2.ado, we use the syntax command to store the name of the variable specified by the user into the local macro varlist, which we use in the remainder of the computations.

Code block 3: mymean2.ado


*! version 2.0.0 20Oct2015
program define mymean2
	version 14

	syntax varlist
	display "The local macro varlist contains `varlist'"

	quietly summarize `varlist'
	local sum          = r(sum)
	local N            = r(N)
	local mu           = (1/`N')*`sum'
	capture drop e2				// Drop e2 if it exists
	generate double e2 = (`varlist' - `mu')^2
	quietly summarize e2
	local V            = (1/((`N')*(`N'-1)))*r(sum)
	display  "The average of `varlist' is " `mu'
	display  "The estimated variance of the average is " `V'
end

The extremely powerful syntax command puts the elements of Stata syntax specified by the user into local macros and throws errors when the user makes a mistake. I will discuss syntax in greater detail in subsequent posts.
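As a small preview of that error checking, here is a minimal sketch of a hypothetical variant, which I call mymean2b only for this illustration. It assumes we want to accept exactly one numeric variable, which we request by adding the max=1 and numeric specifiers to varlist.

```stata
capture program drop mymean2b       // hypothetical name, used only in this sketch
program define mymean2b
	version 14

	// accept at most one variable, and only a numeric one
	syntax varlist(max=1 numeric)

	quietly summarize `varlist'
	display "The average of `varlist' is " r(mean)
end

sysuse auto, clear
mymean2b price                      // runs as before
capture noisily mymean2b price mpg  // syntax stops with an error: too many variables
capture noisily mymean2b make       // syntax stops with an error: make is a string variable
```

The capture noisily prefix lets the snippet continue past the two deliberate mistakes while still showing the error messages that syntax produces.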

I begin by illustrating how to replicate the previous results.

Example 4: Results from mymean2 price


. mymean2 price
The local macro varlist contains price
The average of price is 6165.2568
The estimated variance of the average is 117561.16

I now illustrate that it works for another variable.

Example 5: Results from mymean2 trunk


. mymean2 trunk
The local macro varlist contains trunk
The average of trunk is 13.756757
The estimated variance of the average is .24724576

. mean trunk

Mean estimation                   Number of obs   =         74

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       trunk |   13.75676   .4972381      12.76576    14.74775
--------------------------------------------------------------

. display "The variance of the estimator is " (_se[trunk])^2
The variance of the estimator is .24724576

Storing results in e()

mymean2.ado does not save the results that it displays. We fix this problem in mymean3.ado. Line 2 specifies the option eclass on program define to make mymean3 an e-class command. Line 18 uses ereturn post to move the matrix of point estimates (b) and the estimated variance-covariance of the estimator (VCE) into e(b) and e(V). The estimation-postestimation framework uses parameter names for display, hypothesis tests, and other features. In lines 15 and 16, we put those names into the column stripes of the vector of estimates and the estimated VCE. In line 17, we put those names into the row stripe of the estimated VCE.

Code block 4: mymean3.ado


*! version 3.0.0 20Oct2015
program define mymean3, eclass
	version 14

	syntax varlist

	quietly summarize `varlist'
	local sum          = r(sum)
	local N            = r(N)
	matrix b           = (1/`N')*`sum'
	capture drop e2				// Drop e2 if it exists
	generate double e2 = (`varlist' - b[1,1])^2
	quietly summarize e2
	matrix V           = (1/((`N')*(`N'-1)))*r(sum)
	matrix colnames b  = `varlist'
	matrix colnames V  = `varlist'
	matrix rownames V  = `varlist'
	ereturn post b V
	ereturn display
end

The ereturn display command on line 19 of mymean3.ado easily creates a standard output table using the results now stored in e(b) and e(V).

Example 6: Results from mymean3 trunk


. mymean3 trunk
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       trunk |   13.75676   .4972381    27.67   0.000     12.78219    14.73133
------------------------------------------------------------------------------

Estimation-postestimation

test, lincom, testnl, nlcom, and other Wald-based estimation-postestimation features work after mymean3 because all the required information is stored in e(b) and e(V).

To illustrate, I perform a Wald test of the null hypothesis that the mean of trunk is \(11\).

Example 7: test works after mymean3


. test _b[trunk]==11

 ( 1)  trunk = 11

           chi2(  1) =   30.74
         Prob > chi2 =    0.0000
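The reported statistic can be reproduced by hand from the point estimate and the estimated variance displayed in example 5. For a single linear restriction, the Wald statistic is the squared distance between the estimate and the hypothesized value, divided by the estimated variance of the estimator:

\[
W = \frac{(\widehat{\mu} - 11)^2}{\widehat{Var}(\widehat{\mu})}
  = \frac{(13.756757 - 11)^2}{.24724576}
  \approx \frac{7.5997}{.24724576}
  \approx 30.74
\]

This value is compared with a chi-squared distribution with 1 degree of freedom, which is where the reported p-value comes from.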

The results stored in e() are the glue that holds the estimation-postestimation framework together. We have only stored e(b) and e(V) so far, so not all the standard features are working yet. (But we will get there in the #StataProgramming series.)
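To see exactly what a postestimation command has available, you can list the contents of e() after running the command. This is a minimal sketch, assuming that mymean3.ado is in the current working directory and that the auto dataset can be loaded.

```stata
sysuse auto, clear
quietly mymean3 trunk       // run the command without showing its table
ereturn list                // list everything currently stored in e()
matrix list e(b)            // point estimate, labeled by its column stripe
matrix list e(V)            // estimated VCE, labeled by its row and column stripes
```

Comparing this listing with the one produced after an official command such as mean gives a sense of how many results remain to be stored.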

Using temporary names for global objects

Stata variables and matrices are global, as discussed in my previous blog post. We need some safe names for global objects. These safe names should not be in use elsewhere, and they should be temporary in that we want Stata to drop the corresponding objects when the command finishes. The tempvar and tempname commands put safe names into local macros and then drop the corresponding objects when the ado-file or do-file finishes. We explicitly dropped e2, if it existed, in line 9 of code block 2, in line 12 of code block 3, and in line 11 of code block 4. We do not need such a line in code block 5, because we are using temporary variable names.

In line 7 of mymean4.ado, the tempvar command puts a safe name into the local macro e2. In line 8 of mymean4.ado, the tempname command puts safe names into the local macros b and V. I illustrate the format followed by these safe names by displaying them on lines 9–11. The output reveals that a leading pair of underscores is followed by numbers and capital letters. Line 15 illustrates the use of these safe names. Instead of creating the matrix b, we create the matrix whose name is stored in the local macro b. In line 8, the tempname command created the local macro b to hold a safe name.

Code block 5: mymean4.ado


*! version 4.0.0 20Oct2015
program define mymean4, eclass
	version 14

	syntax varlist

	tempvar e2 
	tempname b V
	display "The safe name in e2 is `e2'"
	display "The safe name in b is `b'"
	display "The safe name in V is `V'"
	quietly summarize `varlist'
	local sum            = r(sum)
	local N              = r(N)
	matrix `b'           = (1/`N')*`sum'
	generate double `e2' = (`varlist' - `b'[1,1])^2
	quietly summarize `e2'
	matrix `V'           = (1/((`N')*(`N'-1)))*r(sum)
	matrix colnames `b'  = `varlist'
	matrix colnames `V'  = `varlist'
	matrix rownames `V'  = `varlist'
	ereturn post `b' `V'
	ereturn display
end

This code produces the output

Example 8: Results from mymean4 trunk


. mymean4 trunk
The safe name in e2 is __000000
The safe name in b is __000001
The safe name in V is __000002
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       trunk |   13.75676   .4972381    27.67   0.000     12.78219    14.73133
------------------------------------------------------------------------------

Removing the lines that display the safe names contained in the local macros yields mymean5.ado.

Code block 6: mymean5.ado


*! version 5.0.0 20Oct2015
program define mymean5, eclass
	version 14

	syntax varlist

	tempvar e2 
	tempname b V
	quietly summarize `varlist'
	local sum            = r(sum)
	local N              = r(N)
	matrix `b'           = (1/`N')*`sum'
	generate double `e2' = (`varlist' - `b'[1,1])^2
	quietly summarize `e2'
	matrix `V'           = (1/((`N')*(`N'-1)))*r(sum)
	matrix colnames `b'  = `varlist'
	matrix colnames `V'  = `varlist'
	matrix rownames `V'  = `varlist'
	ereturn post `b' `V'
	ereturn display
end

This code produces the output

Example 9: Results from mymean5 trunk


. mymean5 trunk
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       trunk |   13.75676   .4972381    27.67   0.000     12.78219    14.73133
------------------------------------------------------------------------------

Done and undone

I illustrated some basic ado-file programming techniques by implementing a command that estimates the mean of a variable. Even though we have a command that produces correct, easy-to-read output that has some estimation-postestimation features, we have only scratched the surface of what we usually want to do in an estimation command. I dig a little deeper in the next few posts by developing a command that performs ordinary least-squares estimation.


