Saturday, February 21, 2026

Programming an estimation command in Stata: A first ado-command using Mata


I discuss a sequence of ado-commands that use Mata to estimate the mean of a variable. The commands illustrate a general structure for Stata/Mata programs. This post builds on Programming an estimation command in Stata: Mata 101, Programming an estimation command in Stata: Mata functions, and Programming an estimation command in Stata: A first ado-command.

This is the thirteenth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

Using Mata in ado-programs

I begin by reviewing the structure in mymean5.ado, which I discussed in Programming an estimation command in Stata: A first ado-command.

Code block 1: mymean5.ado


*! version 5.0.0 20Oct2015
program define mymean5, eclass
	version 14

	syntax varlist(max=1)

	tempvar e2
	tempname b V
	quietly summarize `varlist'
	local sum            = r(sum)
	local N              = r(N)
	matrix `b'           = (1/`N')*`sum'
	generate double `e2' = (`varlist' - `b'[1,1])^2
	quietly summarize `e2'
	matrix `V'           = (1/((`N')*(`N'-1)))*r(sum)
	matrix colnames `b'  = `varlist'
	matrix colnames `V'  = `varlist'
	matrix rownames `V'  = `varlist'
	ereturn post `b' `V'
	ereturn display
end

The syntax command on line 5 stores the name of the variable for which the command estimates the mean in the local macro varlist. The tempvar and tempname commands on lines 7 and 8 put temporary names into the local macros e2, b, and V. Lines 9–15 compute the point estimate and the estimated variance-covariance of the estimator (VCE), using the temporary names for objects so as not to overwrite user-created objects. Lines 16–18 put the column name on the point estimate and row and column names on the estimated VCE. Line 19 posts the point estimate and the estimated VCE to e(b) and e(V), respectively. Line 20 produces a standard output table from the information stored in e(b) and e(V).

By the end of this post, I will have a command that replaces the Stata computations on lines 9–15 with Mata computations. To illustrate the structure of Stata-Mata programming, I start off computing only the point estimate in mymean6.

Code block 2: mymean6.ado


*! version 6.0.0 05Dec2015
program define mymean6, eclass
	version 14

	syntax varlist(max=1)

	mata: x  = st_data(., "`varlist'")
	mata: w  = mean(x)
	mata: st_matrix("Q", w)

	display "The point estimates are in Q"
	matrix list Q

end

Line 7 executes a one-line call to Mata; in this construction, Stata drops down to Mata, executes the Mata expression, and pops back up to Stata. Dropping down to Mata and popping back up to Stata takes almost no time, but I would like to avoid doing it three times. (Lines 8 and 9 are also one-line calls to Mata.)

Line 7 puts a copy of all the observations on the variable for which the command estimates the mean in the Mata column vector named x, which is stored in global Mata memory. Line 8 stores the mean of the column vector in the 1 × 1 matrix named w, which is also stored in global Mata memory. Line 9 copies the Mata matrix w to the Stata matrix named Q. Lines 11 and 12 display the results stored in Stata.

I illustrate what mymean6 produces in example 1.

Example 1: mymean6 uses global Mata memory


. sysuse auto
(1978 Automobile Data)

. mymean6 price
The point estimates are in Q

symmetric Q[1,1]
           c1
r1  6165.2568

. matrix dir
            Q[1,1]

. mata: mata describe

      # bytes   type                        name and extent
-------------------------------------------------------------------------------
            8   real scalar                 w
          592   real colvector              x[74]
-------------------------------------------------------------------------------

I use matrix dir to illustrate that Q is a Stata matrix, and I use mata describe to illustrate that x and w are objects in global Mata memory. Using fixed names for objects in Stata memory or in global Mata memory should be avoided, because you can overwrite users' data.
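As a minimal sketch of that risk, suppose a user had already created a Stata matrix named Q before running mymean6. The matrix and its values below are hypothetical, chosen only for illustration, and the sketch assumes that mymean6.ado is on the ado-path and that the lines are run from a do-file.

```stata
sysuse auto, clear

matrix Q = (1, 2, 3)   // a 1 x 3 matrix the user created earlier (hypothetical)
mymean6 price          // st_matrix("Q", w) replaces it with the 1 x 1 estimate
matrix list Q          // the user's original matrix is gone
```

The same thing would happen to any Mata objects named x or w that the user had in global Mata memory.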

mymean7 does not put anything in global Mata memory; all computations are performed using objects that are local to the Mata function mymean_work(). mymean7 uses temporary names for objects stored in Stata memory.

Code block 3: mymean7.ado


*! version 7.0.0 07Dec2015
program define mymean7, eclass
	version 14

	syntax varlist(max=1)

	tempname b
	mata: mymean_work("`varlist'", "`b'")

	display "b is "
	matrix list `b'
end

mata:
void mymean_work(string scalar vname, string scalar mname)
{
	real vector    x
	real scalar    w
	
	x  = st_data(., vname)
	w  = mean(x)
	st_matrix(mname, w)
}
end

There are two parts to mymean7.ado: an ado-program and a Mata function. The ado-program is defined on lines 2–12. The Mata function mymean_work() is defined on lines 14–24. The Mata function mymean_work() is local to the ado-program mymean7.

Line 8 uses a one-line call to Mata to execute mymean_work(). mymean_work() does not return anything to global Mata memory, and we are passing in two arguments. The first argument is a string scalar containing the name of the variable for which the function should compute the estimate. The second argument is a string scalar containing the temporary name stored in the local macro b. This temporary name will be the name of a Stata matrix that stores the point estimate computed in mymean_work().

Line 15 declares the function mymean_work(). Function declarations specify what the function returns, the name of the function, and the arguments that the function accepts; see Programming an estimation command in Stata: Mata functions for a quick introduction.

The word void on line 15 specifies that the function does not return a value; in other words, it returns nothing. What precedes the ( is the function name; thus, mymean_work() is the name of the function. The words string scalar vname specify that the first argument of mymean_work() is a string scalar that is known as vname inside mymean_work(). The comma separates the first argument from the second argument. The words string scalar mname specify that the second argument of mymean_work() is a string scalar that is known as mname inside mymean_work(). The ) closes the function declaration.

Lines 17–22 define mymean_work() because they are enclosed between the curly braces on lines 16 and 23. Lines 17 and 18 declare the real vector x and the real scalar w, which are local to mymean_work(). Lines 20 and 21 compute the estimate. Line 22 copies the estimate stored in the scalar w, which is local to the Mata function mymean_work(), to the Stata matrix whose name is stored in the string scalar mname, which contains the temporary name contained in the local macro b that was passed to the function on line 8.

The structure used in mymean7 ensures three important features.

  1. It does not use global Mata memory.
  2. It uses temporary names for global Stata objects.
  3. It leaves nothing behind in Mata memory or Stata memory.

Examples 2 and 3 combine to illustrate feature (3); example 2 clears Stata and Mata memory, and example 3 shows that mymean7 leaves nothing in the previously cleared memory.

Example 2: Removing objects from Stata and Mata memory


. clear all

. matrix dir

. mata: mata describe

      # bytes   type                        name and extent
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------

In detail, I use clear all to drop all objects from Stata and Mata memory, use matrix dir to illustrate that no matrices were left in Stata memory, and use mata describe to illustrate that no objects were left in Mata memory.

Example 3: mymean7 leaves nothing in Stata or Mata memory


. sysuse auto
(1978 Automobile Data)

. mymean7 price
b is 

symmetric __000000[1,1]
           c1
r1  6165.2568

. matrix dir

. mata: mata describe

      # bytes   type                        name and extent
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------

In example 3, I use mymean7 to estimate the mean, and I use matrix dir and mata describe to illustrate that mymean7 did not leave Stata matrices or Mata objects in memory. The output also illustrates that the temporary name __000000 was used for the Stata matrix that held the result before the ado-program terminated.

While it is good that mymean7 leaves nothing in global Stata or Mata memory, it is bad that mymean7 does not leave the estimate behind somewhere, like in e().

mymean8 stores the results in e() and has the features of mymean5, but computes its results in Mata.

Code block 4: mymean8.ado


*! version 8.0.0 07Dec2015
program define mymean8, eclass
	version 14

	syntax varlist(max=1)

	tempname b V
	mata: mymean_work("`varlist'", "`b'", "`V'")
	matrix colnames `b'  = `varlist'
	matrix colnames `V'  = `varlist'
	matrix rownames `V'  = `varlist'
	ereturn post `b' `V'
	ereturn display
end

mata:
void mymean_work(                  ///
          string scalar vname,     ///
	  string scalar mname,     ///
	  string scalar vcename)
{
	real vector    x, e2
	real scalar    w, n, v
	
	x  = st_data(., vname)
	n  = rows(x)
	w  = mean(x)
	e2 = (x :- w):^2
	v   = (1/(n*(n-1)))*sum(e2)
	st_matrix(mname,   w)
	st_matrix(vcename, v)
}
end

Line 8 is a one-line call to mymean_work(), which now has three arguments: the name of the variable whose mean is to be estimated, a temporary name for the Stata matrix that will hold the point estimate, and a temporary name for the Stata matrix that will hold the estimated VCE. The declaration of mymean_work() on lines 17–20 has been adjusted accordingly; each of the three arguments is a string scalar. Lines 22 and 23 declare objects local to mymean_work(). Lines 25–29 compute the mean and the estimated VCE. Lines 30 and 31 copy these results to Stata matrices, under the temporary names in the second and third arguments.
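For reference, the computations on lines 25–29 can be written out as formulas. The notation here is mine rather than anything defined in the code: x_i is the i-th observation in the vector x, N is the number of rows stored in n, \hat{\mu} is the value stored in w, and \widehat{V} is the value stored in v.

```latex
\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} x_i ,
\qquad
\widehat{V} = \frac{1}{N(N-1)}\sum_{i=1}^{N} \left(x_i - \hat{\mu}\right)^2
```

These are the same quantities that mymean5 builds from r(sum) and r(N), and the reported standard error is the square root of \widehat{V}.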

There is a logic to the order of the arguments in mymean_work(): the first argument is the name of an input, and the second and third arguments are temporary names for the outputs.

Returning to the ado-code, we see that lines 9–11 put row or column names on the point estimate or the estimated VCE. Line 12 posts the results to e(), which are displayed by line 13.

Example 4 illustrates that mymean8 produces the same point estimate and standard error as produced by mean.

Example 4: Comparing mymean8 and mean


. mymean8 price
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       price |   6165.257   342.8719    17.98   0.000      5493.24    6837.273
------------------------------------------------------------------------------

. mean price

Mean estimation                   Number of obs   =         74

--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       price |   6165.257   342.8719      5481.914      6848.6
--------------------------------------------------------------

The confidence intervals produced by mymean8 differ from those produced by mean because mymean8 uses a normal distribution while mean uses a t distribution. The mymean_work() in mymean9 uses a fourth argument to remove this difference.

Code block 5: mymean9.ado


*! version 9.0.0 07Dec2015
program define mymean9, eclass
	version 14

	syntax varlist(max=1)

	tempname b V dfr
	mata: mymean_work("`varlist'", "`b'", "`V'", "`dfr'")
	matrix colnames `b'  = `varlist'
	matrix colnames `V'  = `varlist'
	matrix rownames `V'  = `varlist'
	ereturn post `b' `V'
	ereturn scalar df_r  = `dfr'
	ereturn display
end

mata:
void mymean_work(                  ///
          string scalar vname,     ///
	  string scalar mname,     ///
	  string scalar vcename,   ///
	  string scalar dfrname)
{
	real vector    x, e2
	real scalar    w, n, v
	
	x  = st_data(., vname)
	n  = rows(x)
	w  = mean(x)
	e2 = (x :- w):^2
	v   = (1/(n*(n-1)))*sum(e2)
	st_matrix(mname,   w)
	st_matrix(vcename, v)
	st_numscalar(dfrname, n-1)
}
end

On line 8, mymean_work() accepts four arguments. The fourth argument is new; it contains the temporary name that is used for the Stata scalar that holds the residual degrees of freedom. Line 34 copies the value of the expression n-1 to the Stata numeric scalar whose name is stored in the string scalar dfrname; this Stata scalar now contains the residual degrees of freedom. Line 13 stores the residual degrees of freedom in e(df_r), which causes ereturn display to use a t distribution instead of a normal distribution.

Example 5: mymean9 uses a t distribution


. mymean9 price
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       price |   6165.257   342.8719    17.98   0.000     5481.914      6848.6
------------------------------------------------------------------------------
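As a rough check on where the two sets of confidence limits come from, the sketch below recomputes both by hand from the results that mymean9 leaves in e(). It assumes that mymean9.ado is on the ado-path and that the lines are run from a do-file; the 0.975 and 0.025 arguments are my choice to match the default 95% level.

```stata
sysuse auto, clear
mymean9 price

display e(df_r)   // residual degrees of freedom, n - 1

* t-based limits, which ereturn display reports when e(df_r) is set
display _b[price] - invttail(e(df_r), 0.025)*_se[price]
display _b[price] + invttail(e(df_r), 0.025)*_se[price]

* normal-based limits, which ereturn display reports when e(df_r) is absent
display _b[price] - invnormal(0.975)*_se[price]
display _b[price] + invnormal(0.975)*_se[price]
```

Up to rounding, the first pair should agree with the interval in example 5 and the second pair with the interval that mymean8 reported in example 4.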

mymean9 has five basic parts.

  1. It parses the user input.
  2. It uses a one-line call to a Mata work routine to compute results and to store these results in Stata matrices whose temporary names are passed to the Mata work routine.
  3. It puts on the column and row names and stores the results in e().
  4. It displays the results.
  5. It defines the Mata work routine after the end that terminates the definition of the ado-program.

This structure can accommodate any estimator whose results we can store in e(). The details of each part become increasingly complicated, but the structure remains the same. In future posts, I discuss Stata/Mata programs with this structure that implement the ordinary least-squares (OLS) estimator and the Poisson quasi-maximum-likelihood estimator.

Done and undone

I discussed a sequence of ado-commands that use Mata to estimate the mean of a variable. The commands illustrated a general structure for Stata/Mata programs.

In the next post, I show some Mata computations that produce the point estimates, an IID VCE, a robust VCE, and a cluster-robust VCE for the OLS estimator.


