Monday, March 2, 2026

Programming an estimation command in Stata: A better OLS command


I use the syntax command to improve the command that implements the ordinary least-squares (OLS) estimator that I discussed in Programming an estimation command in Stata: A first command for OLS. I show how to require that all variables be numeric variables and how to make the command accept time-series operated variables.

This is the seventh post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

Stata syntax and the syntax command

The myregress2 command described in Programming an estimation command in Stata: A first command for OLS has the syntax

myregress2 depvar [indepvars]

This syntax requires that the dependent variable be specified because depvar is not enclosed in square brackets. The independent variables are optional because indepvars is enclosed in square brackets. Type

. help language

for an introduction to reading Stata syntax diagrams.

This syntax is implemented by the syntax command in line 5 of myregress2.ado, which I discussed at length in Programming an estimation command in Stata: A first command for OLS. The user must specify a list of variable names because varlist is not enclosed in square brackets. The syntax of the syntax command follows the rules of a syntax diagram.
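As a minimal sketch of what syntax leaves behind once it accepts the input, the hypothetical program showvarlist below parses a required varlist and splits it with gettoken. The names showvarlist and indepvars are assumptions introduced only for illustration; they are not part of the myregress commands. The sketch is meant to be run from a do-file.

```stata
* Hypothetical helper for illustration; not one of the myregress commands
capture program drop showvarlist
program define showvarlist
	version 14
	syntax varlist                        // stops with an error if no varlist is given
	gettoken depvar indepvars : varlist   // first name, then the remaining names
	display "depvar is `depvar'"
	display "indepvars are `indepvars'"
end

sysuse auto, clear
showvarlist price mpg trunk
```

Because syntax has already checked that the variables exist, the rest of the program can work with the local macro varlist without validating the names again.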

Code block 1: myregress2.ado


*! version 2.0.0  26Oct2015
program define myregress2, eclass
	version 14

	syntax varlist
	gettoken depvar : varlist

	tempname zpz xpx xpy xpxi b V
	tempvar  xbhat res res2 

	quietly matrix accum `zpz' = `varlist'
	local p : word count `varlist'
	local p = `p' + 1
	matrix `xpx'                = `zpz'[2..`p', 2..`p']
	matrix `xpy'                = `zpz'[2..`p', 1]
	matrix `xpxi'               = syminv(`xpx')
	matrix `b'                  = (`xpxi'*`xpy')'
	quietly matrix score double `xbhat' = `b'
	quietly generate double `res'       = (`depvar' - `xbhat')
	quietly generate double `res2'      = (`res')^2
	quietly summarize `res2'
	local N                     = r(N)
	local sum                   = r(sum)
	local s2                    = `sum'/(`N'-(`p'-1))
	matrix `V'                  = `s2'*`xpxi'
	ereturn post `b' `V'
	ereturn local         cmd   "myregress2"
	ereturn display
end
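For reference, the matrix computations in the code block above correspond to the usual OLS formulas. The notation below is my own shorthand, not taken from the code: y is the dependent variable, X holds the independent variables followed by the constant that matrix accum appends, Z = [y X], p is the number of columns of Z, and k = p - 1 is the number of coefficients. The sketch assumes X has full column rank, so that the generalized inverse returned by syminv() is the ordinary inverse.

```latex
\begin{align*}
Z'Z &= \begin{bmatrix} y'y & y'X \\ X'y & X'X \end{bmatrix}
      && \text{(\texttt{zpz}; rows and columns } 2,\dots,p \text{ give } X'X \text{ and } X'y\text{)} \\
\widehat{b} &= (X'X)^{-1} X'y
      && \text{(\texttt{b}, stored as a row vector)} \\
\widehat{e}_i &= y_i - x_i \widehat{b}
      && \text{(\texttt{res})} \\
s^2 &= \frac{\sum_{i=1}^{N} \widehat{e}_i^{\,2}}{N - k}, \qquad k = p - 1
      && \text{(\texttt{s2})} \\
\widehat{V} &= s^2 (X'X)^{-1}
      && \text{(\texttt{V})}
\end{align*}
```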

Example 1 illustrates that myregress2 runs the requested regression when I specify a varlist.

Example 1: myregress2 with specified variables


. sysuse auto
(1978 Automobile Data)

. myregress2 price mpg trunk
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -220.1649   65.59262    -3.36   0.001    -348.7241    -91.6057
       trunk |   43.55851   88.71884     0.49   0.623    -130.3272    217.4442
       _cons |   10254.95   2349.084     4.37   0.000      5650.83    14859.07
------------------------------------------------------------------------------

Example 2 illustrates that the syntax command displays an error message and stops execution when I do not specify a varlist. I use set trace on to see each line of code and the output it produces.

Example 2: myregress2 with no varlist


. set trace on 

. myregress2 
  --------------------------------------------------------- begin myregress2 --
  - version 14
  - syntax varlist
varlist required
  ----------------------------------------------------------- end myregress2 --
r(100);

Example 3 illustrates that the syntax command checks that the specified variables are in the current dataset. syntax throws an error because DoesNotExist is not a variable in the current dataset.

Example 3: myregress2 with a variable not in this dataset


. set trace on 

. myregress2 price mpg trunk DoesNotExist
  --------------------------------------------------------- begin myregress2 --
  - version 14
  - syntax varlist
variable DoesNotExist not found
  ----------------------------------------------------------- end myregress2 --
r(111);

end of do-file

r(111);

Because the syntax command on line 5 does not restrict the specified variables to be numeric, I get the no observations error in example 4 instead of an error indicating the actual problem, which is the string variable make.

Example 4: myregress2 with a string variable


. describe make

              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
make            str18   %-18s                 Make and Model

. myregress2 price mpg trunk make
no observations
r(2000);

end of do-file

r(2000);

On line 5 of myregress3, I modify varlist to accept only numeric variables. This change produces a more informative error message when I try to include a string variable in the regression.

Code block 2: myregress3.ado


*! version 3.0.0  30Oct2015
program define myregress3, eclass
	version 14

	syntax varlist(numeric)
	gettoken depvar : varlist

	tempname zpz xpx xpy xpxi b V
	tempvar  xbhat res res2 

	quietly matrix accum `zpz' = `varlist'
	local p : word count `varlist'
	local p = `p' + 1
	matrix `xpx'                = `zpz'[2..`p', 2..`p']
	matrix `xpy'                = `zpz'[2..`p', 1]
	matrix `xpxi'               = syminv(`xpx')
	matrix `b'                  = (`xpxi'*`xpy')'
	quietly matrix score double `xbhat' = `b'
	quietly generate double `res'       = (`depvar' - `xbhat')
	quietly generate double `res2'      = (`res')^2
	quietly summarize `res2'
	local N                     = r(N)
	local sum                   = r(sum)
	local s2                    = `sum'/(`N'-(`p'-1))
	matrix `V'                  = `s2'*`xpxi'
	ereturn post `b' `V'
	ereturn local         cmd   "myregress3"
	ereturn display
end

Example 5: myregress3 with a string variable


. set trace on 

. myregress3 price mpg trunk make
  --------------------------------------------------------- begin myregress3 --
  - version 14
  - syntax varlist(numeric)
string variables not allowed in varlist;
make is a string variable
  ----------------------------------------------------------- end myregress3 --
r(109);

end of do-file

r(109);

On line 5 of myregress4, I modify the varlist to accept time-series (ts) variables. The syntax command puts time-series variables in a canonical form that is stored in the local macro varlist, as illustrated by the display on line 6, whose output appears in example 6.

Code block 3: myregress4.ado


*! version 4.0.0  31Oct2015
program define myregress4, eclass
	version 14

	syntax varlist(numeric ts)
	display "varlist is `varlist'"
	gettoken depvar : varlist

	tempname zpz xpx xpy xpxi b V
	tempvar  xbhat res res2 

	quietly matrix accum `zpz' = `varlist'
	local p : word count `varlist'
	local p = `p' + 1
	matrix `xpx'                = `zpz'[2..`p', 2..`p']
	matrix `xpy'                = `zpz'[2..`p', 1]
	matrix `xpxi'               = syminv(`xpx')
	matrix `b'                  = (`xpxi'*`xpy')'
	quietly matrix score double `xbhat' = `b'
	quietly generate double `res'       = (`depvar' - `xbhat')
	quietly generate double `res2'      = (`res')^2
	quietly summarize `res2'
	local N                     = r(N)
	local sum                   = r(sum)
	local s2                    = `sum'/(`N'-(`p'-1))
	matrix `V'                  = `s2'*`xpxi'
	ereturn post `b' `V'
	ereturn local         cmd   "myregress4"
	ereturn display
end

Example 6: myregress4 with time-series variables


. sysuse gnp96

. myregress4  L(0/3).gnp 
varlist is gnp96 L.gnp96 L2.gnp96 L3.gnp96
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       gnp96 |
         L1. |   1.277086   .0860652    14.84   0.000     1.108402    1.445771
         L2. |   -.135549   .1407719    -0.96   0.336    -.4114568    .1403588
         L3. |  -.1368326   .0871645    -1.57   0.116    -.3076719    .0340067
             |
       _cons |   -2.94825   14.36785    -0.21   0.837    -31.10871    25.21221
------------------------------------------------------------------------------
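To look at the canonical form in isolation, here is a minimal sketch that only parses and displays the varlist. The program name showts is hypothetical and not part of this series, and the sketch assumes the gnp96 dataset is already declared as time-series data when it is loaded with sysuse. It is meant to be run from a do-file.

```stata
* Hypothetical helper for illustration: parse a time-series varlist and show it
capture program drop showts
program define showts
	version 14
	syntax varlist(numeric ts)       // ts permits operators such as L. and D.
	display "varlist is `varlist'"   // expanded form stored by syntax
end

sysuse gnp96, clear
showts L(0/3).gnp96
showts D.gnp96 L(1/2).D.gnp96
```

Dropping ts from varlist(numeric ts) should make syntax reject the operated variables with an error, which is a quick way to confirm that the option is what enables them.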

Done and undone

I used the syntax command to improve how myregress2 handles the variables specified by the user. I showed how to require that all variables be numeric variables and how to make the command accept time-series operated variables. In the next post, I show how to make the command allow for sample restrictions, how to handle missing values, how to allow for factor-operated variables, and how to deal with perfectly collinear variables.


