Sunday, March 1, 2026

Programming an estimation command in Stata: Allowing for sample restrictions and factor variables


I modify the ordinary least-squares (OLS) command discussed in Programming an estimation command in Stata: A better OLS command to allow for sample restrictions, to handle missing values, to allow for factor variables, and to deal with perfectly collinear variables.

This is the eighth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

Sample restrictions

The myregress4 command described in Programming an estimation command in Stata: A better OLS command has the syntax

myregress4 depvar [indepvars]

where the indepvars may be time-series operated variables. myregress5 allows for sample restrictions and missing values. It has the syntax

myregress5 depvar [indepvars] [if] [in]

A user may optionally specify an if expression or an in range to restrict the sample. I also make myregress5 handle missing values in the user-specified variables.

Code block 1: myregress5.ado


*! version 5.0.0  22Nov2015
program define myregress5, eclass
	version 14

	syntax varlist(numeric ts) [if] [in]
	marksample touse

	gettoken depvar : varlist

	tempname zpz xpx xpy xpxi b V
	tempvar  xbhat res res2 

	quietly matrix accum `zpz' = `varlist' if `touse'
	local p : word count `varlist'
	local p = `p' + 1
	matrix `xpx'                = `zpz'[2..`p', 2..`p']
	matrix `xpy'                = `zpz'[2..`p', 1]
	matrix `xpxi'               = syminv(`xpx')
	matrix `b'                  = (`xpxi'*`xpy')'
	quietly matrix score double `xbhat' = `b' if `touse'
	quietly generate double `res'       = (`depvar' - `xbhat') if `touse'
	quietly generate double `res2'      = (`res')^2 if `touse'
	quietly summarize `res2' if `touse' , meanonly
	local N                     = r(N)
	local sum                   = r(sum)
	local s2                    = `sum'/(`N'-(`p'-1))
	matrix `V'                  = `s2'*`xpxi'
	ereturn post `b' `V', esample(`touse')
	ereturn scalar           N  = `N'
	ereturn local         cmd   "myregress5"
	ereturn display
end

The syntax command in line 5 specifies that a user may optionally restrict the sample by specifying an if expression or an in range. When the user specifies an if expression, syntax puts it into the local macro if; otherwise, the local macro if is empty. When the user specifies an in range, syntax puts it into the local macro in; otherwise, the local macro in is empty.
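To see what syntax leaves behind, here is a minimal sketch. The program name showsyntax is hypothetical and is not part of this series; it only echoes the local macros that syntax creates.

```stata
* showsyntax is a hypothetical helper used only for illustration
capture program drop showsyntax
program define showsyntax
	version 14
	syntax varlist(numeric ts) [if] [in]
	* Each macro is empty when the user did not specify that piece
	display "varlist: `varlist'"
	display "if:      `if'"
	display "in:      `in'"
end

sysuse auto, clear
showsyntax price mpg if mpg < 30 in 1/50
showsyntax price mpg
```

In the first call, the macros if and in should hold the restrictions as typed, including the keywords if and in; in the second call, both should be empty.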

We could use the local macros if and in to handle user-specified sample restrictions, but these do not account for missing values in the user-specified variables. The marksample command in line 6 creates a local macro named touse, which contains the name of a temporary variable that is a sample-identification variable. Each observation in the sample-identification variable is either one or zero. It is one if the observation is included in the sample. It is zero if the observation is excluded from the sample. An observation can be excluded by a user-specified if expression, by a user-specified in range, or because there is a missing value in one of the user-specified variables.
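The logic behind a sample-identification variable can be sketched by hand. This is only an illustration of the idea for one specific command line, not how marksample is implemented; the variable name touse_manual is an assumption made for this sketch.

```stata
sysuse auto, clear

* Hand-built stand-in for the restriction "price mpg trunk rep78 if mpg < 30"
* Step 1: one if the if expression holds, zero otherwise
generate byte touse_manual = (mpg < 30)

* Step 2: exclude observations with a missing value in any listed variable
replace touse_manual = 0 if missing(price, mpg, trunk, rep78)

* Number of observations that remain in the sample
count if touse_manual
```

With the auto dataset, the count should agree with the 62 observations used in Example 1, because only rep78 has missing values among these variables.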

Line 13 and lines 20–23 use the sample-identification variable contained in the local macro touse to enforce these sample restrictions on the OLS calculations.

Line 28 posts the sample-identification variable into e(sample), which is one if the observation was included in the estimation sample and zero if the observation was excluded from the estimation sample.

Line 29 stores the number of observations in the sample in e(N).
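As a quick check of these two stored results, the sketch below assumes that myregress5.ado from Code block 1 is saved somewhere on the ado-path. It counts the observations flagged by e(sample) and compares that count with e(N).

```stata
sysuse auto, clear
myregress5 price mpg trunk rep78 if mpg < 30

* e(sample) can be used like any other 0/1 indicator after estimation
count if e(sample)

* The count above should equal the stored number of observations
display e(N)
```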

Example 1 illustrates that myregress5 runs the requested regression on the sample that respects the missing values in rep78 and accounts for an if expression.

Example 1: myregress5 with missing values and an if expression


. sysuse auto
(1978 Automobile Data)

. count if !missing(rep78)
  69

. count if !missing(rep78) & mpg < 30
  62

. myregress5 price mpg trunk rep78 if mpg < 30
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -376.8591   107.4289    -3.51   0.000    -587.4159   -166.3024
       trunk |  -36.39376   102.2139    -0.36   0.722    -236.7294    163.9418
       rep78 |   556.3029   378.1101     1.47   0.141    -184.7793    1297.385
       _cons |    12569.5   3455.556     3.64   0.000     5796.735    19342.27
------------------------------------------------------------------------------

. ereturn list

scalars:
                  e(N) =  62

macros:
                e(cmd) : "myregress5"
         e(properties) : "b V"

matrices:
                  e(b) :  1 x 4
                  e(V) :  4 x 4

functions:
             e(sample)   

Allowing for factor variables

Example 1 includes the number of repairs as a continuous variable, but it might be better treated as a discrete factor. myregress6 accepts factor variables. Factor-variable lists usually imply variable lists that contain perfectly collinear variables, so myregress6 also handles perfectly collinear variables.

Code block 2: myregress6.ado


*! version 6.0.0  22Nov2015
program define myregress6, eclass
	version 14

	syntax varlist(numeric ts fv) [if] [in]
	marksample touse

	gettoken depvar : varlist
	_fv_check_depvar `depvar'

	tempname zpz xpx xpy xpxi b V
	tempvar  xbhat res res2 

	quietly matrix accum `zpz' = `varlist' if `touse'
	local p                     = colsof(`zpz')
	matrix `xpx'                = `zpz'[2..`p', 2..`p']
	matrix `xpy'                = `zpz'[2..`p', 1]
	matrix `xpxi'               = syminv(`xpx')
	local k                     = `p' - diag0cnt(`xpxi') - 1
	matrix `b'                  = (`xpxi'*`xpy')'
	quietly matrix score double `xbhat' = `b' if `touse'
	quietly generate double `res'       = (`depvar' - `xbhat') if `touse'
	quietly generate double `res2'      = (`res')^2 if `touse'
	quietly summarize `res2' if `touse' , meanonly
	local N                     = r(N)
	local sum                   = r(sum)
	local s2                    = `sum'/(`N'-(`k'))
	matrix `V'                  = `s2'*`xpxi'
	ereturn post `b' `V', esample(`touse') buildfvinfo
	ereturn scalar N            = `N'
	ereturn scalar rank         = `k'
	ereturn local  cmd             "myregress6"
	ereturn display
end

The fv in the parentheses after varlist in the syntax command in line 5 modifies the varlist to accept factor variables. Any specified factor variables are stored in the local macro varlist in a canonical form.

Estimation commands do not allow the dependent variable to be a factor variable. The _fv_check_depvar command in line 9 will exit with an error if the local macro depvar contains a factor variable.

Line 15 stores the number of columns in the matrix formed by matrix accum in the local macro p. Line 19 stores the number of linearly independent columns in the local macro k. This calculation uses diag0cnt() to account for the perfectly collinear variables that were dropped. (Each dropped variable puts a zero on the diagonal of the generalized inverse calculated by syminv(), and diag0cnt() returns the number of zeros on the diagonal.)
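The role of syminv() and diag0cnt() can be seen in isolation with a small sketch. The variable mpg2 and the matrix names A and Ai are assumptions made for this illustration; mpg2 is built to be perfectly collinear with mpg.

```stata
sysuse auto, clear

* mpg2 is an exact multiple of mpg, so one of the two must be dropped
generate double mpg2 = 2*mpg

* Cross-product matrix of mpg, mpg2, trunk, and the constant
matrix accum A = mpg mpg2 trunk

* Generalized inverse: a dropped column shows up as a zero row and column
matrix Ai = syminv(A)
matrix list Ai

* Columns minus zeros on the diagonal gives the linearly independent count
display "columns = " colsof(A)
display "zeros on diagonal = " diag0cnt(Ai)
display "rank = " colsof(A) - diag0cnt(Ai)
```

Here the cross-product matrix has four columns and one of them is redundant, so the sketch should report one zero on the diagonal and a rank of three. Unlike line 19 of myregress6, there is no extra subtraction of one, because this matrix does not contain a dependent variable.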

On line 29, I specified option buildfvinfo on ereturn post to store hidden information that ereturn display, contrast, margins, and pwcompare use to label tables and to decide which functions of the parameters are estimable.

Line 31 stores the number of linearly independent variables in e(rank) for postestimation commands.

Now, I use myregress6 to include rep78 as a factor-operated variable. The base category is dropped because we included a constant term.

Example 2: myregress6 with a factor-operated variable


. myregress6 price mpg trunk i.rep78 
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -262.7053   73.49434    -3.57   0.000    -406.7516   -118.6591
       trunk |   41.75706    93.9671     0.44   0.657    -142.4151    225.9292
             |
       rep78 |
          2  |   654.7905   2136.246     0.31   0.759    -3532.175    4841.756
          3  |   1170.606   2001.739     0.58   0.559     -2752.73    5093.941
          4  |   1473.352   2017.138     0.73   0.465    -2480.167     5426.87
          5  |   2896.888   2121.206     1.37   0.172    -1260.599    7054.375
             |
       _cons |   9726.377   2790.009     3.49   0.000      4258.06    15194.69
------------------------------------------------------------------------------
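One way to gain confidence in these results is to compare the point estimates with those from regress. The sketch below assumes that myregress6.ado from Code block 2 is on the ado-path; the matrix names b6 and breg are assumptions made for this illustration.

```stata
sysuse auto, clear

* Point estimates from the command developed in this post
quietly myregress6 price mpg trunk i.rep78
matrix b6 = e(b)

* Point estimates from official regress on the same specification
quietly regress price mpg trunk i.rep78
matrix breg = e(b)

* Maximum relative difference between the two coefficient vectors
display mreldif(b6, breg)
```

The displayed value should be close to zero. Only the coefficients are compared; the test statistics are labeled z in the myregress6 output, so p-values and confidence intervals need not match those reported by regress.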

Done and undone

I modified the OLS command discussed in Programming an estimation command in Stata: A better OLS command to allow for sample restrictions, to handle missing values, to allow for factor variables, and to deal with perfectly collinear variables. In the next post, I show how to allow options for robust standard errors and to suppress the constant term.


