I modify the ordinary least-squares (OLS) command discussed in Programming an estimation command in Stata: A better OLS command to allow for sample restrictions, to handle missing values, to allow for factor variables, and to deal with perfectly collinear variables.
This is the eighth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.
Sample restrictions
The myregress4 command described in Programming an estimation command in Stata: A better OLS command has the syntax
myregress4 depvar [indepvars]
where the indepvars may be time-series operated variables. myregress5 allows for sample restrictions and missing values. It has the syntax
myregress5 depvar [indepvars] [if] [in]
A user may optionally specify an if expression or an in range to restrict the sample. I also make myregress5 handle missing values in the user-specified variables.
*! version 5.0.0  22Nov2015
program define myregress5, eclass
    version 14

    syntax varlist(numeric ts) [if] [in]
    marksample touse

    gettoken depvar : varlist

    tempname zpz xpx xpy xpxi b V
    tempvar  xbhat res res2

    quietly matrix accum `zpz' = `varlist' if `touse'
    local p : word count `varlist'
    local p = `p' + 1
    matrix `xpx'  = `zpz'[2..`p', 2..`p']
    matrix `xpy'  = `zpz'[2..`p', 1]
    matrix `xpxi' = syminv(`xpx')
    matrix `b'    = (`xpxi'*`xpy')'
    quietly matrix score double `xbhat' = `b' if `touse'
    quietly generate double `res'  = (`depvar' - `xbhat') if `touse'
    quietly generate double `res2' = (`res')^2 if `touse'
    quietly summarize `res2' if `touse' , meanonly
    local N   = r(N)
    local sum = r(sum)
    local s2  = `sum'/(`N'-(`p'-1))
    matrix `V' = `s2'*`xpxi'
    ereturn post `b' `V', esample(`touse')
    ereturn scalar N = `N'
    ereturn local cmd "myregress5"
    ereturn display
end
The syntax command in line 5 specifies that a user may optionally restrict the sample by specifying an if expression or an in range. When the user specifies an if expression, syntax puts it into the local macro if; otherwise, the local macro if is empty. When the user specifies an in range, syntax puts it into the local macro in; otherwise, the local macro in is empty.
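As a minimal sketch of this behavior, the throwaway program below only displays what syntax leaves in the local macros varlist, if, and in. The program name showifin is hypothetical and is not part of myregress5.

```stata
* Hypothetical helper (not part of myregress5): show what -syntax-
* stores in the local macros varlist, if, and in.
capture program drop showifin
program define showifin
    version 14
    syntax varlist(numeric ts) [if] [in]
    display `"varlist: `varlist'"'
    display `"if:      `if'"'
    display `"in:      `in'"'
end

sysuse auto, clear
* Both restrictions specified: both macros are filled in
showifin price mpg if mpg < 30 in 1/50
* No restrictions specified: both macros are empty
showifin price mpg
```

The second call prints nothing after "if:" and "in:", which is why code that uses these macros works whether or not the user supplied a restriction.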
We could use the local macros if and in to handle user-specified sample restrictions, but these don’t account for missing values in the user-specified variables. The marksample command in line 6 creates a local macro named touse, which contains the name of a temporary variable that is a sample-identification variable. Each observation in the sample-identification variable is either one or zero. It’s one if the observation is included in the sample. It’s zero if the observation is excluded from the sample. An observation can be excluded by a user-specified if expression, by a user-specified in range, or because there is a missing value in one of the user-specified variables.
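The effect of marksample can be checked with a small helper that counts the observations it keeps. This is only a sketch; the program name counttouse and its returned scalar are illustrative assumptions and are not part of myregress5.

```stata
* Hypothetical helper (not part of myregress5): count the observations
* that -marksample- marks as in the sample.
capture program drop counttouse
program define counttouse, rclass
    version 14
    syntax varlist(numeric) [if] [in]
    marksample touse              // 1 = in sample, 0 = excluded
    quietly count if `touse'
    display "observations in sample: " r(N)
    return scalar N = r(N)
end

sysuse auto, clear
* rep78 has missing values, so they are excluded along with mpg >= 30
counttouse price mpg trunk rep78 if mpg < 30
```

On the auto dataset, this should report the same 62 observations that count if !missing(rep78) & mpg < 30 reports in Example 1, because price, mpg, and trunk have no missing values there.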
Lines 20–23 use the sample-identification variable contained in the local macro touse to enforce these sample restrictions on the OLS calculations.
Line 28 posts the sample-identification variable into e(sample), which is one if the observation was included in the estimation sample and zero if the observation was excluded from the estimation sample.
Line 29 stores the number of observations in the sample in e(N).
Example 1 illustrates that myregress5 runs the requested regression on the sample that respects the missing values in rep78 and accounts for an if expression.
Example 1: myregress5 with missing values and an if expression
. sysuse auto
(1978 Automobile Data)
. count if !missing(rep78)
  69
. count if !missing(rep78) & mpg < 30
  62
. myregress5 price mpg trunk rep78 if mpg < 30
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -376.8591   107.4289    -3.51   0.000    -587.4159   -166.3024
       trunk |  -36.39376   102.2139    -0.36   0.722    -236.7294    163.9418
       rep78 |   556.3029   378.1101     1.47   0.141    -184.7793    1297.385
       _cons |    12569.5   3455.556     3.64   0.000     5796.735    19342.27
------------------------------------------------------------------------------
. ereturn list
scalars:
                  e(N) =  62
macros:
                e(cmd) : "myregress5"
         e(properties) : "b V"
matrices:
                  e(b) :  1 x 4
                  e(V) :  4 x 4
functions:
             e(sample)
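The relationship between e(sample) and e(N) can be verified after any estimation command that posts both. The sketch below uses the official regress command as a stand-in, which is an assumption made so that the snippet runs without myregress5 installed.

```stata
* Stand-in check using official -regress- instead of myregress5:
* the number of observations flagged by e(sample) should equal e(N).
sysuse auto, clear
quietly regress price mpg trunk rep78 if mpg < 30
count if e(sample)
display "e(N) = " e(N)
assert r(N) == e(N)
```

The same count if e(sample) check should work after myregress5, because line 28 posts the sample-identification variable with the esample() option.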
Allowing for factor variables
Example 1 includes the number of repairs as a continuous variable, but it might be better treated as a discrete factor. myregress6 accepts factor variables. Factor-variable lists usually imply variable lists that contain perfectly collinear variables, so myregress6 also handles perfectly collinear variables.
*! version 6.0.0  22Nov2015
program define myregress6, eclass
    version 14

    syntax varlist(numeric ts fv) [if] [in]
    marksample touse

    gettoken depvar : varlist
    _fv_check_depvar `depvar'

    tempname zpz xpx xpy xpxi b V
    tempvar  xbhat res res2

    quietly matrix accum `zpz' = `varlist' if `touse'
    local p = colsof(`zpz')
    matrix `xpx'  = `zpz'[2..`p', 2..`p']
    matrix `xpy'  = `zpz'[2..`p', 1]
    matrix `xpxi' = syminv(`xpx')
    local k = `p' - diag0cnt(`xpxi') - 1
    matrix `b'    = (`xpxi'*`xpy')'
    quietly matrix score double `xbhat' = `b' if `touse'
    quietly generate double `res'  = (`depvar' - `xbhat') if `touse'
    quietly generate double `res2' = (`res')^2 if `touse'
    quietly summarize `res2' if `touse' , meanonly
    local N   = r(N)
    local sum = r(sum)
    local s2  = `sum'/(`N'-(`k'))
    matrix `V' = `s2'*`xpxi'
    ereturn post `b' `V', esample(`touse') buildfvinfo
    ereturn scalar N    = `N'
    ereturn scalar rank = `k'
    ereturn local cmd "myregress6"
    ereturn display
end
The fv in the parentheses after varlist in the syntax command in line 5 modifies the varlist to accept factor variables. Any specified factor variables are stored in the local macro varlist in a canonical form.
Estimation commands do not allow the dependent variable to be a factor variable. The _fv_check_depvar command in line 9 will exit with an error if the local macro depvar contains a factor variable.
Line 15 stores the number of columns in the matrix formed by matrix accum in the local macro p. Line 19 stores the number of linearly independent columns in the local macro k. This calculation uses diag0cnt() to account for the perfectly collinear variables that were dropped. (Each dropped variable puts a zero on the diagonal of the generalized inverse calculated by syminv(), and diag0cnt() returns the number of zeros on the diagonal.)
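A toy matrix makes the role of syminv() and diag0cnt() easier to see. The sketch below uses a made-up 2 x 2 cross-product matrix whose second column duplicates the first; the matrix names A and Ai are illustrative only.

```stata
* Toy cross-product matrix with two perfectly collinear columns
matrix A = (1, 1 \ 1, 1)

* syminv() returns a generalized inverse; the row and column of the
* dropped (collinear) variable are set to zero
matrix Ai = syminv(A)
matrix list Ai

* diag0cnt() counts the zeros on the diagonal, here one dropped column
display "dropped columns: " diag0cnt(Ai)

* Number of linearly independent columns, analogous to k in myregress6
local k = colsof(A) - diag0cnt(Ai)
display "rank: `k'"
```

In myregress6, the extra "- 1" in the calculation of k accounts for the dependent variable, which occupies the first column of the matrix formed by matrix accum.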
On line 29, I specified option buildfvinfo on ereturn post to store hidden information that ereturn display, contrast, margins, and pwcompare use to label tables and to decide which functions of the parameters are estimable.
Line 31 stores the number of linearly independent variables in e(rank) for postestimation commands.
Now, I use myregress6 to include rep78 as a factor-operated variable. The base category is dropped because we included a constant term.
Example 2: myregress6 with a factor-operated variable
. myregress6 price mpg trunk i.rep78
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -262.7053   73.49434    -3.57   0.000    -406.7516   -118.6591
       trunk |   41.75706    93.9671     0.44   0.657    -142.4151    225.9292
             |
       rep78 |
          2  |   654.7905   2136.246     0.31   0.759    -3532.175    4841.756
          3  |   1170.606   2001.739     0.58   0.559     -2752.73    5093.941
          4  |   1473.352   2017.138     0.73   0.465    -2480.167     5426.87
          5  |   2896.888   2121.206     1.37   0.172    -1260.599    7054.375
             |
       _cons |   9726.377   2790.009     3.49   0.000      4258.06    15194.69
------------------------------------------------------------------------------
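To see which level of rep78 serves as the base category, one option is the fvexpand command, which expands a factor-variable list into its individual terms. This is a side check rather than part of myregress6.

```stata
* Expand i.rep78 into its individual levels; the base level carries
* a "b" suffix in the expanded list
sysuse auto, clear
fvexpand i.rep78
display "`r(varlist)'"
```

The level marked with b is the one omitted from the coefficient table above.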
Done and undone
I modified the OLS command discussed in Programming an estimation command in Stata: A better OLS command to allow for sample restrictions, to handle missing values, to allow for factor variables, and to deal with perfectly collinear variables. In the next post, I show how to allow options for robust standard errors and to suppress the constant term.
