I discuss a command that computes ordinary least-squares (OLS) results in Mata, paying special attention to the structure of Stata programs that use Mata work functions.
This command builds on several previous posts; at a minimum, you should be familiar with Programming an estimation command in Stata: A first ado-command using Mata and Programming an estimation command in Stata: Computing OLS objects in Mata.
This is the fifteenth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.
An OLS command with Mata computations
The Stata command myregress11 computes the results in Mata. The syntax of the myregress11 command is
myregress11 depvar [indepvars] [if] [in] [, noconstant]
where indepvars can contain factor variables or time-series variables.
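As a rough illustration of this syntax, the calls below show three typical invocations. This is only a sketch: it assumes that myregress11.ado has been saved somewhere on your ado-path, and the choice of the auto dataset and of these particular variables is mine, not part of the command.

```stata
* Assumes myregress11.ado is installed on the ado-path
sysuse auto, clear

* continuous covariates only
myregress11 price mpg weight

* a factor variable among the covariates
myregress11 price mpg i.rep78

* a sample restriction and no constant term
myregress11 price mpg weight if foreign, noconstant
```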
In the remainder of this post, I discuss the code for myregress11.ado. I recommend that you click on the file name to download the code. To avoid scrolling, view the code in the do-file editor, or your favorite text editor, to see the line numbers.
I don’t discuss the formulas for the OLS objects. See Programming an estimation command in Stata: Computing OLS objects in Mata for the formulas and Mata implementations.
*! version 11.0.0  11Jan2016
program define myregress11, eclass sortpreserve
    version 14

    syntax varlist(numeric ts fv) [if] [in] [, noCONStant ]
    marksample touse

    gettoken depvar indepvars : varlist
    _fv_check_depvar `depvar'

    fvexpand `indepvars'
    local cnames `r(varlist)'

    tempname b V N rank df_r

    mata: mywork("`depvar'", "`cnames'", "`touse'", "`constant'", ///
        "`b'", "`V'", "`N'", "`rank'", "`df_r'")

    if "`constant'" == "" {
        local cnames `cnames' _cons
    }

    matrix colnames `b' = `cnames'
    matrix colnames `V' = `cnames'
    matrix rownames `V' = `cnames'

    ereturn post `b' `V', esample(`touse') buildfvinfo
    ereturn scalar N       = `N'
    ereturn scalar rank    = `rank'
    ereturn scalar df_r    = `df_r'
    ereturn local  cmd     "myregress11"

    ereturn display

end

mata:

void mywork( string scalar depvar,  string scalar indepvars,
             string scalar touse,   string scalar constant,
             string scalar bname,   string scalar Vname,
             string scalar nname,   string scalar rname,
             string scalar dfrname)
{

    real vector y, b, e, e2
    real matrix X, XpXi
    real scalar n, k

    y    = st_data(., depvar, touse)
    X    = st_data(., indepvars, touse)
    n    = rows(X)

    if (constant == "") {
        X    = X,J(n,1,1)
    }

    XpXi = quadcross(X, X)
    XpXi = invsym(XpXi)
    b    = XpXi*quadcross(X, y)
    e    = y - X*b
    e2   = e:^2
    k    = cols(X) - diag0cnt(XpXi)
    V    = (quadsum(e2)/(n-k))*XpXi

    st_matrix(bname, b')
    st_matrix(Vname, V)
    st_numscalar(nname, n)
    st_numscalar(rname, k)
    st_numscalar(dfrname, n-k)

}

end
Let’s break this 74-line program into familiar pieces to make it easier to understand. Lines 2–35 define the ado-command, and lines 37–74 define the Mata work function that is used by the ado-command. Although there are more details, I used this structure in mymean8.ado, which I discussed in Programming an estimation command in Stata: A first ado-command using Mata.
The ado-command has four parts.
- Lines 5–14 parse what the user typed, identify the sample, and create temporary names for the results returned by our Mata work function.
- Lines 16–17 call the Mata work function.
- Lines 19–31 post the results returned by the Mata work function to e().
- Line 33 displays the results.
The Mata function mywork() also has four parts.
- Lines 39–43 parse the arguments.
- Lines 46–48 declare vectors, matrices, and scalars that are local to mywork().
- Lines 54–64 compute the results.
- Lines 66–70 copy the computed results to Stata, using the names that were passed in arguments.
Now let’s discuss the ado-code in some detail. Line 5 uses the syntax command to put the names of the variables specified by the user into the local macro varlist, to parse an optional if restriction, to parse an optional in restriction, and to parse the noconstant option. The variables specified by the user must be numeric, may contain time-series variables, and may contain factor variables. For more detailed discussions of similar syntax usages, see Programming an estimation command in Stata: Allowing for sample restrictions and factor variables, Programming an estimation command in Stata: Allowing for options, and Programming an estimation command in Stata: Using a subroutine to parse a complex option.
Line 6 creates a temporary sample-identification variable and stores its name in the local macro touse. I discussed this usage in detail in the section surrounding code block 1 in Programming an estimation command in Stata: Allowing for sample restrictions and factor variables.
Line 8 uses gettoken to store the name of the dependent variable in the local macro depvar and the names of the independent variables in the local macro indepvars. Line 9 uses _fv_check_depvar to check that the name of the dependent variable is not a factor variable.
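To see what these parsing steps leave behind in the local macros, the following sketch wraps the same syntax, marksample, and gettoken calls in a small program that only reports what it parsed. The program name parsedemo and the returned results are hypothetical and exist only for this illustration; they are not part of myregress11.ado.

```stata
capture program drop parsedemo
program define parsedemo, rclass
    version 14

    // same parsing steps as lines 5-8 of myregress11.ado
    syntax varlist(numeric ts fv) [if] [in] [, noCONStant ]
    marksample touse
    gettoken depvar indepvars : varlist

    // count the observations flagged by the sample-identification variable
    quietly count if `touse'
    return scalar N = r(N)

    display "depvar    = `depvar'"
    display "indepvars = `indepvars'"
    display "constant  = `constant'"
    display "N         = " return(N)

    return local depvar   `depvar'
    return local constant `constant'
end

sysuse auto, clear
parsedemo price mpg i.rep78, noconstant
```

The local macro constant is empty unless the user types noconstant, in which case it contains the word noconstant; this is the value that the ado-command later passes to the Mata work function.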
Line 11 uses fvexpand to expand the factor variables in indepvars. Line 12 puts the expanded names stored in r(varlist) by fvexpand in the local macro cnames. A single factor variable can imply more than one coefficient. fvexpand finds the canonical names for these coefficients and returns them in r(varlist). I have not used fvexpand until now because the Stata commands that I used to compute the results automatically created the coefficient names. Mata functions are designed for speed, so I must create the coefficient names when I use them.
Example 1 illustrates how one factor variable can imply more than one coefficient.
Example 1: fvexpand
. sysuse auto
(1978 Automobile Data)

. tabulate rep78

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.90        2.90
          2 |          8       11.59       14.49
          3 |         30       43.48       57.97
          4 |         18       26.09       84.06
          5 |         11       15.94      100.00
------------+-----------------------------------
      Total |         69      100.00

. fvexpand i.rep78

. return list

macros:
              r(fvops) : "true"
            r(varlist) : "1b.rep78 2.rep78 3.rep78 4.rep78 5.rep78"

. summarize 2.rep78

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     2.rep78 |         69     .115942    .3225009          0          1
The tabulate results show that there are five levels in rep78. fvexpand finds the levels and creates a list of the names of the implied indicator variables 1b.rep78, 2.rep78, 3.rep78, 4.rep78, and 5.rep78. Comparing the results from summarize 2.rep78 and tabulate rep78 illustrates this notation. The b in 1b.rep78 identifies level 1 as the base category to be omitted when there is a constant in the model. Type help fvvarlist for more details.
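A small sketch of why a base category is needed: when a full set of indicators appears alongside a constant, the columns are collinear and the cross-product matrix is singular. The toy matrix below is made up for illustration (two indicator columns that sum to the constant column); it uses the same invsym() and diag0cnt() calls that mywork() uses to obtain the rank.

```stata
mata:
// columns: indicator for group 1, indicator for group 2, constant
X = (1, 0, 1 \ 0, 1, 1 \ 1, 0, 1 \ 0, 1, 1 \ 1, 0, 1)

// invsym() returns a generalized inverse; the row and column of a
// collinear column are set to zero
XpXi = invsym(quadcross(X, X))
XpXi

// number of zeros on the diagonal = number of dropped columns
diag0cnt(XpXi)

// rank = number of columns minus number of dropped columns
cols(X) - diag0cnt(XpXi)
end
```

With three columns and one exact linear dependency, one column is dropped and the rank is two.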
Line 14 creates the temporary names for the results. For example, it stores a safe, temporary name in the local macro b that can be used for the matrix storing the point estimates. I discussed this usage in the section Using temporary names for global objects in Programming an estimation command in Stata: A first ado-command using Mata.
Lines 16 and 17 call the Mata function mywork(), which uses the information contained in the local macros depvar, cnames, touse, and constant to compute the results that are returned in the Stata objects whose names are stored in the local macros b, V, N, rank, and df_r.
Line 20 appends _cons to the local macro cnames, unless the user specified the noconstant option.
Lines 23–25 put column names on the vector of point estimates and row and column names on the matrix containing the estimated variance-covariance of the estimator (VCE).
Lines 27–31 post the results to e().
Line 33 displays a standard Stata output table, using the results in e(b), e(V), and e(df_r).
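The naming, posting, and display steps can be tried in isolation. The sketch below posts a made-up coefficient vector and VCE; the program name postdemo and all of the numbers are hypothetical and are not estimates of anything. It omits the esample() and buildfvinfo options that myregress11.ado uses, because there is no estimation sample here.

```stata
capture program drop postdemo
program define postdemo, eclass
    version 14
    tempname b V

    // made-up "results": a 1 x 2 row vector and a 2 x 2 VCE
    matrix `b' = (1.5, 10)
    matrix `V' = (0.25, 0 \ 0, 4)

    // coefficient names, with _cons appended as on line 20
    local cnames mpg
    local cnames `cnames' _cons

    matrix colnames `b' = `cnames'
    matrix colnames `V' = `cnames'
    matrix rownames `V' = `cnames'

    // post to e() and display a standard coefficient table
    ereturn post `b' `V'
    ereturn scalar df_r = 20
    ereturn local  cmd  "postdemo"
    ereturn display
end

sysuse auto, clear
postdemo
```

Because e(df_r) is stored, ereturn display reports t statistics rather than z statistics.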
Note that the local macro b created on line 14 contains a temporary name that is passed to mywork() on line 17 and that the Stata matrix whose name is contained in the local macro b is used on lines 23 and 27. mywork() puts the vector of point estimates in the Stata matrix whose name is stored in the local macro b. Also note that the local macro V created on line 14 contains a temporary name that is passed to mywork() on line 17 and that the Stata matrix whose name is contained in the local macro V is used on lines 24, 25, and 27. mywork() puts the estimated VCE in the Stata matrix whose name is stored in the local macro V.
To see how this works, let’s discuss the mywork() function in detail. Lines 39–43 declare that mywork() returns nothing (it is void) and that it accepts nine arguments, each of which is a string scalar. The first four arguments are inputs; depvar contains the name of the dependent variable, indepvars contains the names of the independent variables, touse contains the name of the sample-identification variable, and constant either contains noconstant or is empty. The values of these arguments are used on lines 50, 51, and 54 to create the vector y and the matrix X.
The last five arguments contain names used to write the results back to Stata. mywork() writes the results back to Stata using the passed-in temporary names. For example, line 17 shows that the Mata string scalar bname contains the temporary name stored in the local macro b. Line 66 copies the results stored in the transpose of the Mata vector b to a Stata matrix whose name is stored in the Mata string scalar bname. (Line 60 shows that the vector b contains the OLS point estimates.) Lines 23 and 27 then use this Stata vector whose name is stored in the local macro b. Similarly, line 17 shows that the Mata string scalar Vname contains the temporary name stored in the local macro V. Line 67 copies the results stored in the Mata matrix V to a Stata matrix whose name is stored in the Mata string scalar Vname. (Line 64 shows that V contains the estimated VCE.) Lines 24, 25, and 27 then use this Stata matrix whose name is stored in the local macro V. The arguments nname, rname, and dfrname are used analogously to return the results for the number of observations, the rank of the VCE, and the degrees of freedom of the residuals.
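The round trip of a temporary name from the ado-code to Mata and back can be reduced to a few lines. In the sketch below, the names tmpdemo and tmpwork() and the values written back are hypothetical, chosen only to show the mechanism: the ado-code creates the names, the Mata function fills the Stata objects that carry those names, and the ado-code then uses them.

```stata
capture program drop tmpdemo
program define tmpdemo, rclass
    version 14

    // create safe temporary names; nothing is stored under them yet
    tempname M s

    // pass the names (not the objects) to the Mata work function
    mata: tmpwork("`M'", "`s'")

    // the Stata objects now exist under the temporary names
    matrix list `M'
    display "scalar = " scalar(`s')

    return scalar s = scalar(`s')
    return matrix M = `M'
end

capture mata: mata drop tmpwork()
mata:
void tmpwork(string scalar mname, string scalar sname)
{
    // write results back to Stata under the passed-in names
    st_matrix(mname, (1, 2, 3))
    st_numscalar(sname, 42)
}
end

tmpdemo
```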
Lines 50–64 compute the point estimates and the VCE. Except for line 54, I discussed these computations in Programming an estimation command in Stata: A first ado-command using Mata. Line 54 causes a column of 1s to be joined to the covariate matrix X when the string scalar constant is empty. Lines 5 and 16 imply that the Mata string scalar constant contains noconstant when the user specifies the noconstant option and that it is empty otherwise.
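One way to convince yourself that these computations reproduce the usual OLS results is to run them interactively and compare them with regress. The sketch below repeats the computations from mywork() on the auto data; the choice of dataset and variables is an assumption made for illustration, and because none of these three variables has missing values, no sample-identification variable is passed to st_data().

```stata
sysuse auto, clear
regress price mpg weight

mata:
// data and a column of 1s for the constant
y    = st_data(., "price")
X    = st_data(., "mpg weight")
n    = rows(X)
X    = X, J(n, 1, 1)

// point estimates and VCE, as in mywork()
XpXi = invsym(quadcross(X, X))
b    = XpXi*quadcross(X, y)
e    = y - X*b
k    = cols(X) - diag0cnt(XpXi)
V    = (quadsum(e:^2)/(n-k))*XpXi

// maximum relative differences from the results posted by regress
mreldif(b', st_matrix("e(b)"))
mreldif(V,  st_matrix("e(V)"))
end
```

Both relative differences should be tiny; they will not generally be exactly zero, because regress and this code carry out the arithmetic differently.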
Done and undone
I discussed the code for myregress11.ado, which uses Mata to compute OLS point estimates and a VCE that assumes independent and identically distributed observations. The structure of the code is the same as the one that I used in mymean7.ado and mymean8.ado, discussed in Programming an estimation command in Stata: A first ado-command using Mata, although there are more details in the OLS program.
Key to this structure is that the Mata work function accepts two types of arguments: the names of Stata objects that are inputs and temporary names that are used to write the results back to Stata from Mata.
In the next post, I extend myregress11.ado to allow for robust or cluster-robust estimators of the VCE.