I discuss the code for a simple estimation command to focus on the details of how to implement an estimation command. The command that I discuss estimates the mean by the sample average. I begin by reviewing the formulas and a do-file that implements them. I subsequently introduce ado-file programming and discuss several versions of the command. Along the way, I illustrate some of the postestimation features that work after the command.
This is the fourth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.
The formulas for our estimator
The formulas for the sample average and its estimated sampling variance, assuming an independently and identically distributed process, are
\[
\widehat{\mu} = \frac{1}{N}\sum_{i=1}^N y_i
\]
\[
\widehat{Var}(\widehat{\mu}) = \frac{1}{N(N-1)}\sum_{i=1}^N \left(y_i-\widehat{\mu}\right)^2
\]
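The second formula can also be written in terms of the sample variance \(s^2\), which uses the divisor \(N-1\). The rearrangement below is only algebra on the two formulas above, not an additional assumption, and it is a convenient form to check against commands that report \(s^2\):

\[
s^2 = \frac{1}{N-1}\sum_{i=1}^N \left(y_i-\widehat{\mu}\right)^2
\quad\Longrightarrow\quad
\widehat{Var}(\widehat{\mu}) = \frac{1}{N}\left[\frac{1}{N-1}\sum_{i=1}^N \left(y_i-\widehat{\mu}\right)^2\right] = \frac{s^2}{N},
\qquad
\widehat{se}(\widehat{\mu}) = \frac{s}{\sqrt{N}}
\]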
The code in mean1.do performs these computations on price from the auto dataset.
// version 1.0.0 20Oct2015
version 14
sysuse auto
quietly summarize price
local sum = r(sum)
local N = r(N)
local mu = (1/`N')*`sum'
generate double e2 = (price - `mu')^2
quietly summarize e2
local V = (1/((`N')*(`N'-1)))*r(sum)
display "muhat = " `mu'
display " V = " `V'
mean1.do uses summarize to compute the summations. Lines 5 and 6 store results that summarize leaves in r() into local macros, line 7 uses them to compute the average, and line 10 uses r(sum) from the second call to summarize to compute the estimated variance. I recommend that you use double, instead of the default float, for all variables used in your formulas, because it is almost always worth taking up the extra memory to gain the extra precision that double offers over float. (Essentially, each variable takes up twice as much space, but you get calculations that are correct to about \(10^{-16}\) instead of \(10^{-8}\).)
These calculations yield
Example 1: Computing the average and its sampling variance
. do mean1
. // version 1.0.0 20Oct2015
. version 14
. sysuse auto
(1978 Automobile Data)
. quietly summarize price
. local sum = r(sum)
. local N = r(N)
. local mu = (1/`N')*`sum'
. generate double e2 = (price - `mu')^2
. quietly summarize e2
. local V = (1/((`N')*(`N'-1)))*r(sum)
. display "muhat = " `mu'
muhat = 6165.2568
. display " V = " `V'
 V = 117561.16
.
end of do-file
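To see what the recommendation to use double buys, the following sketch stores the same values at both precisions. It is an illustration only: pf and pd are hypothetical variable names, and float() is Stata's function that rounds a value to float precision.

```stata
* Sketch: the same values stored at float and at double precision.
* float(x) rounds x to float precision; Stata evaluates expressions in double.
display %21.18f 0.1
display %21.18f float(0.1)

sysuse auto, clear
generate float  pf = price/7     // hypothetical variable, float precision
generate double pd = price/7     // hypothetical variable, double precision
display %21.12f pf[1]
display %21.12f pd[1]
display "relative difference in observation 1: " reldif(pf[1], pd[1])
```

The float values agree with the double values only to roughly seven significant digits, which is why the variables that enter the formulas are generated as double.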
Now I verify that mean produces the same results.
Example 2: Results from mean
. mean price
Mean estimation                   Number of obs   =         74
--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       price |   6165.257   342.8719      5481.914      6848.6
--------------------------------------------------------------
. matrix list e(b)
symmetric e(b)[1,1]
         price
y1   6165.2568
. matrix list e(V)
symmetric e(V)[1,1]
           price
price  117561.16
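The agreement can also be checked in code rather than by eye. The sketch below relies on summarize storing the sample variance, with divisor \(N-1\), in r(Var), so that r(Var)/r(N) equals the variance formula. The scalar names muhat and Vhat are illustrative choices, and the tolerance of \(10^{-10}\) is my assumption about how closely two double-precision computations of the same quantity should agree.

```stata
* Sketch: compare the formulas, computed via summarize, with the results of mean.
sysuse auto, clear
quietly summarize price
scalar muhat = r(mean)          // sample average
scalar Vhat  = r(Var)/r(N)      // s^2/N, the estimated variance of the average
quietly mean price
display "point estimate: " scalar(muhat) " versus " _b[price]
display "variance:       " scalar(Vhat)  " versus " _se[price]^2
assert reldif(scalar(muhat), _b[price])    < 1e-10
assert reldif(scalar(Vhat),  _se[price]^2) < 1e-10
```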
A first ado-file
The code in mymean1.ado performs the same calculations as mean1.do. (The file mymean1.ado is in my current working directory.)
*! version 1.0.0 20Oct2015
program define mymean1
    version 14

    quietly summarize price
    local sum = r(sum)
    local N = r(N)
    local mu = (1/`N')*`sum'
    capture drop e2 // Drop e2 if it exists
    generate double e2 = (price - `mu')^2
    quietly summarize e2
    local V = (1/((`N')*(`N'-1)))*r(sum)

    display "muhat = " `mu'
    display " V = " `V'
end
Line 2 of mymean1.ado specifies that the file defines the command mymean1. The command name must be the same as the file name that precedes the suffix .ado. The mymean1 command performs the same computations as the do-file mean1.do.
Example 3: Results from mymean1
. mymean1
muhat = 6165.2568
 V = 117561.16
A slightly better command
We want our command to be reusable; we want it to estimate the mean for any variable in memory, instead of only for price as done by mymean1.ado. On line 5 of mymean2.ado, we use the syntax command to store the name of the variable specified by the user in the local macro varlist, which we use in the remainder of the computations.
*! version 2.0.0 20Oct2015
program define mymean2
    version 14

    syntax varlist
    display "The local macro varlist contains `varlist'"

    quietly summarize `varlist'
    local sum = r(sum)
    local N = r(N)
    local mu = (1/`N')*`sum'
    capture drop e2 // Drop e2 if it exists
    generate double e2 = (`varlist' - `mu')^2
    quietly summarize e2
    local V = (1/((`N')*(`N'-1)))*r(sum)

    display "The average of `varlist' is " `mu'
    display "The estimated variance of the average is " `V'
end
The extremely powerful syntax command puts the elements of Stata syntax specified by the user into local macros and throws errors when the user makes a mistake. I will discuss syntax in greater detail in subsequent posts.
I begin by illustrating how to replicate the previous results.
Example 4: Results from mymean2 price
. mymean2 price
The local macro varlist contains price
The average of price is 6165.2568
The estimated variance of the average is 117561.16
I now illustrate that it works for another variable.
Example 5: Results from mymean2 trunk
. mymean2 trunk
The local macro varlist contains trunk
The average of trunk is 13.756757
The estimated variance of the average is .24724576
. mean trunk
Mean estimation                   Number of obs   =         74
--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       trunk |   13.75676   .4972381      12.76576    14.74775
--------------------------------------------------------------
. display "The variance of the estimator is " (_se[trunk])^2
The variance of the estimator is .24724576
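mymean2 accepts whatever varlist the user types, so a call such as mymean2 trunk price gets past syntax and then fails later, when generate receives two variable names in its expression. As a small taste of what syntax can check, the sketch below is a hypothetical variant, named mymean2b only for this illustration, that asks syntax to accept exactly one numeric variable.

```stata
* Hypothetical variant of mymean2 (mymean2b is an illustrative name) in which
* syntax rejects anything other than exactly one numeric variable.
capture program drop mymean2b
program define mymean2b
    version 14
    syntax varlist(numeric max=1)
    display "The local macro varlist contains `varlist'"
end

sysuse auto, clear
capture noisily mymean2b trunk          // one numeric variable: accepted
scalar rc_one = _rc
capture noisily mymean2b trunk price    // two variables: rejected by syntax
scalar rc_two = _rc
capture noisily mymean2b make           // string variable: rejected by syntax
scalar rc_string = _rc
display "return codes: " scalar(rc_one) ", " scalar(rc_two) ", " scalar(rc_string)
```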
Storing results in e()
mymean2.ado does not store the results that it displays. We fix this problem in mymean3.ado. Line 2 specifies the option eclass on program define to make mymean3 an e-class command. Line 18 uses ereturn post to move the matrix of point estimates (b) and the estimated variance-covariance matrix of the estimator (VCE) into e(b) and e(V). The estimation-postestimation framework uses parameter names for display, hypothesis tests, and other features. In lines 15 and 16, we put these names into the column stripes of the vector of estimates and the estimated VCE. In line 17, we put these names into the row stripe of the estimated VCE.
*! version 3.0.0 20Oct2015
program define mymean3, eclass
    version 14

    syntax varlist

    quietly summarize `varlist'
    local sum = r(sum)
    local N = r(N)
    matrix b = (1/`N')*`sum'
    capture drop e2 // Drop e2 if it exists
    generate double e2 = (`varlist' - b[1,1])^2
    quietly summarize e2
    matrix V = (1/((`N')*(`N'-1)))*r(sum)
    matrix colnames b = `varlist'
    matrix colnames V = `varlist'
    matrix rownames V = `varlist'
    ereturn post b V
    ereturn display
end
The ereturn display command on line 19 of mymean3.ado simply creates a standard output table using the results now stored in e(b) and e(V).
Example 6: Results from mymean3 trunk
. mymean3 trunk
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       trunk |   13.75676   .4972381    27.67   0.000     12.78219    14.73133
------------------------------------------------------------------------------
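Row and column stripes are ordinary matrix names, so the effect of lines 15–17 of mymean3.ado can be seen outside an estimation command. The following sketch builds 1-by-1 matrices by hand, reusing the trunk results above purely as illustrative values, and reads the stripes back with the colnames and rownames extended macro functions. Like mymean3, it overwrites any matrices named b and V already in memory.

```stata
* Sketch: attach names (stripes) to a 1-by-1 vector of estimates and its VCE.
* The values are copied from the trunk examples for illustration only.
matrix b = (13.756757)
matrix V = (.24724576)
matrix colnames b = trunk
matrix colnames V = trunk
matrix rownames V = trunk
matrix list b
matrix list V

* Read the stripes back into local macros
local bcols : colnames b
local vcols : colnames V
local vrows : rownames V
display "b columns: `bcols'; V columns: `vcols'; V rows: `vrows'"
```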
Estimation-postestimation
test, lincom, testnl, nlcom, and other Wald-based estimation-postestimation features work after mymean3 because all the required information is stored in e(b) and e(V).
To illustrate, I perform a Wald test of the null hypothesis that the mean of trunk is \(11\).
Example 7: test works after mymean3
. test _b[trunk]==11
( 1) trunk = 11
chi2( 1) = 30.74
Prob > chi2 = 0.0000
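With a single restriction, the Wald statistic that test reports is the squared z statistic, \(\left[(\widehat{\mu}-11)/\widehat{se}(\widehat{\mu})\right]^2\). The sketch below checks this and also runs lincom. So that it runs on its own, it defines a hypothetical e-class program, waldmean, that posts the same e(b) and e(V) as mymean3; as a substitution for the explicit sums in mymean3, it takes the average and \(s^2/N\) from summarize's r(mean) and r(Var).

```stata
* Sketch: check that test reports the squared z statistic for H0: mean = 11.
* waldmean is a hypothetical e-class program that posts the same e(b) and e(V)
* as mymean3, but takes the average and s^2/N from summarize's r(mean) and r(Var).
capture program drop waldmean
program define waldmean, eclass
    version 14
    syntax varlist(numeric max=1)
    quietly summarize `varlist'
    local avg  = r(mean)
    local vhat = r(Var)/r(N)
    tempname b V
    matrix `b' = (`avg')
    matrix `V' = (`vhat')
    matrix colnames `b' = `varlist'
    matrix colnames `V' = `varlist'
    matrix rownames `V' = `varlist'
    ereturn post `b' `V'
end

sysuse auto, clear
waldmean trunk
quietly test _b[trunk] == 11
scalar w_test = r(chi2)                             // Wald statistic from test
scalar w_hand = ((_b[trunk] - 11)/_se[trunk])^2     // squared z statistic
display "test: " scalar(w_test) "   by hand: " scalar(w_hand)
display "p-value: " chi2tail(1, scalar(w_hand))
lincom _b[trunk] - 11                               // estimate of mean - 11
```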
The results stored in e() are the glue that holds the estimation-postestimation framework together. We have stored only e(b) and e(V) so far, so not all the standard features work yet. (But we will get there in the #StataProgramming series.)
Using temporary names for global objects
Stata variables and matrices are global, as discussed in my previous blog post. We need some safe names for global objects. These safe names should not be in use elsewhere, and they should be temporary in that we want Stata to drop the corresponding objects when the command finishes. The tempvar and tempname commands put safe names into local macros and then drop the corresponding objects when the ado-file or do-file finishes. We explicitly dropped e2, if it existed, in line 9 of mymean1.ado, in line 12 of mymean2.ado, and in line 11 of mymean3.ado. We do not need such a line in mymean4.ado, because we are using temporary variable names.
In line 7 of mymean4.ado, the tempvar command puts a safe name into the local macro e2. In line 8 of mymean4.ado, the tempname command puts safe names into the local macros b and V. I illustrate the format followed by these safe names by displaying them on lines 9–11. The output shows that each safe name consists of a leading pair of underscores followed by a sequence of characters, here all digits; in general, the sequence is made up of numbers and capital letters. Line 15 illustrates the use of these safe names. Instead of creating the matrix b, we create the matrix whose name is stored in the local macro b. In line 8, the tempname command created the local macro b to hold a safe name.
*! version 4.0.0 20Oct2015
program define mymean4, eclass
    version 14

    syntax varlist

    tempvar e2
    tempname b V
    display "The safe name in e2 is `e2'"
    display "The safe name in b is `b'"
    display "The safe name in V is `V'"
    quietly summarize `varlist'
    local sum = r(sum)
    local N = r(N)
    matrix `b' = (1/`N')*`sum'
    generate double `e2' = (`varlist' - `b'[1,1])^2
    quietly summarize `e2'
    matrix `V' = (1/((`N')*(`N'-1)))*r(sum)
    matrix colnames `b' = `varlist'
    matrix colnames `V' = `varlist'
    matrix rownames `V' = `varlist'
    ereturn post `b' `V'
    ereturn display
end
This code produces the output
Example 8: Results from mymean4 trunk
. mymean4 trunk
The safe name in e2 is __000000
The safe name in b is __000001
The safe name in V is __000002
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       trunk |   13.75676   .4972381    27.67   0.000     12.78219    14.73133
------------------------------------------------------------------------------
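The claim that Stata drops temporary objects when the program that created them finishes can be checked directly. The sketch below defines a hypothetical program, tempdemo, that creates a temporary variable and a temporary matrix, and compares the number of variables in the dataset, c(k), before and after the call.

```stata
* Sketch: temporary objects are dropped when the program that created them ends.
* tempdemo is a hypothetical program used only for this check.
capture program drop tempdemo
program define tempdemo
    version 14
    tempvar e2
    tempname b
    generate double `e2' = 1      // temporary variable, added to the dataset
    matrix `b' = (1)              // temporary matrix
    display "inside tempdemo, the dataset has " c(k) " variables"
end

sysuse auto, clear
scalar k_before = c(k)            // number of variables before the call
tempdemo
scalar k_after = c(k)             // number of variables after the call
display "before: " scalar(k_before) "   after: " scalar(k_after)
```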
Removing the lines that display the safe names contained in the local macros yields mymean5.ado.
*! version 5.0.0 20Oct2015
program define mymean5, eclass
    version 14

    syntax varlist

    tempvar e2
    tempname b V
    quietly summarize `varlist'
    local sum = r(sum)
    local N = r(N)
    matrix `b' = (1/`N')*`sum'
    generate double `e2' = (`varlist' - `b'[1,1])^2
    quietly summarize `e2'
    matrix `V' = (1/((`N')*(`N'-1)))*r(sum)
    matrix colnames `b' = `varlist'
    matrix colnames `V' = `varlist'
    matrix rownames `V' = `varlist'
    ereturn post `b' `V'
    ereturn display
end
This code produces the output
Example 9: Results from mymean5 trunk
. mymean5 trunk
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       trunk |   13.75676   .4972381    27.67   0.000     12.78219    14.73133
------------------------------------------------------------------------------
Done and undone
I illustrated some basic ado-file programming techniques by implementing a command that estimates the mean of a variable. Even though we have a command that produces correct, easy-to-read output that has some estimation-postestimation features, we have only scratched the surface of what we usually want to do in an estimation command. I dig a little deeper in the next few posts by developing a command that performs ordinary least-squares estimation.
