Distributing a Stata command that implements a statistical method will get that method used by lots of people. They will thank you. And, they will cite you!
This post is the first in the series #StataProgramming about programming an estimation command in Stata that uses Mata to do the numerical work. In the process of showing you how to program an estimation command in Stata, I will discuss do-file programming, ado-file programming, and Mata programming. When the series ends, you will be able to write Stata commands.
Stata users like its predictable syntax and its estimation-postestimation structure, which facilitates hypothesis testing, specification tests, and parameter interpretation. To help you write Stata commands that people want to use, I illustrate how Stata syntax is predictable and give an overview of the estimation-postestimation structure that you will want to emulate in your programs.
Stata structure by example
I use and describe some simulated data about the number of traffic accidents observed on 948 people.
Example 1: Accident data
. use http://www.stata.com/data/accident2.dta
. describe
Contains data from http://www.stata.com/data/accident2.dta
  obs:           948
 vars:             6                          23 Sep 2015 13:04
 size:        22,752
--------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------
kids            float   %9.0g                 number of children
cvalue          float   %9.0g                 car value index
tickets         float   %9.0g                 number of tickets in last 2 years
traffic         float   %9.0g                 local traffic index, larger=>worse
male            float   %9.0g                 1=>man, 0=>woman
accidents       float   %9.0g                 number of traffic in last 5 years
--------------------------------------------------------------------------------
Sorted by:
Stata’s predictable syntax
I estimate the parameters of a Poisson regression model for accidents as a function of traffic conditions (traffic), an indicator for being a male driver (male), and the number of tickets received in the last two years (tickets).
Example 2: A Poisson model for accidents
. poisson accidents traffic male tickets , vce(robust)
Iteration 0: log pseudolikelihood = -377.98594
Iteration 1: log pseudolikelihood = -370.68001
Iteration 2: log pseudolikelihood = -370.66527
Iteration 3: log pseudolikelihood = -370.66527
Poisson regression                              Number of obs     =        948
                                                Wald chi2(3)      =    1798.65
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -370.66527               Pseudo R2         =     0.8191
------------------------------------------------------------------------------
             |               Robust
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     traffic |   .0764399   .0165119     4.63   0.000     .0440772    .1088027
        male |   3.228004   .1232081    26.20   0.000     2.986521    3.469488
     tickets |   1.366614   .0328218    41.64   0.000     1.302284    1.430943
       _cons |  -7.434478   .2413188   -30.81   0.000    -7.907454   -6.961502
------------------------------------------------------------------------------
I want to focus on the structure in this example so that you can use it to make your commands easier to use. In particular, I want to discuss the structure of the command syntax and to point out that the output is easy to read and interpret because it is a standard Stata output table. For estimators, that table almost always reports estimates (often coefficients), standard errors, tests against zero and their $p$-values, and confidence intervals.
Stata syntax is predictable, which makes it easy to use. Stata users “speak Stata” and do not even notice the details. I highlight some of these details so that we can make the syntax of the commands we write predictable. Here are some of the standard syntax elements illustrated in example 2.
- The command has four syntactical elements:
  - command name (poisson),
  - list of variable names (accidents traffic male tickets),
  - a comma,
  - an option (vce(robust)).
- In the list of variable names, the name of the dependent variable is first and it is followed by the names of the independent variables.
- The job of the comma is to separate the command name and variable list from the option or options.
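The syntax elements listed above can be accepted by a user-written command with very little code. The following is a minimal sketch, assuming a hypothetical program name mycmd_sketch that is not part of Stata; it uses Stata's syntax command to parse the input and only echoes what it parsed.

```stata
* Hypothetical command (the name is an assumption) that accepts
*   mycmd_sketch depvar indepvars [if] [in] [, vce(string)]
capture program drop mycmd_sketch
program define mycmd_sketch
    version 14
    // variable list first, then an optional comma followed by options
    syntax varlist(numeric) [if] [in] [, vce(string)]
    // the first name is the dependent variable; the rest are independent variables
    gettoken depvar indepvars : varlist
    display "dependent variable    : `depvar'"
    display "independent variables : `indepvars'"
    display "vce() option          : `vce'"
end

* Same layout as example 2: command name, variable list, comma, option
mycmd_sketch accidents traffic male tickets, vce(robust)
```

Because syntax does the parsing, a command written this way inherits the familiar layout without any hand-written parsing code.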
The output is also structured; it is composed of an iteration log, a header, and a standard output table.
Estimation-postestimation framework
As a Stata user, I could now use the estimation-postestimation framework. For example, I could perform a Wald test of the hypothesis that the coefficient on male is 3.
Example 3: A Wald test of a linear restriction
. test male = 3
( 1) [accidents]male = 3
chi2( 1) = 3.42
Prob > chi2 = 0.0642
or I could perform a Wald test of the nonlinear hypothesis that the ratio of the coefficient on male to the coefficient on tickets is 2.
Example 4: A Wald test of a nonlinear restriction
. testnl _b[male]/_b[tickets] = 2
(1) _b[male]/_b[tickets] = 2
chi2(1) = 19.65
Prob > chi2 = 0.0000
I could also predict the mean of accidents for each observation and summarize the results.
Example 5: Summarizing the predicted conditional means
. predict nhat
(option n assumed; predicted number of events)
. summarize nhat
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        nhat |        948    .8512658    2.971087   .0006086    29.0763
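For a Poisson model without an exposure or offset, the predicted number of events is the exponential of the linear prediction. The sketch below recomputes the prediction by hand so that it can be compared with nhat; the variable names xbhat and nhat_check are assumptions introduced only for this illustration.

```stata
* Run after the poisson command in example 2 and the predict in example 5
predict double xbhat, xb                  // linear prediction x*b
generate double nhat_check = exp(xbhat)   // conditional mean implied by the model
summarize nhat nhat_check                 // the two summaries should agree up to precision
```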
Finally, I could use margins to estimate conditional or population-averaged parameters that are functions of the parameters in the original model. I use margins to estimate the average number of accidents that would be observed if each individual received 0 tickets, or 1 ticket, or 2 tickets, …, or 7 tickets. See [R] margins, Long and Freese (2006, sec. 4.4.2-4.4.3), and Cameron and Trivedi (2010, sec. 10.5.6-10.6.9) for introductions to estimating functions of the model parameters by margins.
Example 6: Estimating functions of model parameters
. margins, at(tickets=(0 1 2 3 4 5 6 7))
Predictive margins                              Number of obs     =        948
Model VCE    : Robust
Expression   : Predicted number of events, predict()
1._at : tickets = 0
2._at : tickets = 1
3._at : tickets = 2
4._at : tickets = 3
5._at : tickets = 4
6._at : tickets = 5
7._at : tickets = 6
8._at : tickets = 7
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_at |
1 | .0097252 .0015387 6.32 0.000 .0067094 .012741
2 | .0381426 .0048762 7.82 0.000 .0285854 .0476998
3 | .1495971 .0148157 10.10 0.000 .120559 .1786353
4 | .5867272 .0432256 13.57 0.000 .5020066 .6714478
5 | 2.301172 .1302033 17.67 0.000 2.045978 2.556366
6 | 9.025308 .5049176 17.87 0.000 8.035688 10.01493
7 | 35.39769 2.555679 13.85 0.000 30.38865 40.40673
8 | 138.8315 13.49606 10.29 0.000 112.3797 165.2832
------------------------------------------------------------------------------
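To see what the first row of this table estimates, the point estimate can be sketched by hand: set tickets to 0 for everyone, predict, and average. This is only an illustration of the idea; the variable name n_at0 is an assumption, and the delta-method standard errors that margins reports are not reproduced here.

```stata
* Run after the poisson command in example 2
preserve                      // keep a copy of the data so it can be restored
replace tickets = 0           // counterfactual: every individual has 0 tickets
predict double n_at0          // default statistic: predicted number of events
summarize n_at0               // the mean should agree with the margin at tickets = 0
restore                       // put the original data back
```

Repeating these steps with tickets set to 1, 2, …, 7 would give the remaining point estimates, which is the work that a single margins call does for you.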
The glue
The estimation results stored in e() are the glue that holds together the estimation-postestimation framework. The poisson command stores lots of stuff in e(). I could use ereturn list to list all this stuff, but there are many stored objects that do not interest you yet.
Most of the estimation-postestimation features that I discussed were implemented using e(b), e(V), and e(predict), which are the vector of point estimates, the estimated VCE, and the name of the command that implements predict after poisson.
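As a rough illustration of how these stored results get used, the sketch below lists them and then rebuilds the Wald statistic from example 3 out of the stored estimates. The scalar name w is an assumption; _b[] and _se[] read the point estimates and standard errors that the estimation command stored.

```stata
* Run after the poisson command in example 2
matrix list e(b)              // vector of point estimates
matrix list e(V)              // estimated VCE
display "`e(predict)'"        // name of the command that predict calls

* Wald statistic for H0: coefficient on male = 3, built from stored results
scalar w = ((_b[male] - 3)/_se[male])^2
display "chi2(1) = " %6.2f w "   Prob > chi2 = " %6.4f chi2tail(1, w)
```

If the stored results are what the text describes, the displayed statistic and p-value should match the output of test in example 3.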
I will show how to store what you need in e() in the #StataProgramming series.
Structure of Stata commands
Here is an outline of the tasks performed by a Stata estimation command.
- Parse the input to the command.
- Compute results.
- Store results in e().
- Display output.
You need to write a predict command to complete the estimation-postestimation framework. After you have stored the estimation results and written the predict command, margins works.
I will explain each of these steps in the #StataProgramming series of posts.
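The four tasks in the outline above can be sketched in a toy estimation command. The example below is an assumption-laden illustration, not the command developed in this series: the name mymean_sketch is hypothetical, the estimator is just a sample mean with its estimated variance, and the numerical work is done in Stata rather than Mata.

```stata
capture program drop mymean_sketch
program define mymean_sketch, eclass
    version 14

    // 1. Parse the input to the command
    syntax varname(numeric) [if] [in]
    marksample touse

    // 2. Compute results: sample mean and the variance of the sample mean
    quietly summarize `varlist' if `touse'
    local N = r(N)
    tempname b V
    matrix `b' = r(mean)
    matrix `V' = r(Var)/r(N)
    matrix colnames `b' = `varlist'
    matrix colnames `V' = `varlist'
    matrix rownames `V' = `varlist'

    // 3. Store results in e()
    ereturn post `b' `V', esample(`touse') obs(`N')
    ereturn local cmd "mymean_sketch"

    // 4. Display output as a standard Stata output table
    ereturn display
end

* Usage with the accident data loaded
mymean_sketch accidents
ereturn list
```

Because the point estimates and VCE are posted to e(), postestimation commands that rely only on e(b) and e(V) have what they need; a predict command would still have to be written separately, as noted above.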
Use this structure to your advantage. To make your command easy to use, design it to have the predictable syntax implemented in other commands and make it work in the estimation-postestimation framework. This task is far easier than it sounds. In fact, it is just plain easy. The Stata language steers you in this direction.
Done and undone
I will teach you how to program an estimation command in Stata in the #StataProgramming series. I will also show you how to do the numerical work in Mata. I discussed the following points in this first post.
- The predictable structure of Stata syntax makes Stata easy to use. You should emulate this structure so that your commands are easy to use.
- The estimation-postestimation framework makes inference and advanced estimation simple. It is easy for you to make your command work with this framework.
- The estimation results stored in e(), and the predict command, are the glue that holds the estimation-postestimation framework together.
In the next post, I discuss do-file programming tools that I will subsequently use to parse the input to the command.
References
Cameron, A. C., and P. K. Trivedi. 2010. Microeconometrics Using Stata. Revised ed. College Station, TX: Stata Press.
Long, J. S., and J. Freese. 2014. Regression Models for Categorical Dependent Variables Using Stata. 3rd ed. College Station, TX: Stata Press.
