Saturday, November 29, 2025

Revealed preference: Stata for reproducible research


I care about reproducible research. Anyone who has ever been a research assistant or tried to follow the trail set by other researchers also cares. Sometimes, reproducing others’ results is a frustrating task; sometimes, it is outright impossible. But sometimes, it is satisfyingly simple. In my experience, reproducing results is easy when it involves a Stata do-file. I believe this is true even beyond my personal bias (I work for Stata and used the software regularly before that). A recent article published by the American Economic Association (AEA), Vilhuber, Turrito, and Welch (2020), shows that Stata is the preferred package among economists, and I believe reproducibility is a big reason why.

The AEA established reproducibility guidelines in 2008. Recently, it updated its guidelines to require authors not only to make data and analysis available but also to provide the code used to clean the data and the raw data, whenever feasible. Now, the editorial process includes an AEA data editor who verifies that the information provided by the authors is sufficient to replicate the results in the paper.

Vilhuber, Turrito, and Welch (2020) show that since the inception of the policy, Stata has been used in 73% of the supplements provided by the authors. The usage has been increasing over the span of the policy. The graph below shows the percentage of data supplements in which different software packages are used. These percentages may add up to more than 100% because content from more than one software package may be submitted in each supplement.

Figure 1: Percentage of software usage by year in AEA supplements

This is not a surprise to anyone who has used Stata. I believe one important reason researchers choose Stata is that reproducing your results is easy. A case in point is the graph above. To get the data and reproduce the graph, you just need to run the do-file, which I discuss in Appendix I. If you want to create a reproducible report, see my discussion in Appendix II.

Appendix I

The do-file below mainly uses three commands: import delimited, which I use to get the comma-separated value dataset used for the graph; xtline, which I use to generate the graph; and egen, which helps me generate a numeric categorical variable from a string variable using the group function. The other commands I use improve readability, allow me to modify the code, and help me display results.

Line 1 is there for reproducibility. Stata is the only package I am aware of with built-in version control ensuring that scripts written as long ago as 2008, and indeed even earlier, can still be used to reproduce their results in the modern version of the software. Lines 5 to 7 create locals for the location of the files. I use them in my call to import delimited. Line 16 creates a categorical variable from a string variable. Each category corresponds to a software package. In the next line, I keep a subset of the data. The data I keep correspond to the time period since the AEA implemented its data policy. In line 21, I use xtset to declare the data as a panel so that I can use xtline. In line 22, I change the default Stata graphics scheme. Stata has several schemes, and you could even write one that best represents your preferences. I like the simplicity of s1color. The remaining code reproduces figure 1.

version 16

// Simplifying readability and allowing for potentially other data downloads

local  where "https://raw.githubusercontent.com/AEADataEditor"
local  folder "aea-supplement-migration/master/data/generated"
local data "table.aea.software_by_year.csv"

// Importing dataset

import delimited using "`where'/`folder'/`data'",   ///
clear varnames(1) colrange(2:9)

// Generating a numeric variable to graph

egen software_num = group(software_collapsed), label
keep if year > 2007

// Graphing

xtset software_num year
set scheme s1color
xtline percent, overlay legend(position(10) ring(0) rows(2)         ///
   region(lstyle(none)) order(9 3 4 7 5 6 8 1 2))                   ///
   ytitle("") xtitle("") xlabel(2008(3)2017 2019)                   ///
   ylabel(0(20)100)  plotregion(margin(medium))                     ///
   plot9(lcolor(blue))                                              ///
   note("Note: A supplement can be used in more than one software." ///
   "Note: Data for 2019 ends in October.")

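The two less obvious steps in the do-file, egen's group() function and xtset, can be tried on a toy panel. This is a minimal sketch, not part of the original analysis: the variable names software, year, and percent, the package names pkgA and pkgB, and every number in it are made up for illustration.

```stata
version 16
clear

// Toy long-format panel: all names and values here are hypothetical
input str4 software year percent
"pkgB" 2018 70
"pkgB" 2019 75
"pkgA" 2018 10
"pkgA" 2019 12
end

// group() numbers the categories 1, 2, ... in sorted order of the string;
// the label option attaches the original strings as value labels
egen software_num = group(software), label
tabulate software software_num

// xtset needs a numeric panel identifier, which is why the string
// variable is encoded first; the second variable is the time variable
xtset software_num year

// One line per panel, all drawn in the same graph
xtline percent, overlay
```

Because group() numbers the categories in sorted order, the plot numbers used in options such as order() and plot9() should follow that same order, which is worth checking with tabulate before hard-coding them.
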
Appendix II

You can create a reproducible report in Word, Excel, PDF, and HTML using Stata. By reproducible, I mean that you can run a do-file that produces Stata results and outputs them into the format you want.

I want to create an HTML document. I save a do-file as a text file and give it the name aea.txt. Then, I type dyndoc "aea.txt", replace, which generates an HTML document. Besides aea.txt, I use two other files to produce my HTML document. First, I create a header.txt file. The header.txt file contains HTML code to include at the top of the target HTML file. The header refers to the second file, stmarkdown.css, which is a style sheet that defines how the HTML document is to be formatted.
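
To make the three-file setup concrete, here is a minimal sketch that writes a small header file and a small dynamic-document source from a do-file and then converts the source with dyndoc. The file names sketch_header.txt, sketch_source.txt, and sketch_source.html are hypothetical, the contents are not those of aea.txt, and the sketch assumes Stata 16 or later with a writable working directory. It also assumes a stmarkdown.css file sits next to the output; without it, the page should still build, just unstyled.

```stata
version 16

// Header: HTML placed at the top of the target file; it points to the style sheet
tempname fh
file open `fh' using "sketch_header.txt", write text replace
file write `fh' "<head>" _n
file write `fh' `"<link rel="stylesheet" type="text/css" href="stmarkdown.css">"' _n
file write `fh' "</head>" _n
file close `fh'

// Source: Markdown text plus dynamic tags that dyndoc executes
file open `fh' using "sketch_source.txt", write text replace
file write `fh' "<<dd_include: sketch_header.txt>>" _n _n
file write `fh' "A tiny dynamic document" _n
file write `fh' "=======================" _n _n
file write `fh' "~~~~" _n
file write `fh' "<<dd_do>>" _n
file write `fh' "sysuse auto, clear" _n
file write `fh' "summarize mpg" _n
file write `fh' "<</dd_do>>" _n
file write `fh' "~~~~" _n
file close `fh'

// Run the Stata code inside the source and write the HTML target
dyndoc "sketch_source.txt", saving("sketch_source.html") replace
```

Rerunning the do-file regenerates the HTML from scratch, so the commands and the output shown in the report cannot drift apart.
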

Now that I have my HTML document, with Stata’s reporting tools, I can migrate from HTML to Word using one line of code, html2docx aea.html, saving(aea.docx) replace, and from Word to PDF, again with one line of code, docx2pdf aea.docx, saving(aea.pdf) replace.
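
The conversion chain can be sketched end to end on a throwaway page. The file names below are hypothetical, the HTML is written by hand rather than produced by dyndoc so that the example is self-contained, and the sketch assumes Stata 16 or later, where html2docx and docx2pdf are available.

```stata
version 16

// Write a minimal HTML page to convert (hypothetical content)
tempname fh
file open `fh' using "sketch_page.html", write text replace
file write `fh' "<html><body>" _n
file write `fh' "<h1>Conversion sketch</h1>" _n
file write `fh' "<p>This page exists only to be converted.</p>" _n
file write `fh' "</body></html>" _n
file close `fh'

// HTML -> Word
html2docx "sketch_page.html", saving("sketch_page.docx") replace

// Word -> PDF
docx2pdf "sketch_page.docx", saving("sketch_page.pdf") replace
```

Keeping both conversions in the same do-file that builds the HTML means a single run refreshes every output format.
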

Vilhuber, L., J. Turrito, and K. Welch. 2020. Report by the AEA Data Editor. AEA Papers and Proceedings 110: 764–775. https://doi.org/10.1257/pandp.110.764.


