I saw a tweet the other day in which someone claimed that StataCorp makes sure the dataset format in Stata X is always different from the one in Stata X-1.
This reminded me of an email I wrote a few years ago to a user who had questions about backward compatibility and reproducibility. I'm going to use large parts of that email in this blog post to share my thoughts on these topics.
I understand the frustration of incompatibilities between software versions. While it may not ease the inevitable difficulties that arise, I would like to explain our efforts in this regard.
There are two concepts for a software developer to consider with respect to compatibility: forward compatibility and backward compatibility. Forward compatibility is all but impossible to achieve, whereas backward compatibility is achievable with some effort.
Regarding the tweet about dataset formats, nothing could be further from the truth. Stata 16, Stata 15, and Stata 14 share the same dataset format, so there should be no dataset compatibility issues among the three most recent versions of Stata. Moreover, as painful as a dataset format change is for users, I can promise you it causes our developers and testers even more pain. We try not to change our format unless we absolutely have to, both for your sake and for our own.
Although I'm biased, of course, I believe Stata has the best backward compatibility record of any statistical software. Stata is the only statistical software package I'm aware of, commercial or open source, that has a strong built-in version control system that allows scripts and programs written years ago to continue to work in modern versions of the software.
You can take a do-file written, say, almost 30 years ago in Stata 3, and as long as that do-file is marked with “version 3” at the top, it can be run as-is, with no modification, in a modern Stata 16. No broken scripts. No broken programs. No extra effort. Stata was designed from its very first version with reproducible research in mind, and we want users to be confident that years down the road, the files they used to produce a particular analysis will continue to work even if they change operating systems or computer architecture and move to a much newer version of Stata.
You don't have to keep multiple installations of old versions of Stata, hoping they will still run on a modern operating system, to be able to run code from years or decades ago. You can simply use a modern Stata, and it will understand any old code or dataset from the past.
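As a minimal sketch of the version marking described above, an old do-file might begin like this. The dataset and variable names are hypothetical placeholders; only the version line at the top comes from the text.

```stata
* Hypothetical do-file originally written for Stata 3.
* The version line asks a newer Stata to interpret the commands
* that follow the way Stata 3 did.
version 3

* mydata, y, x1, and x2 are placeholder names for illustration only.
use mydata, clear
regress y x1 x2
```

The point of the sketch is that nothing else in the file needs to change: the version line travels with the do-file, so the same file can be run years later on a newer Stata.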
Regarding dataset formats, we have always followed three important principles:
1. Never change Stata's dataset format unless we absolutely have to.
Just because a new Stata version comes out doesn't mean that we change the dataset format. We change the dataset format only if it is absolutely necessary to support some new feature of the new version. This is why Stata 16 shares the same dataset format as Stata 15 and Stata 14; there were no changes to Stata's features that required a format change.
2. Always have full backward compatibility and cross-platform compatibility.
When Stata 16 came out, it had to be certified to read every dataset format Stata ever produced, all the way back to Stata 1. A modern Stata must always be able to read any dataset produced by any older Stata. In addition, Stata on Windows, Stata on Mac, Stata on Linux (all of which are currently 64-bit systems), and Stata on any other or future hardware platform or operating system must be able to read datasets created on any other hardware platform or operating system, including older legacy 32-bit systems.
We want a researcher who did the original analysis for a journal article back in Stata 4 on Windows 3.1 to be able to run their analysis and load their datasets on an up-to-date Stata 16 running on 64-bit Mac OS or any other operating system we support.
3. When possible, provide forward compatibility for at least the second most recent Stata version.
We do this in two ways, the first of which is always possible. If a version of Stata requires a dataset format change, such as the one that was necessary back in Stata 14 for its Unicode capability, we make sure that version of Stata can save the dataset in memory in at least the previous format. We do this with the “saveold” command. In Stata 14, 15, and 16, we took this a bit further so that the “saveold” command can write datasets not just in Stata 13 format, but in various formats understood by Stata all the way back to Stata 11.
The second way to provide forward compatibility is, when possible, to make the last update to the previous version of Stata able to read datasets created by the newest version of Stata. For example, the last free update we released for Stata 11 included the ability to read Stata 12 format datasets.
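To illustrate the first of these two approaches, here is a short sketch of saving the dataset in memory in an older format with “saveold”. It assumes Stata 14, 15, or 16 and a dataset already loaded; the filenames are placeholders, and the version() option is shown as I understand it from the command's documentation, so check the help file for your release.

```stata
* Assumes Stata 14, 15, or 16 with a dataset already in memory.
* mydata_old and mydata_old12 are placeholder filenames.

* Default behavior: write the previous (Stata 13) dataset format.
saveold mydata_old, replace

* Ask for an older format that Stata 12 can read.
saveold mydata_old12, version(12) replace
```

A colleague running the older release can then open the resulting file with the ordinary use command.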
We take reproducibility and compatibility (forward, backward, and cross-platform) seriously.
