Friday, December 12, 2025

Stata 16 Released – The Stata Blog


We just announced the release of Stata 16. It is available now. Click to visit stata.com/new-in-stata.

Stata 16 is a big release, which our releases usually are. This one is broader than usual. It ranges from lasso to Python and from multiple datasets in memory to multiple chains in Bayesian analysis.

The highlights are listed below. If you click on a highlight, we will spirit you away to our website, where we will describe the feature in a dry but information-dense way. Or you can scroll down and read my comments, which I hope are more entertaining even if they are less informative.

The big features of Stata 16 are

  1. Lasso, both for prediction and for inference
  2. Reproducible and automatically updating reports
  3. New meta-analysis suite
  4. Revamped and expanded choice modeling (margins works everywhere)
  5. Integration of Python with Stata
  6. Bayesian predictions, multiple chains, and more
  7. Extended regression models (ERMs) for panel data
  8. Importing of SAS and SPSS datasets
  9. Flexible nonparametric series regression
  10. Multiple datasets in memory, meaning frames
  11. Sample-size analysis for confidence intervals
  12. Nonlinear DSGE models
  13. Multiple-group IRT
  14. Panel-data Heckman-selection models
  15. NLMEs with lags: multiple-dose pharmacokinetic models and more
  16. Heteroskedastic ordered probit
  17. Graph sizes in inches, centimeters, and printer points
  18. Numerical integration in Mata
  19. Linear programming in Mata
  20. Do-file Editor: Autocompletion, syntax highlighting, and more
  21. Stata for Mac: Dark Mode and tabbed windows
  22. Set matsize obviated

Number 22 is not a link because it is not a highlight. I added it because I suspect it will affect the most Stata users. It may not be enough to make you buy the release, but it will half tempt you. Buy the update, and you will never again have to type

. set matsize 600

And if you do type it, you will be ignored. Stata just works, and it uses less memory.

Oh, and in Stata/MP, Stata matrices can now be up to 65,534 x 65,534, meaning you can fit models with over 65,000 right-hand-side variables. Meanwhile, Mata matrices remain limited only by memory.

Here are my comments on the highlights.

1. Lasso, both for prediction and for inference

There are two parts to our implementation of lasso: prediction and inference. I suspect inference will be of more interest to our users, but we needed prediction to implement inference. By the way, when I say lasso, I mean lasso, elastic net, and square-root lasso, but if you want a features list, click the title.

Let’s start with lasso for prediction. If you type

. lasso linear y x1 x2 x3 ... x999

lasso will select the covariates from the x’s specified and fit the model on them. lasso will be unlikely to choose the covariates that belong in the true model, but it will choose covariates that are collinear with them, and that works a treat for prediction. If English is not your first language, by “works a treat”, I mean great. Anyway, the lasso command is for prediction, and standard errors for the covariates it selects are not reported because they would be misleading.

Concerning inference, we provide four lasso-based methods: double selection, cross-fit partialing out, and two more. If you type

. dsregress y x1, controls(x2-x999)

then, conceptually but not actually, y will be fit on x1 and the variables lasso selects from x2-x999. That is not how the calculation is made because the variables lasso selects are not identical to the true variables that belong in the model. I said earlier that they are correlated with the true variables, and they are. Another way to think about selection is that lasso estimates the variables to be selected and, as with all estimation, that is subject to error. Anyway, the inference calculations are robust to those errors. Reported will be the coefficient and its standard error for x1. I specified one variable of special interest in the example, but you can specify however many you wish.
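To make the two workflows concrete, here is a minimal sketch. It assumes a dataset in memory with hypothetical variables y and x1 through x999 stored in that order; the variable name sample, the 75/25 split, and the seed are illustrative choices of mine, not recommendations.

```stata
* Assumption: y and x1-x999 exist in the current dataset, in that order.
* Split the data into a training sample (1) and a validation sample (2).
splitsample, generate(sample) split(0.75 0.25) rseed(12345)

* Prediction: fit the lasso on the training sample only.
lasso linear y x1-x999 if sample == 1, rseed(12345)

* List the selected covariates, then compare fit in and out of sample.
lassocoef
lassogof, over(sample) postselection

* Inference: coefficient and standard error for x1,
* with lasso choosing the controls from x2-x999.
dsregress y x1, controls(x2-x999)
```

Comparing goodness of fit across the two samples is one way to see whether the prediction model holds up on data it was not fit on.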

2. Reproducible and automatically updating reports

The inelegant title above is trying to say (1) reports that reproduce themselves just as they were originally and (2) reports that, when run again, update themselves by running the analysis on the latest data. Stata has always been strong on both, and we have added more features. I don’t want to downplay the additions, but neither do I want to discuss them. Click the title to learn about them.

I think what’s important is another aspect of what we did. The real problem was that we never told you how to use the reporting features. Now we do in an all-new manual. We tell you and we show you, with examples and workflows. Here’s a link to the manual so you can judge for yourself.

3. New meta-analysis suite

Stata is known for its community-contributed meta-analysis. Now there is an official StataCorp suite as well. It’s complete and easy to use. And yes, it has funnel plots and forest plots, and bubble plots and L’Abbé plots.

4. Revamped and expanded choice modeling (margins works everywhere)

Choice modeling is jargon for conditional logit, mixed logit, multinomial probit, and other procedures that model the probability of individuals making a particular choice from the alternatives available to each of them.

We added a new command to fit mixed logit models, and we rewrote all the rest. The new commands are easier to use and have new features. Old commands continue to work under version control.

margins can now be used after fitting any choice model. margins answers questions about counterfactuals and can even answer them for any one of the alternatives. You can finally obtain answers to questions like, “How would a $10,000 increase in income affect the probability people take public transportation to work?”

The new commands are easier to use because you must first cmset your data. That may not sound like a simplification, but it simplifies the syntax of the remaining commands because it gets details out of the way. And it has another advantage. It tells Stata what your data should look like so Stata can run consistency checks and flag potential problems.

Finally, we created a new [CM] Choice Modeling Manual. Everything you need to know about choice modeling can now be found in one place.
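Here is a rough sketch of the cmset-then-margins workflow described above. The dataset and every variable name (personid, mode, chosen, cost, income) are hypothetical, and I am assuming income is recorded in thousands of dollars so that adding 10 corresponds to the $10,000 question.

```stata
* Assumption: long-form data, one row per person and travel alternative.
* Declare the case identifier and the alternatives variable.
cmset personid mode

* Conditional logit: cost varies by alternative, income varies by case.
cmclogit chosen cost, casevars(income)

* Counterfactual: predicted choice probabilities if everyone's income
* were 10 (thousand dollars) higher.
margins, at(income=generate(income+10))
```

Because cmset has already recorded the data structure, the estimation command does not need to be told which variable identifies cases or alternatives.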

5. Integration of Python with Stata

If you don’t know what Python is, put down your quill pen, dig out your acoustic modem and plug it in, push your telephone handset firmly into the coupler, and visit Wikipedia. Python has become an exceedingly popular programming language with extensive libraries for writing numerical, machine learning, and web scraping routines.

Stata’s new relationship with Python is the same as its relationship with Mata. You can use it interactively from the Stata prompt, in do-files, and in ado-files. You can even put Python subroutines at the bottom of ado-files, just as you do Mata subroutines. Or put both. Stata’s flexible.

Python can access Stata results and post results back to Stata using the Stata Function Interface (sfi), the Python module that we provide.
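As a small illustration of that round trip, the sketch below reads a variable into Python and posts a number back to Stata. It assumes Python is installed and configured for Stata, uses the auto dataset that ships with Stata, and relies on my understanding that Data.get() returns a flat list when a single variable is requested; the scalar name py_mean_mpg is made up for the example.

```stata
sysuse auto, clear

python:
from sfi import Data, Scalar

# Read the mpg variable from the dataset in memory into a Python list
mpg = Data.get(var="mpg")

# Post the mean back to Stata as a scalar (the name is illustrative)
Scalar.setValue("py_mean_mpg", sum(mpg) / len(mpg))
end

* Back in Stata, the value computed in Python is available as a scalar.
display scalar(py_mean_mpg)
```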

6. Bayesian predictions, multiple chains, and more

We have multiple new Bayesian features.

We now have multiple chains. Has the MCMC converged? Estimate models using multiple chains, and reported will be the maximum of the Gelman-Rubin convergence diagnostic. If it has not yet converged, do more simulations. Still hasn’t converged? Now you can obtain the Gelman-Rubin convergence diagnostic for each parameter. If the same parameter turns up repeatedly as the culprit, you know where the problem lies.
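A minimal sketch of that workflow might look like the following. The outcome y and covariates x1 and x2 are hypothetical, and four chains and the seed are arbitrary choices for illustration.

```stata
* Assumption: y, x1, and x2 exist in the current dataset.
* Fit a Bayesian linear regression using four chains.
bayes, nchains(4) rseed(16): regress y x1 x2

* Report the Gelman-Rubin convergence diagnostic for each parameter.
bayesstats grubin
```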

We now provide Bayesian predictions for outcomes and functions of them. Bayesian predictions are calculated from the simulations that were run to fit your model, so there are a lot of them. The predictions will be saved in a separate dataset. Once you have the predictions, we provide commands so that you can graph summaries of them and perform hypothesis testing. And you can use them to obtain posterior predictive p-values to check the fit of your model.

There’s more. Click the title.

7. Extended regression models (ERMs) for panel data

ERMs fit models with problems. These problems can be any combination of (1) endogenous and exogenous sample selection, (2) endogenous covariates, also known as unobserved confounders, and (3) nonrandom treatment assignment.

What’s new is that ERMs can now be used to fit models with panel (2-level) data. Random effects are added to each equation. Correlations between the random effects are reported. You can test them, jointly or singly. And you can suppress them, jointly or singly.

Ermistatas got a fourth antenna.
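For the curious, a panel ERM with one endogenous covariate could be sketched as below. All names are hypothetical: id and year identify the panel, w is the endogenous covariate, and z1 is an instrument excluded from the main equation.

```stata
* Assumption: panel data with identifiers id and year.
xtset id year

* Linear ERM with random effects; w is endogenous and is modeled
* as a function of the instrument z1 and the exogenous covariate x1.
xteregress y x1, endogenous(w = z1 x1)
```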

8. Importing of SAS and SPSS datasets

New command import sas imports .sas7bdat data files and .sas7bcat value-label files.

New command import spss imports IBM SPSS version 16 or higher .sav and .zsav files.

I recommend using them from their dialog boxes. You can preview the data and select the variables and observations you want to import.
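If you prefer typing to clicking, the commands might be used roughly as follows. The filenames are placeholders, and I am assuming the value labels for the SAS file live in a separate catalog file.

```stata
* Assumption: survey.sas7bdat and formats.sas7bcat are in the working directory.
* Import a SAS dataset together with its value labels.
import sas using "survey.sas7bdat", bcat("formats.sas7bcat") clear

* Import an SPSS dataset (placeholder filename).
import spss using "survey.sav", clear
```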

9. Flexible nonparametric series regression

New command npregress series fits models like

y = g(x1, x2, x3) + ε

No functional-form restrictions are placed on g(), but you can impose separability restrictions. The new command can fit

y = g1(x1) + g2(x2, x3) + ε

y = g1(x1, x2) + g3(x3) + ε

y = g1(x1, x3) + g2(x2) + ε

and even fit

y = b1x1 + g2(x2, x3) + ε

y = b1x1 + b2x2 + g3(x3) + ε

I mentioned that lasso can perform inference in models like

. dsregress y x1, controls(x2-x999)

If you know that variables x12, x19, and x122 appear in the model, but do not know the functional form, you could use npregress series to obtain inference. The command

. npregress series y x12 x19 x122, asis(x1)

fits

y = b1x1 + g2(x12, x19, x122) + ε

and, among other statistics, reports the coefficient and standard error of b1.

10. Multiple datasets in memory, meaning frames

I’m a sucker for data management commands. Even so, I do not think I am exaggerating when I say that frames will change the way you work. If you are not interested, bear with me. I think I can change your mind.

You can have multiple datasets in memory. Each is stored in a named frame. At any instant, one of the frames is the current frame. Most Stata commands operate on the data in the current frame. It’s the commands that work across frames that will change the way you work, but before you can use them, you have to learn how to use frames. So here’s a bit of me using frames:

. use persons

. frame create counties

. frame counties: use counties

. tabulate cntyid

. frame counties: tabulate cntyid

Well, I am thinking at this point, it appears I could merge persons.dta with counties.dta, except I’m not interested in merging them. I’m interested in linking them.

. frlink m:1 cntyid, frame(counties)

Linking is frame’s equivalent of merge. It does not change either dataset except to add one variable to the data in the current frame. New variable counties is created in this case. If I were to drop the variable, I would eliminate the link, but I’m not going to do that. I’m curious whether the counties in which people reside in persons.dta were all found in counties.dta. I can find out by typing

. count if counties==.

If 1,000 were reported, I would now drop counties, and it would be as if I had never linked the two frames.

Let’s assume count reported 0. Or 4, which is a small enough number that I don’t care for this demonstration. Now watch this:

. generate relinc = income / frval(counties, medinc)

I just calculated each person’s income relative to the median income in the county in which he or she resides, and median income was in the counties dataset, not the persons dataset!

Next, I will copy to the current frame all the variables in counties that start with pop. The command that does this, frget, will use the link and copy the appropriate observations.

. frget pop*, from(counties)

. describe pop*

. generate ln_pop18plus = ln(pop18plus)

. generate ln_income = ln(income)

. correlate ln_income ln_pop18plus

I hope I have convinced you that frames are of interest. If not, this is only one of the five ways frames will change how you work with Stata. Maybe one of the other four ways will convince you. Visit the overview of frames page at stata.com.

11. Sample-size analysis for confidence intervals

The goal is to optimally allocate study resources when CIs are to be used for inference or, said differently, to estimate the sample size required to achieve the desired precision of a CI in a planned study. One mean, two independent means, or two paired means. Or one variance.
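As a sketch of the one-mean case, the command below asks for the sample size needed so that a 95% CI for a mean is no wider than 2 units with probability 0.95. The target width, the probability, and the assumed standard deviation of 5 are all invented for illustration.

```stata
* Assumption: planning values chosen only for this example.
* Sample size for a CI for one mean with target width 2,
* attained with probability 0.95, given an anticipated SD of 5.
ciwidth onemean, width(2) probwidth(0.95) sd(5)
```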

12. Nonlinear DSGE models

DSGE stands for Dynamic Stochastic General Equilibrium. Stata previously fit linear DSGEs. Now it can fit nonlinear ones too.

I know this either interests you or does not, and if it does not, there will be no changing your mind. It interests me, and what makes the new feature spectacular is how easy models are to specify and how readable the code is afterwards. You could almost teach from it. If this interests you, click through.

13. Multiple-group IRT

IRT (Item Response Theory) is about the relationship between latent traits and the instruments designed to measure them. An IRT analysis might be about scholastic ability (the latent trait) and a college admission test (the instrument).

Stata 16’s new IRT features produce results for data containing different groups of people. Do instruments measure latent traits in the same way for different populations?

Here is an example. Do students in urban and rural schools perform differently on a test intended to measure mathematical ability? Using Stata 16, you can fit a 2-parameter logistic model comparing the groups by typing

. irt 2pl item1-item10, group(urbanrural)

What’s new is the group() option.

Does an instrument measuring depression perform the same today as it did five years ago? You can fit a graded-response model that compares the groups by typing

. irt grm item1-item10, group(timecategory)

And IRT’s postestimation graphs have been updated to reveal the differences among groups when a group() model has been fit.

The examples I mentioned both concerned two groups, but IRT can handle any number of them.

14. Panel-data Heckman-selection models

Heckman selection adjusts for bias when some outcomes are missing not at random.

The classic example is economists’ modeling of wages. Wages are observed only for those who work, and whether you work is unlikely to be random. Think about it. Should I work or go to school? Should I work or live off my meager savings? Should I work or retire? Few people would be willing to make those decisions by flipping a coin.

If you worry about such problems and are using panel data, the new xtheckman command is the solution.
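For the wage example, a sketch might look like this. The variable names (personid, year, wage, age, tenure, working, market) are hypothetical, with working indicating whether a wage is observed and market standing in for a variable that affects participation but not wages.

```stata
* Assumption: panel data with identifiers personid and year.
xtset personid year

* Wage equation with selection: wage is observed only when working == 1.
xtheckman wage age tenure, select(working = age market)
```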

15-21. Seven more new features

I will summarize the last seven features briefly. My briefness makes them no less important, especially if they interest you.

15. NLMEs with lags: multiple-dose pharmacokinetic models and more can now be fit by Stata’s menl command for fitting nonlinear mixed-effects regression. This includes fitting multiple-dose models.

16. Heteroskedastic ordered probit joins the ordered probit models that Stata already could fit.

17. Graph sizes in inches, centimeters, and printer points can now be specified. Specify 1in, 1.4cm, or 12pt.

18. Programmers: Mata’s new Quadrature class numerically integrates y = f(x) over the interval a to b, where a may be -∞ or finite and b may be finite or +∞.

19. Programmers: Mata’s new LinearProgram class solves linear programs using an interior-point method. It minimizes or maximizes a linear objective function subject to linear constraints (equality and inequality) and boundary conditions.

20. Do-file Editor: Autocompletion and more. The editor now provides syntax highlighting for Python and Markdown. And it autocompletes Stata commands, quotes, parentheses, braces, and brackets. Last but not least, spaces as well as tabs can be used for indentation.

21. Stata for Mac: Dark Mode and tabbed windows. Dark mode is a color scheme that darkens background windows and controls so that they do not cause eye strain or distract from what you are working on. Stata now supports it. Meanwhile, tabbed windows conserve screen real estate. Stata has a lot of windows. With the exception of the Results window, they come and go as they are needed. Now you can combine all or some into one. Click the tab, change the window.
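To give programmers a taste of item 18, here is a minimal sketch of the Quadrature class as I understand its interface. The integrand f() and the limits 0 to 1 are my own choices for illustration.

```stata
mata:
// Integrand chosen for illustration: f(x) = exp(-x^2)
real scalar f(real scalar x)
{
    return(exp(-x^2))
}

q = Quadrature()          // create an instance of the class
q.setEvaluator(&f())      // point it at the integrand
q.setLimits((0, 1))       // integrate from a = 0 to b = 1
q.integrate()             // display the computed integral
end
```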

 

That’s it

The highlights are 58% of what’s new in Stata 16, measured by the number of text lines required to describe them. Here is a sampling of what else is new.

  • ranksum has new option exact to specify that exact p-values be computed for the Wilcoxon rank-sum test.
  • New setting set iterlog controls whether estimation commands display iteration logs.
  • menl has new option lrtest that reports a likelihood-ratio test comparing the nonlinear mixed-effects model with the model fit by ordinary nonlinear regression.
  • The bayes: prefix command now supports the new hetoprobit command so that you can fit Bayesian heteroskedastic ordered probits.
  • The svy: prefix works with more estimation commands, namely, existing command hetoprobit and new commands cmmixlogit and cmxtmixlogit.
  • New command export sasxport8 exports datasets to SAS XPORT Version 8 Transport format.
  • New command splitsample splits data into random samples. It can create simple random samples, clustered samples, and balanced random samples. Balance splitting can be used for matched-treatment assignment.

I could go on. Type help whatsnew15to16 when you get your copy of Stata 16 to find out all that’s new.

I hope you enjoy Stata 16.


