Thursday, March 19, 2026

Stata 14 announced, ships – The Stata Blog


We’ve just announced the release of Stata 14. Stata 14 ships and downloads starting now.

Stata 14 is now available. You heard it here first.

There’s a long tradition that Statalisters hear about Stata’s new releases first. The new forum is celebrating its first birthday, but it is a continuation of the old Statalist, so the tradition continues, updated for the modern world, where everything happens more quickly. You’re hearing about Stata 14 roughly a microsecond before the rest of the world. Traditions are important.

Here’s yet another example of everything happening faster in the modern world. Rather than the announcement preceding shipping by a few weeks as in previous releases, Stata 14 ships and downloads starting now. Or rather, a microsecond from now.

Some things from the past are worth preserving, however, and one is that I get to write about the new release in my own idiosyncratic way. So let me get the marketing stuff out of the way and then I can tell you about a few things that especially interest me and might interest you.

MARKETING BEGINS.

Here’s a partial list of what’s new, a.k.a. the highlights:

  • Unicode
  • More than 2 billion observations (Stata/MP)
  • Bayesian analysis
  • IRT (Item Response Theory)
  • Panel-data survival models
  • Treatment effects
    • Treatment effects for survival models
    • Endogenous treatments
    • Probability weights
    • Balance analysis
  • Multilevel mixed-effects survival models
  • Small-sample inference for multilevel models
  • SEM (structural equation modeling)
    • Survival models
    • Satorra-Bentler scaled chi-squared test
    • Survey data
    • Multilevel weights
  • Power and sample size
    • Survival models
    • Contingency (epidemiological) tables
  • Markov-switching regression models
  • Tests for structural breaks in time-series
  • Fractional outcome regression models
  • Hurdle models
  • Censored Poisson regression
  • Survey support & multilevel weights for multilevel models
  • New random-number generators
  • Estimated marginal means and marginal effects
    • Tables for multiple outcomes and levels
    • Integration over unobserved and latent variables
  • ICD-10
  • Stata in Spanish and in Japanese

The above list is not complete; it lists about 30% of what’s new.

For all the details about Stata 14, including purchase and update information, and links to distributors outside of the US, visit stata.com/stata14.

If you are outside of the US, you can order from your authorized Stata distributor. They will supply codes so that you can access and download from stata.com.

MARKETING ENDS.

I want to write about three of the new features: Unicode, more than 2-billion observations, and Bayesian analysis.

Unicode is the modern way that computers encode characters such as the letters in what you are now reading. Unicode encodes all the world’s characters, meaning I can write Hello, Здравствуйте, こんにちは, and lots more besides. Well, the forum software is modern and I always could write those words here. Now I can write them in Stata, too.

For those who care, Stata uses Unicode’s UTF-8 encoding.
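To make the UTF-8 point concrete, here is a minimal sketch. It is written in Python rather than Stata and only illustrates the encoding itself, not anything about Stata's internals. UTF-8 is a variable-width encoding, so greetings like those above occupy different numbers of bytes per character.

```python
# Illustration only: how many bytes UTF-8 needs for text in different scripts.
greetings = ["Hello", "Здравствуйте", "こんにちは"]

# Map each greeting to the length of its UTF-8 byte representation.
utf8_sizes = {text: len(text.encode("utf-8")) for text in greetings}

for text, n_bytes in utf8_sizes.items():
    print(f"{text}: {len(text)} characters, {n_bytes} bytes in UTF-8")
```

Plain ASCII letters take one byte each, Cyrillic letters take two, and Japanese kana take three, which is why a string's byte length and its character count can differ.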

Anyway, you can use Unicode characters in your data, of course; in your variable labels, of course; and in your value labels, of course. What you might not expect is that you can use Unicode in your variable names, macro names, and everywhere else Stata wants a name or identifier.

Here’s the auto data in Japanese:

Your use of Unicode may not be as extreme as the above. It might be enough just to make tables and graphs labeled in languages other than English. If so, just set the variable labels and value labels. It doesn’t matter whether the variables are named übersetzung and kofferraum or gear_ratio and trunkspace or 変速比 and トランク.

I want to remind English speakers that Unicode includes mathematical symbols. You can use them in titles, axis labels, and the like.

Few good things come without cost. If you have been using Extended ASCII to circumvent Stata’s plain ASCII limitations, those files need to be translated to Unicode if the strings in them are to display correctly in Stata 14. This includes .dta files, do-files, ado-files, help files, and the like. It’s easier to do than you might expect. A new unicode analyze command will tell you whether you have files that need fixing and, if so, the new unicode translate command will fix them for you. It’s almost as easy as typing

. unicode translate *

This command translates your files and that has got to concern you. What if it mistranslates them? What if the power fails? Relax. unicode translate makes backups of the originals, and it keeps the backups until you delete them, which you have to do by typing

. unicode erasebackups, badidea

Yes, the option really is named badidea and it is not optional. Another unicode command can restore the backups.

The difficult part of translating your existing files is not performing the translation; it’s determining which Extended ASCII encoding your files used so that the translation can be performed. We have advice on that in the help files but, even so, some of you will only be able to narrow down the encoding to a few choices. The good news is that it is easy to try each one. You just type

. unicode retranslate *

It won’t take long to figure out which encoding works best.
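Why the source encoding matters can be seen in a small sketch. This is Python, not Stata, and it is not how unicode translate is implemented; the candidate encodings listed are just examples chosen for illustration. The same legacy bytes decode to different text depending on which Extended ASCII encoding you assume, so trying each candidate and looking at the result is a reasonable way to narrow it down.

```python
# Illustration only: one byte sequence, several candidate legacy encodings.
label = "übersetzung"

legacy_bytes = label.encode("latin-1")  # how a Latin-1 file would store it
utf8_bytes = label.encode("utf-8")      # how a Unicode (UTF-8) file stores it

# The legacy bytes are not valid UTF-8, which is why translation is needed.
try:
    legacy_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# Try each candidate encoding and inspect what comes out.
candidates = ["latin-1", "cp437", "mac_roman"]  # example candidates (assumption)
decoded = {enc: legacy_bytes.decode(enc) for enc in candidates}

print("valid as UTF-8 without translation?", valid_utf8)
for enc, text in decoded.items():
    marker = "matches" if text == label else "garbled"
    print(f"{enc:10s} -> {text} ({marker})")
```

Only the encoding the file was actually written in reproduces the original label; the others yield a wrong first character, which is usually easy to spot by eye.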

Stata/MP now allows you to process datasets containing more than 2.1-billion observations. This sounds exciting, but I suspect it will interest only a few of you. How many of us have datasets with more than 2.1-billion observations? And even if you do, you will need a computer with lots of memory. This feature is useful if you have access to a 512-gigabyte, 1-terabyte, or 1.5-terabyte computer. With smaller computers, you are unlikely to have room for 2.1 billion observations. It’s exciting that such computers are available.

We increased the limit on only Stata/MP because, to exploit the higher limit, you need multiple processors. It’s easy to misjudge how much larger a 2-billion observation dataset is than a 2-million observation one. On my everyday 16 gigabyte computer, which is nothing special, I just fit a linear regression with six RHS variables on 2-million observations. It ran in 1.2 seconds. I used Stata/SE, and the 1.2 seconds felt fast. So, if my computer had more memory, how long would it take to fit a model on 2-billion observations? 1,200 seconds, which is to say, 20 minutes! You need Stata/MP. Stata/MP4 will reduce that to 5 minutes. Stata/MP32 will reduce that to 37.5 seconds.
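The arithmetic behind those figures can be written out. The sketch below is Python, not Stata, and it assumes that run time grows linearly with the number of observations and divides evenly across cores. That idealization is what the quoted numbers imply; real speedups are typically somewhat smaller.

```python
# Back-of-the-envelope scaling of the regression timing quoted above.
# Assumptions: time is linear in observations; speedup is ideal across cores.
seconds_for_2_million = 1.2
scale = 2_000_000_000 / 2_000_000            # 1,000 times more observations

seconds_for_2_billion = seconds_for_2_million * scale

# Idealized run time for a single core, Stata/MP4, and Stata/MP32.
timings = {cores: seconds_for_2_billion / cores for cores in (1, 4, 32)}

for cores, seconds in timings.items():
    print(f"{cores:2d} core(s): {seconds:7.1f} seconds = {seconds / 60:5.2f} minutes")
```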

By the way, if you intend to use more than 2-billion observations, be sure to click on help obs_advice that appears in the start-up notes after Stata launches. You will get better performance if you set min_memory and segmentsize to larger values. We tell you what values to set.

There’s quite a good discussion about dealing with more than 2-billion observations at stata.com/stata14/huge-datasets.

After that, it’s statistics, statistics, statistics.

Which new statistics will interest you obviously depends on your field. We’ve gone deeper into a number of fields. Treatment effects for survival models is just one example. Multilevel survival models is another. Markov-switching models is yet another. Well, you can read the list above.

Two of the new statistical features are worth mentioning, however, because they simply weren’t there previously. They are Bayesian analysis and IRT models, which are admittedly two very different things.

IRT is a highlight of the release and for some of you it will be the highlight, so I mention it, and I’ll just tell you to see stata.com/stata14/irt for more information.

Bayesian analysis is the other highlight as far as I’m concerned, and it will interest a lot of you because it cuts across fields. Many of you are already knowledgeable about this and I can just hear you asking, “Does Stata include …?” So here’s the high-speed summary:

Stata fits continuous-, binary-, ordinal-, and count-outcome models. And linear and nonlinear models. And generalized nonlinear models. Univariate, multivariate, and multiple-equation. It provides 10 likelihood models and 18 prior distributions. It also allows for user-defined likelihoods combined with built-in priors, built-in likelihoods combined with user-defined priors, and a roll-your-own programming approach to calculate the posterior density directly. MCMC methods are provided, including Adaptive Metropolis-Hastings (MH), Adaptive MH with Gibbs updates, and full Gibbs sampling for certain likelihoods and priors.

It’s also easy to use and that’s saying something.

There’s a great example of the new Bayes features in The Stata News. I mention this because including the example there is nearly a proof of ease of use. The example looks at the number of disasters in the British coal mining industry. There was a fairly abrupt decrease in the rate sometime between 1887 and 1895, which you would see if you eyeballed a graph. In the example, we model the number of disasters before the change point as one Poisson process; the number after, as another Poisson process; and then we fit a model of the two Poisson parameters and the date of change. For the change point, the example uses a uniform prior on [1851, 1962], the range of the data, and obtains a posterior mean estimate of 1890.4 and a 95% credible interval of [1886, 1896], which agrees with our visual assessment.

I hope something I’ve written above interests you. Visit stata.com/stata14 for more information.
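For readers who want to see the shape of such a change-point model, here is a minimal sketch. It is Python, not Stata, and it is not the Stata News example. It uses synthetic counts rather than the coal-mining data, and a plain Gibbs sampler rather than Stata's adaptive MH. The Gamma(1, 1) priors on the two rates, the discrete uniform prior over years, and all function names are assumptions made for illustration.

```python
import math
import random


def simulate_counts(years, change_year, rate_before, rate_after, rng):
    """Draw synthetic yearly counts from two Poisson processes (Knuth's method)."""
    def poisson(lam):
        threshold, k, p = math.exp(-lam), 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        return k
    return [poisson(rate_before if y < change_year else rate_after) for y in years]


def gibbs_change_point(years, counts, n_iter, burn_in, rng, a=1.0, b=1.0):
    """Gibbs sampler: Gamma(a, b) priors on both rates, uniform prior on the change year."""
    n = len(counts)
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)        # prefix[j] = total count in first j years
    total = prefix[-1]
    k = n // 2                               # k = number of years before the change
    draws = []
    for it in range(n_iter):
        # Rates given k: conjugate Gamma updates (gammavariate takes shape, scale).
        mu1 = rng.gammavariate(a + prefix[k], 1.0 / (b + k))
        mu2 = rng.gammavariate(a + total - prefix[k], 1.0 / (b + n - k))
        # k given rates: discrete distribution proportional to the Poisson likelihood.
        log_w = [prefix[j] * math.log(mu1) - j * mu1
                 + (total - prefix[j]) * math.log(mu2) - (n - j) * mu2
                 for j in range(1, n)]
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]
        k = rng.choices(range(1, n), weights=weights)[0]
        if it >= burn_in:
            draws.append(years[k])           # first year governed by the second rate
    return draws


rng = random.Random(14)
years = list(range(1851, 1963))
# Synthetic data with a known change in 1890 (not the real disaster counts).
counts = simulate_counts(years, 1890, 3.0, 1.0, rng)

draws = sorted(gibbs_change_point(years, counts, n_iter=3000, burn_in=500, rng=rng))
post_mean = sum(draws) / len(draws)
lower = draws[int(0.025 * len(draws))]
upper = draws[int(0.975 * len(draws)) - 1]
print(f"posterior mean change year: {post_mean:.1f}")
print(f"95% credible interval: [{lower}, {upper}]")
```

Because the synthetic change is placed at 1890, the posterior summary should land near that year; with real data, the same three quantities (two rates and a change date) are what the model estimates.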


