Friday, February 6, 2026

Learn how to generate random numbers in Stata


Overview

I describe how to generate random numbers and discuss some features added in Stata 14. In particular, Stata 14 includes a new default random-number generator (RNG) called the Mersenne Twister (Matsumoto and Nishimura 1998), a new function that generates random integers, the ability to generate random numbers from an interval, and several new functions that generate random variates from nonuniform distributions.

Random numbers from the uniform distribution

In the example below, we use runiform() to create a simulated dataset with 10,000 observations on a (0,1)-uniform variable. Before using runiform(), we set the seed so that the results are reproducible.


. set obs 10000
number of observations (_N) was 0, now 10,000

. set seed 98034

. generate u1 = runiform()

The mean of a (0,1)-uniform is .5, and the standard deviation is \(\sqrt{1/12}\approx .289\). The estimates from the simulated data reported in the output below are close to the true values.


. summarize u1

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          u1 |     10,000    .5004244    .2865088   .0000502    .999969

To draw uniform variates over (a, b) instead of over (0, 1), we specify runiform(a, b). In the example below, we draw uniform variates over (1, 2) and then estimate the mean and the standard deviation, which we can compare with their theoretical values of 1.5 and \(\sqrt{1/12}\approx .289\).


. generate u2 = runiform(1, 2)

. summarize u2

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          u2 |     10,000    1.495698    .2887136   1.000088   1.999899

To draw integers uniformly over {a, a+1, …, b}, we specify runiformint(a, b). In the example below, we draw integers uniformly over {0, 1, …, 100} and then estimate the mean and the standard deviation, which we can compare with their theoretical values of 50 and \(\sqrt{(101^2-1)/12}\approx 29.155\).


. generate u3 = runiformint(0, 100)

. summarize u3

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
          u3 |     10,000     49.9804    29.19094          0        100
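
The theoretical standard deviations quoted in this section can be checked directly. The lines below are a minimal sketch that only evaluates the two formulas from the text with Stata's built-in sqrt() function; no data need to be in memory.

```stata
* Standard deviation of a continuous uniform over an interval of length 1
display sqrt(1/12)

* Standard deviation of a discrete uniform over {0, 1, ..., 100}
display sqrt((101^2 - 1)/12)
```

The displayed values should round to the .289 and 29.155 given above.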

Set the seed and make results reproducible

We use set seed # to obtain the same random numbers, which makes the subsequent results reproducible. RNGs are based on a recursive formula. The "random" numbers produced are actually deterministic, but they appear to be random. Setting the seed specifies a starting place for the recursion, which causes the random numbers to be the same, as in the example below.


. drop _all

. set obs 6
number of observations (_N) was 0, now 6

. set seed 12345

. generate x = runiform()

. set seed 12345

. generate y = runiform()

. list x y

     +---------------------+
     |        x          y |
     |---------------------|
  1. | .3576297   .3576297 |
  2. | .4004426   .4004426 |
  3. | .6893833   .6893833 |
  4. | .5597356   .5597356 |
  5. | .5744513   .5744513 |
     |---------------------|
  6. | .2076905   .2076905 |
     +---------------------+
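
Comparing the two columns by eye works for six observations, but the same check can be done programmatically. The sketch below repeats the steps of the example and uses assert, which stops with an error if the condition fails for any observation. The variable names follow the example; storing the draws as double is my own choice, not part of the original.

```stata
* Start from an empty dataset with six observations
clear
set obs 6

* Draw x, reset the seed to the same value, then draw y
set seed 12345
generate double x = runiform()
set seed 12345
generate double y = runiform()

* Silent if every pair matches, error otherwise
assert x == y
```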

Each time Stata is launched, the seed is set to 123456789.

After generating \(N\) random numbers, the RNG wraps around and starts generating the same sequence all over again. \(N\) is called the period of the RNG. Larger periods are better because we get more random numbers before the sequence wraps. The period of the Mersenne Twister is \(2^{19937}-1\), which is huge. Large periods are important when performing complicated simulation studies.

In Stata, the seed is a positive integer (between 0 and \(2^{31}-1\)) that Stata maps onto the state of the RNG. The state of an RNG corresponds to a place in the sequence. The mapping is not one to one because there are more states than seeds. If you want to pick up where you left off in the sequence, you need to restore the state, as in the example below.


. drop _all

. set obs 3
number of observations (_N) was 0, now 3

. set seed 12345

. generate x = runiform()

. local state `c(rngstate)'

. generate y = runiform()

. set rngstate `state'

. generate z = runiform()

. list

     +--------------------------------+
     |        x          y          z |
     |--------------------------------|
  1. | .3576297   .5597356   .5597356 |
  2. | .4004426   .5744513   .5744513 |
  3. | .6893833   .2076905   .2076905 |
     +--------------------------------+

After dropping the data and setting the number of observations to 3, we use generate to put random variates in x, store the state of the RNG in the local macro state, and then put random numbers in y. Next, we use set rngstate to restore the state to what it was before we generated y, and then we generate z. The random numbers in z are the same as those in y because restoring the state caused Stata to start at the same place in the sequence as before we generated y. See Programming an estimation command in Stata: Where to store your stuff for an introduction to local macros.

Random variates from various distributions

So far, we have talked about generating uniformly distributed random numbers. Stata also provides functions that generate random numbers from other distributions. The function names are easy to remember: the letter r followed by the name of the distribution. Some common examples are rnormal(), rbeta(), and rweibull(). In the example below, we draw 5,000 observations from a standard normal distribution and summarize the results.
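
The save-and-restore pattern can also be written as a short do-file that checks itself. This is a sketch of the same steps as the example, with an assert added at the end; it must be run as a single do-file (or in one go from the Do-file Editor) because the local macro state exists only within that run.

```stata
clear
set obs 3
set seed 12345
generate x = runiform()

* Save the current position in the sequence
local state `c(rngstate)'
generate y = runiform()

* Go back to the saved position and draw again
set rngstate `state'
generate z = runiform()

* z should repeat y exactly
assert y == z
```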


. drop _all

. set seed 12345

. set obs 5000
number of observations (_N) was 0, now 5,000

. generate w = rnormal()

. summarize w

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           w |      5,000    .0008946    .9903156  -3.478898   3.653764

The estimated mean and standard deviation are close to their true values of 0 and 1.

A note on precision

So far, we generated random numbers with the default data type of float. Generating the random numbers with type double makes ties occur less frequently. Ties can still occur with type double because the large period of the Mersenne Twister exceeds the precision of \(2^{-53}\), so a long enough sequence of random numbers could have repeated numbers.
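
The other functions named above take distribution parameters as arguments. The sketch below draws from a nonstandard normal, a beta, and a Weibull distribution; the parameter values and variable names are arbitrary choices for illustration, not taken from the original example.

```stata
clear
set seed 12345
set obs 5000

* Normal with mean 10 and standard deviation 2
generate n = rnormal(10, 2)

* Beta with shape parameters 2 and 3
generate b = rbeta(2, 3)

* Weibull with shape 1.5 and scale 2
generate wb = rweibull(1.5, 2)

summarize n b wb
```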
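
One way to see the effect of the storage type is to count repeated values. The sketch below stores one set of draws as float and another as double, then uses duplicates report to tabulate how many values occur more than once. The sample size and variable names are my own choices, and the two variables hold different draws because they are generated one after the other.

```stata
clear
set seed 12345
set obs 10000

* Same function, two storage types
generate float uf = runiform()
generate double ud = runiform()

* Tabulate how many values appear once, twice, and so on
duplicates report uf
duplicates report ud
```

With float storage, a handful of ties would not be surprising at this sample size; with double, ties should be far rarer.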

Conclusion

In this post, I showed how to generate random numbers using random-number functions in Stata. I also discussed how to make results reproducible by setting the seed. In subsequent posts, I will delve into other aspects of RNGs, including methods to generate random variates from other distributions and in Mata.

Reference

Matsumoto, M., and T. Nishimura. 1998. Mersenne Twister: A 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Transactions on Modeling and Computer Simulation 8: 3–30.


