Wednesday, January 28, 2026

Gelman–Rubin convergence diagnostic using multiple chains


As of Stata 16, see [BAYES] bayesstats grubin and Bayesian analysis: Gelman–Rubin convergence diagnostic.

The original blog entry, posted May 26, 2016, omitted the initrandom option from the bayesmh command. The code and text of the blog entry were updated on August 9, 2018, to reflect this.

Overview

MCMC algorithms used for simulating posterior distributions are indispensable tools in Bayesian analysis. A major consideration in MCMC simulations is convergence. Has the simulated Markov chain fully explored the target posterior distribution, or do we need longer simulations? A common approach to assessing MCMC convergence is based on running multiple chains and analyzing the differences between them.

For a given Bayesian model, bayesmh is capable of producing multiple Markov chains with randomly dispersed initial values by using the initrandom option, available as of the update on 19 May 2016. In this post, I demonstrate the Gelman–Rubin diagnostic as a more formal test for convergence using multiple chains. For graphical diagnostics, see Graphical diagnostics using multiple chains in [BAYES] bayesmh for more details. To compute the Gelman–Rubin diagnostic, I use an unofficial command, grubin, which can be installed by typing the following in Stata:

. net install grubin, from("http://www.stata.com/users/nbalov")

To see the help file, type

. help grubin

The Gelman–Rubin convergence diagnostic

The Gelman–Rubin diagnostic evaluates MCMC convergence by analyzing the differences between multiple Markov chains. Convergence is assessed by comparing the estimated between-chains and within-chain variances for each model parameter. Large differences between these variances indicate nonconvergence. See Gelman and Rubin (1992) and Brooks and Gelman (1997) for a detailed description of the method.

Suppose we have \(M\) chains, each of length \(N\). (Chains may in general have different lengths; the equal-length assumption simplifies the formulas and is used for convenience.) For a model parameter \(\theta\), let \(\{\theta_{mt}\}_{t=1}^{N}\) be the \(m\)th simulated chain, \(m=1,\dots,M\). Let \(\hat\theta_m\) and \(\hat\sigma_m^2\) be the sample posterior mean and variance of the \(m\)th chain, and let the overall sample posterior mean be \(\hat\theta = (1/M)\sum_{m=1}^M \hat\theta_m\). The between-chains and within-chain variances are given by
\begin{align}
B &= \frac{N}{M-1} \sum_{m=1}^M (\hat\theta_m - \hat\theta)^2 \\
W &= \frac{1}{M} \sum_{m=1}^M \hat\sigma_m^2
\end{align}
Under certain stationarity conditions, the pooled variance
$$
\widehat V = \frac{N-1}{N} W + \frac{M+1}{MN} B
$$
is an unbiased estimator of the marginal posterior variance of \(\theta\) (Gelman and Rubin 1992). The potential scale reduction factor (PSRF) is defined as the ratio of \(\widehat V\) and \(W\). If the \(M\) chains have converged to the target posterior distribution, then the PSRF should be close to 1. Brooks and Gelman (1997) corrected the original PSRF to account for sampling variability as follows:
$$
R_c = \sqrt{\frac{\hat{d}+3}{\hat{d}+1}\frac{\widehat V}{W}}
$$
where \(\hat d\) is the degrees-of-freedom estimate of a \(t\) distribution.

The PSRF estimates the potential reduction of the between-chains variability \(B\) with respect to the within-chain variability \(W\). If \(R_c\) is large, then longer simulation sequences are expected to either decrease \(B\) or increase \(W\), because the simulations have not yet explored the full posterior distribution. As Brooks and Gelman (1997) suggest, if \(R_c < 1.2\) for all model parameters, one can be fairly confident that convergence has been reached. Otherwise, longer chains or other means of improving convergence may be needed. Even more reassuring is to apply the more stringent condition \(R_c < 1.1\), which is the criterion I use in the examples below.
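As an illustrative sketch of the formulas above (in Python rather than Stata, and not the implementation used by grubin), the quantities \(B\), \(W\), \(\widehat V\), and the uncorrected PSRF \(\sqrt{\widehat V/W}\) can be computed directly from a set of chains. The function name and toy data below are hypothetical; the Brooks–Gelman correction factor \(\sqrt{(\hat d+3)/(\hat d+1)}\) additionally requires the degrees-of-freedom estimate \(\hat d\) and is omitted for brevity.

```python
# Illustrative sketch only, not the grubin implementation.
from math import sqrt
from statistics import mean, variance

def psrf(chains):
    """Uncorrected PSRF sqrt(V/W) for M equal-length chains of one parameter."""
    M = len(chains)
    N = len(chains[0])
    chain_means = [mean(c) for c in chains]           # theta-hat_m
    grand_mean = mean(chain_means)                    # theta-hat
    B = N / (M - 1) * sum((m - grand_mean) ** 2 for m in chain_means)
    W = mean(variance(c) for c in chains)             # average sample variance
    V = (N - 1) / N * W + (M + 1) / (M * N) * B       # pooled variance V-hat
    return sqrt(V / W), B, W, V

# Three short toy "chains" with similar means: B is small relative to W,
# so the uncorrected PSRF is near (or even below) 1.
chains = [[1.0, 2.0, 3.0, 4.0],
          [1.1, 2.1, 3.1, 4.1],
          [0.9, 1.9, 2.9, 3.9]]
rhat, B, W, V = psrf(chains)
```

With well-separated chain means instead, \(B\) would dominate \(W\) and the PSRF would rise well above 1, signaling nonconvergence.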

Under the normality assumption on the marginal posterior distribution of \(\theta\) and stationarity assumptions on the chain, the ratio \(B/W\) follows an F distribution with \(M-1\) numerator degrees of freedom and \(\nu\) denominator degrees of freedom. An upper confidence limit \(R_u(\alpha)\) for \(R_c\) can be derived (see section 3.7 in Gelman and Rubin [1992], where \(\nu\) is also defined):
$$
R_u = \sqrt{\frac{\hat{d}+3}{\hat{d}+1}\bigg(\frac{N-1}{N} + \frac{M+1}{MN}\, q_{1-\alpha/2}\bigg)}
$$
where \(\alpha\) is a prespecified confidence level and \(q_{1-\alpha/2}\) is the \((1-\alpha/2)\)th quantile of the aforementioned F distribution. We are interested only in the upper confidence limit because we are concerned with large PSRF values. By comparing \(R_c\) with \(R_u\), one can perform a formal test for convergence.
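Continuing the illustrative Python sketch, the upper limit \(R_u\) is a direct evaluation of the formula above once the F quantile \(q_{1-\alpha/2}\) and the degrees-of-freedom estimate \(\hat d\) are available. Computing them is outside the scope of this sketch, so hypothetical values are supplied below purely for illustration.

```python
# Illustrative sketch only, not the grubin implementation.
from math import sqrt

def upper_limit(N, M, q, d):
    """Upper confidence limit R_u for the PSRF.

    N: chain length; M: number of chains;
    q: (1 - alpha/2)th quantile of the F distribution for B/W
       (supplied by the caller);
    d: the t-distribution degrees-of-freedom estimate d-hat.
    """
    return sqrt((d + 3) / (d + 1) * ((N - 1) / N + (M + 1) / (M * N) * q))

# Hypothetical inputs chosen for illustration only.
ru = upper_limit(N=25, M=3, q=2.5, d=10)
```

A computed \(R_c\) exceeding this \(R_u\) would be evidence of nonconvergence at the chosen confidence level.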

The Stata program grubin calculates and reports the Gelman–Rubin diagnostic for some or all model parameters. The program uses previously stored or saved estimation results of bayesmh. You specify estimation results using either the estnames() option or the estfiles() option. By default, grubin computes the Gelman–Rubin diagnostic for all model parameters. Alternatively, you may specify a subset of model parameters or substitutable expressions containing model parameters, following the parameter specification of bayesstats summary. You may also specify a confidence level for calculating the upper confidence limit of the PSRF by using the level() option. grubin is an r-class command that reports the \(R_c\) and \(R_u\) values and stores them in the matrices r(Rc) and r(Ru), respectively.

Example

To demonstrate the grubin program, I consider a Bayesian linear model applied to the well-known auto dataset.

. webuse auto
(1978 Automobile Data)

I regress the mpg variable on the weight variable, assuming a normal likelihood model with an unknown variance. My Bayesian model thus has three parameters: {mpg:weight}, {mpg:_cons}, and {sigma2}. I specify a weakly informative prior, N(0, 100), for the regression coefficients, and I specify the prior InvGamma(10, 10) for the variance parameter. I place the regression parameters {mpg:} in a separate block to increase sampling efficiency.

In the first set of runs, I simulate 3 chains of length 25. I deliberately chose a small MCMC size, hoping to demonstrate a lack of convergence. I initialize the three chains randomly by specifying the initrandom option of bayesmh. The simulation datasets are saved as sim1.dta, sim2.dta, and sim3.dta.

. set seed 14

. forvalues nchain = 1/3 {
  2.     quietly bayesmh mpg weight,    
>         likelihood(normal({sigma2}))     
>         prior({mpg:}, normal(0, 100)) 
>         prior({sigma2},  igamma(10, 10)) 
>         block({mpg:}) initrandom
>         mcmcsize(25) saving(sim`nchain')
  3.     quietly estimates store chain`nchain'
  4. }

The Gelman–Rubin diagnostic assumes normality of the marginal posterior distributions. To improve the normal approximation, it is recommended to transform parameters that are not supported on the whole real line. Because the variance parameter {sigma2} is always positive, I apply the log transformation to normalize its marginal distribution when computing the Gelman–Rubin diagnostic. The transformed parameter is labeled lnvar.

I now use grubin to calculate and report the Gelman–Rubin diagnostics. I use the default confidence level of 95% for the upper confidence limit.

. grubin {mpg:weight} {mpg:_cons} (lnvar:log({sigma2})),
>         estnames(chain1 chain2 chain3)

Gelman-Rubin convergence diagnostic

MCMC sample size =          25
Number of chains =           3

-----------------------------------
             |        Rc     95% Ru
-------------+---------------------
mpg          |
      weight |  1.007256   1.090938
       _cons |  1.030188   1.097078
-------------+---------------------
       lnvar |  1.221488   1.145878
-----------------------------------

The first column in the output shows the PSRF estimates \(R_c\), and the second column shows the upper confidence limits \(R_u\) for each model parameter. We see that although the \(R_c\) values of {mpg:weight} and {mpg:_cons} are below 1.1, the \(R_c\) of the transformed {sigma2} is quite large at 1.22 and, moreover, exceeds its upper confidence limit at the 95% confidence level. Clearly, short Markov chains of length 25 are not sufficient for achieving convergence for this model.

In the next series of simulations, I increase the MCMC size to 50. This time, I expect to obtain converging chains.

. set seed 14

. forvalues nchain = 1/3 {
  2.     quietly bayesmh mpg weight,    
>         likelihood(normal({sigma2}))     
>         prior({mpg:}, normal(0, 100)) 
>         prior({sigma2},  igamma(10, 10)) 
>         block({mpg:}) initrandom
>         mcmcsize(50) saving(sim`nchain', replace)
  3.     quietly estimates store chain`nchain'
  4. }

I call grubin again with a confidence level of 95%.

. grubin {mpg:weight} {mpg:_cons} (lnvar:log({sigma2})), 
>         estnames(chain1 chain2 chain3)

Gelman-Rubin convergence diagnostic

MCMC sample size =          50
Number of chains =           3

-----------------------------------
             |        Rc     95% Ru
-------------+---------------------
mpg          |
      weight |  1.045376   1.058433
       _cons |  1.083469    1.05792
-------------+---------------------
       lnvar |  1.006594   1.056714
-----------------------------------

All three \(R_c\) values are below 1.1, but not all of them are within their upper confidence limits \(R_u\): the \(R_c\) of {mpg:_cons} slightly exceeds its \(R_u\). This does not necessarily mean that the chains have not converged, because \(R_u\) is computed based on an approximation of the sampling distribution of the \(R_c\) statistic by an F distribution, which may not always hold. For such low \(R_c\) values, all below 1.09, I have little reason to suspect nonconvergence. Nevertheless, I run a third set of simulations using longer chains.

In the last set of simulations, I further increase the MCMC size to 100.

. set seed 14

. forvalues nchain = 1/3 {
  2.     quietly bayesmh mpg weight,    
>         likelihood(normal({sigma2}))     
>         prior({mpg:}, normal(0, 100)) 
>         prior({sigma2},  igamma(10, 10)) 
>         block({mpg:}) initrandom
>         mcmcsize(100) saving(sim`nchain', replace)
  3.     quietly estimates store chain`nchain'
  4. }

. grubin {mpg:weight} {mpg:_cons} (lnvar:log({sigma2})), 
>         estnames(chain1 chain2 chain3)

Gelman-Rubin convergence diagnostic

MCMC sample size =         100
Number of chains =           3

-----------------------------------
             |        Rc     95% Ru
-------------+---------------------
mpg          |
      weight |  1.019446   1.031024
       _cons |  1.003891    1.02604
-------------+---------------------
       lnvar |  .9993561   1.020912
-----------------------------------

This time, all the \(R_c\) values are well below 1.1 and, moreover, below their corresponding upper confidence limits. We can conclude that all chains have converged.

References

Brooks, S. P., and A. Gelman. 1997. General Methods for Monitoring Convergence of Iterative Simulations. Journal of Computational and Graphical Statistics 7: 434–455.

Gelman, A., and D. B. Rubin. 1992. Inference from Iterative Simulation Using Multiple Sequences. Statistical Science 7: 457–511.


