
Gelman–Rubin convergence diagnostic using multiple chains

As of Stata 16, see [BAYES] bayesstats grubin and Bayesian analysis: Gelman–Rubin convergence diagnostic.

The original blog posted May 26, 2016, omitted the option initrandom from the bayesmh command. The code and the text of the blog entry were updated on August 9, 2018, to reflect this.

Overview

MCMC algorithms used for simulating posterior distributions are indispensable tools in Bayesian analysis. A major consideration in MCMC simulations is convergence: has the simulated Markov chain fully explored the target posterior distribution so far, or do we need longer simulations? A common approach to assessing MCMC convergence is based on running and analyzing the difference between multiple chains.

For a given Bayesian model, bayesmh can produce multiple Markov chains with randomly dispersed initial values by using the initrandom option, available as of the update on 19 May 2016. In this post, I demonstrate the Gelman–Rubin diagnostic as a more formal test for convergence using multiple chains. For graphical diagnostics, see Graphical diagnostics using multiple chains in [BAYES] bayesmh for more details. To compute the Gelman–Rubin diagnostic, I use an unofficial command, grubin, which can be installed by typing the following in Stata:

. net install grubin, from("http://www.stata.com/users/nbalov")

To see the help file, type

. help grubin

The Gelman–Rubin convergence diagnostic

The Gelman–Rubin diagnostic evaluates MCMC convergence by analyzing the difference between multiple Markov chains. Convergence is assessed by comparing the estimated between-chains and within-chain variances for each model parameter. Large differences between these variances indicate nonconvergence. See Gelman and Rubin (1992) and Brooks and Gelman (1997) for a detailed description of the method.

Suppose we have \(M\) chains, each of length \(N\), although the chains may be of different lengths. The same-length assumption simplifies the formulas and is used for convenience. For a model parameter \(\theta\), let \(\{\theta_{mt}\}_{t=1}^{N}\) be the \(m\)th simulated chain, \(m=1,\dots,M\). Let \(\hat\theta_m\) and \(\hat\sigma_m^2\) be the sample posterior mean and variance of the \(m\)th chain, and let the overall sample posterior mean be \(\hat\theta = (1/M)\sum_{m=1}^M \hat\theta_m\). The between-chains and within-chain variances are given by
\begin{align}
B &= \frac{N}{M-1} \sum_{m=1}^M (\hat\theta_m - \hat\theta)^2 \\
W &= \frac{1}{M} \sum_{m=1}^M \hat\sigma_m^2
\end{align}
Under certain stationarity conditions, the pooled variance
$$
\widehat{V} = \frac{N-1}{N} W + \frac{M+1}{MN} B
$$
is an unbiased estimator of the marginal posterior variance of \(\theta\) (Gelman and Rubin 1992). The potential scale reduction factor (PSRF) is defined to be the ratio of \(\widehat{V}\) and \(W\). If the \(M\) chains have converged to the target posterior distribution, then PSRF should be close to 1. Brooks and Gelman (1997) corrected the original PSRF by accounting for sampling variability as follows:
$$
R_c = \sqrt{\frac{\hat{d}+3}{\hat{d}+1}\,\frac{\widehat{V}}{W}}
$$
where \(\hat{d}\) is the degrees-of-freedom estimate of a \(t\) distribution.

PSRF estimates the potential decrease in the between-chains variability \(B\) with respect to the within-chain variability \(W\). If \(R_c\) is large, then longer simulation sequences are expected to either decrease \(B\) or increase \(W\), because the simulations have not yet explored the full posterior distribution. As Brooks and Gelman (1997) have suggested, if \(R_c < 1.2\) for all model parameters, one can be fairly confident that convergence has been reached. Otherwise, longer chains or other means of improving the convergence may be needed. Even more reassuring is to apply the more stringent condition \(R_c < 1.1\), which is the criterion I use in the examples below.
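To make the formulas above concrete, here is a minimal Python sketch (my own illustration, not part of the original post) that computes the uncorrected PSRF, \(\sqrt{\widehat{V}/W}\), from a list of equal-length chains. For brevity it omits the degrees-of-freedom correction \((\hat{d}+3)/(\hat{d}+1)\) that grubin applies:

```python
import statistics

def psrf(chains):
    """Uncorrected potential scale reduction factor for one parameter.

    chains: list of M equal-length lists of simulated draws.
    Returns sqrt(Vhat / W) from the formulas in the text.
    """
    M = len(chains)
    N = len(chains[0])
    means = [statistics.fmean(c) for c in chains]         # per-chain posterior means
    variances = [statistics.variance(c) for c in chains]  # per-chain sample variances
    grand_mean = statistics.fmean(means)

    # Between-chains variance: B = N/(M-1) * sum_m (mean_m - grand_mean)^2
    B = N / (M - 1) * sum((m - grand_mean) ** 2 for m in means)
    # Within-chain variance: W = (1/M) * sum_m sigma_m^2
    W = statistics.fmean(variances)
    # Pooled variance: Vhat = (N-1)/N * W + (M+1)/(M*N) * B
    V = (N - 1) / N * W + (M + 1) / (M * N) * B
    return (V / W) ** 0.5
```

Chains drawn from the same stationary distribution yield values near 1, while chains exploring different regions of the parameter space yield values well above 1.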

Under the normality assumption on the marginal posterior distribution of \(\theta\) and stationarity assumptions on the chain, the ratio \(B/W\) follows an F distribution with \(M-1\) numerator degrees of freedom and \(\nu\) denominator degrees of freedom. An upper confidence limit \(R_u(\alpha)\) for \(R_c\) can be derived (see section 3.7 in Gelman and Rubin [1992], where \(\nu\) is also defined):
$$
R_u = \sqrt{\frac{\hat{d}+3}{\hat{d}+1}\left(\frac{N-1}{N} + \frac{M+1}{MN}\, q_{1-\alpha/2}\right)}
$$
where \(\alpha\) is a prespecified confidence level and \(q_{1-\alpha/2}\) is the \((1-\alpha/2)\)th quantile of the aforementioned F distribution. We are interested only in the upper confidence limit because we are concerned with large PSRF values. By comparing \(R_c\) to \(R_u\), one can perform a formal test for convergence.

The Stata program grubin calculates and reports the Gelman–Rubin diagnostic for some or all model parameters. The program uses previously stored or saved estimation results of bayesmh. You specify estimation results using either the estnames() option or the estfiles() option. By default, grubin computes the Gelman–Rubin diagnostic for all model parameters. Alternatively, you may specify a subset of model parameters or substitutable expressions containing model parameters following the parameter specification of bayesstats summary. You may also specify a confidence level for calculating the upper confidence limit of PSRF by using the level() option. grubin is an r-class command that reports the \(R_c\) and \(R_u\) values and stores them in the matrices r(Rc) and r(Ru), respectively.

Example

To demonstrate the grubin program, I consider a Bayesian linear model applied to the well-known auto dataset.

. webuse auto
(1978 Automobile Data)

I regress the mpg variable on the weight variable, assuming a normal likelihood model with an unknown variance. My Bayesian model thus has three parameters: {mpg:weight}, {mpg:_cons}, and {sigma2}. I specify a weakly informative prior, N(0, 100), for the regression coefficients, and I specify the prior InvGamma(10, 10) for the variance parameter. I block the regression parameters {mpg:} separately to increase sampling efficiency.

In the first set of runs, I simulate 3 chains of length 25. I deliberately chose a small MCMC size, hoping to demonstrate lack of convergence. I initialize the three chains randomly by specifying the initrandom option of bayesmh. The simulation datasets are saved as sim1.dta, sim2.dta, and sim3.dta.

. set seed 14

. forvalues nchain = 1/3 {
  2.     quietly bayesmh mpg weight,          ///
>         likelihood(normal({sigma2}))       ///
>         prior({mpg:}, normal(0, 100))      ///
>         prior({sigma2},  igamma(10, 10))   ///
>         block({mpg:}) initrandom           ///
>         mcmcsize(25) saving(sim`nchain')
  3.     quietly estimates store chain`nchain'
  4. }

The Gelman–Rubin diagnostic assumes normality of the marginal posterior distributions. To improve the normal approximation, it is recommended to transform parameters that are not supported on the whole real line. Because the variance parameter {sigma2} is always positive, I apply the log transformation to normalize its marginal distribution when computing the Gelman–Rubin diagnostic. The transformed parameter is labeled lnvar.
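As a quick aside (my own illustration, not from the original post), the effect of the log transformation is easy to see numerically: draws from a right-skewed positive distribution become approximately symmetric, and hence closer to normal, on the log scale. Here, lognormal draws stand in for the posterior sample of a positive variance parameter:

```python
import math
import random
import statistics

def skewness(xs):
    """Sample skewness: mean of cubed standardized values (0 for a symmetric sample)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

random.seed(7)
# Stand-in for posterior draws of a positive parameter such as {sigma2}
draws = [random.lognormvariate(0, 1) for _ in range(20000)]
log_draws = [math.log(x) for x in draws]  # the "lnvar" transformation

print(skewness(draws))      # strongly right-skewed on the original scale
print(skewness(log_draws))  # near zero: approximately normal on the log scale
```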

I now use grubin to calculate and report the Gelman–Rubin diagnostics. I use the default confidence level of 95% for the upper confidence limit.

. grubin {mpg:weight} {mpg:_cons} (lnvar:log({sigma2})),
>         estnames(chain1 chain2 chain3)

Gelman-Rubin convergence diagnostic

MCMC sample size =          25
Number of chains =           3

-----------------------------------
             |        Rc     95% Ru
-------------+---------------------
mpg          |
      weight |  1.007256   1.090938
       _cons |  1.030188   1.097078
-------------+---------------------
       lnvar |  1.221488   1.145878
-----------------------------------

The first column of the output shows the PSRF estimates \(R_c\), and the second column shows the upper confidence limits \(R_u\) for each model parameter. We see that although the \(R_c\)'s of {mpg:weight} and {mpg:_cons} are below 1.1, the \(R_c\) of {sigma2} is quite large at 1.22 and exceeds its upper confidence limit at the 95% confidence level. Clearly, short Markov chains of length 25 are not sufficient for reaching convergence for this model.

In the next series of simulations, I increase the MCMC size to 50. This time I expect to obtain converging chains.

. set seed 14

. forvalues nchain = 1/3 {
  2.     quietly bayesmh mpg weight,          ///
>         likelihood(normal({sigma2}))       ///
>         prior({mpg:}, normal(0, 100))      ///
>         prior({sigma2},  igamma(10, 10))   ///
>         block({mpg:}) initrandom           ///
>         mcmcsize(50) saving(sim`nchain', replace)
  3.     quietly estimates store chain`nchain'
  4. }

I call grubin again with a confidence level of 95%.

. grubin {mpg:weight} {mpg:_cons} (lnvar:log({sigma2})), 
>         estnames(chain1 chain2 chain3)

Gelman-Rubin convergence diagnostic

MCMC sample size =          50
Number of chains =           3

-----------------------------------
             |        Rc     95% Ru
-------------+---------------------
mpg          |
      weight |  1.045376   1.058433
       _cons |  1.083469    1.05792
-------------+---------------------
       lnvar |  1.006594   1.056714
-----------------------------------

All three \(R_c\) values are below 1.1, but the \(R_c\) of {mpg:_cons} still exceeds its upper confidence limit \(R_u\). This does not necessarily mean that the chains have not converged, because \(R_u\) is computed based on approximating the sampling distribution of the \(R_c\) statistic by an F distribution, which may not always hold. For such low \(R_c\) values, all below 1.09, I have little reason to suspect nonconvergence. Nevertheless, I run a third set of simulations using longer chains.

In the last set of simulations, I further increase the MCMC size to 100.

. set seed 14

. forvalues nchain = 1/3 {
  2.     quietly bayesmh mpg weight,          ///
>         likelihood(normal({sigma2}))       ///
>         prior({mpg:}, normal(0, 100))      ///
>         prior({sigma2},  igamma(10, 10))   ///
>         block({mpg:}) initrandom           ///
>         mcmcsize(100) saving(sim`nchain', replace)
  3.     quietly estimates store chain`nchain'
  4. }

. grubin {mpg:weight} {mpg:_cons} (lnvar:log({sigma2})), 
>         estnames(chain1 chain2 chain3)

Gelman-Rubin convergence diagnostic

MCMC sample size =         100
Number of chains =           3

-----------------------------------
             |        Rc     95% Ru
-------------+---------------------
mpg          |
      weight |  1.019446   1.031024
       _cons |  1.003891    1.02604
-------------+---------------------
       lnvar |  .9993561   1.020912
-----------------------------------

This time, all the \(R_c\) values are below 1.02 and, moreover, below their corresponding upper confidence limits. We can conclude that all chains have converged.

References

Brooks, S. P., and A. Gelman. 1997. General Methods for Monitoring Convergence of Iterative Simulations. Journal of Computational and Graphical Statistics 7: 434–455.

Gelman, A., and D. B. Rubin. 1992. Inference from Iterative Simulation Using Multiple Sequences. Statistical Science 7: 457–511.



Style the New ::search-text and Other Highlight-y Pseudo-Elements


Chrome 144 recently shipped ::search-text, which is now one of several highlight-related pseudo-elements. This one selects find-in-page text, which is the text that gets highlighted when you do a Ctrl/Command + F-type search for something on a page and matches are found.

By default, ::search-text matches are yellow while the current target (::search-text:current) is orange, but ::search-text enables us to change that.

I'll admit, I hadn't really been following these highlight pseudo-elements. Up until now, I didn't even know that there was a name for them, but I'm glad there is, because that makes it easier to round them all up and compare them, which is exactly what I'm going to do here today, since it's not super obvious what they do based on the name of the pseudo-element. I'll also explain why we're able to customize them, and suggest how.

The different types of highlight pseudo-elements

Pseudo-selector Selects… Notes
::search-text Find-in-page matches ::search-text:current selects the current target
::target-text Text fragments Text fragments allow for programmatic highlighting using URL parameters. If you're referred to a website by a search engine, it might use text fragments, which is why ::target-text is easily confused with ::search-text.
::selection Text highlighted using the pointer
::highlight() Custom highlights as defined by JavaScript's Custom Highlight API
::spelling-error Incorrectly spelled words Pretty much applies to editable content only
::grammar-error Incorrect grammar Pretty much applies to editable content only

And let's not forget about the HTML mark element either, which is what I'm using in the demos below.

What should highlight pseudo-elements look like?

The question is, if all of them (apart from ::highlight()) have default styling, why would we need to select them with pseudo-elements? The reason is accessibility (color contrast, specifically) and usability (emphasis). For example, if the default yellow background of ::search-text doesn't contrast well enough with the text color, or if it doesn't stand out against the background of the container, then you'll want to change that.

I'm sure there are many ways to solve this (I want to hear "challenge accepted" in the comments), but the best solution that I've come up with uses relative color syntax. I took wrong turns with both background-clip: text and backdrop-filter: invert(1) before realizing that many CSS properties are off-limits when it comes to highlight pseudo-elements:

body {
  --background: #38003c;
  background: var(--background);

  mark,
  ::selection,
  ::target-text,
  ::search-text {
    /* Match color to background */
    color: var(--background);

    /* Convert to RGB then subtract channel value from channel maximum (255) */
    background: rgb(from var(--background) calc(255 - r) calc(255 - g) calc(255 - b));
  }
}

Your browser might not support that yet, so here's a video that shows how the highlighted text adapts to background color changes.

What’s occurring right here is that I’m changing the container’s background colour to RGB format after which subtracting the worth of every channel (r, g, and b) from the utmost channel worth of 255, inverting every channel and the general colour. This colour is then set because the background colour of the highlighting, making certain that it stands out it doesn’t matter what, and due to the brand new CodePen slideVars, you possibly can fiddle with the demo to see this in motion. You would possibly have the ability to do that with colour codecs apart from RGB, however RGB is the simplest.

So that covers the usability, but what about the accessibility?

Well, the highlighting's text color is the same as the container's background color, because we know that it's the inverse of the highlighting's background color. While this doesn't mean that the two colors will have accessible contrast, it seems as if they will most of the time (you should always check color contrast using color contrast tools, regardless).

If you don't like the randomness of inverting colors, that's understandable. You can absolutely pick colors and write conditional CSS for them manually instead, but finding accessible colors that stand out against the different backdrops of your design for all the different types of highlight pseudo-elements, while accounting for other viewing modes such as dark mode, is a headache. Besides, I think certain UI elements (e.g., highlights, errors, focus indicators) should be ugly. They should stand out in a brutalist sort of way and feel disconnected from the design's color palette. They should demand maximum attention by deliberately not fitting in.

Keep in mind that the different types of highlight pseudo-elements should be visually distinctive too, for obvious reasons, but also in case two different types overlap each other (e.g., the user selects text currently matched by find-in-page). Therefore, in the amended code snippet below, mark, ::selection, ::target-text, and ::search-text all have slightly different backgrounds.

I've left mark unchanged, the r value of ::selection as it was, the g value of ::target-text as it was, and the b value of ::search-text as it was, so those last three only have two channels inverted instead of all three. They're varied in color now (but still look inverted), and with the addition of an alpha value at 70% (100% for ::search-text:current), they also blend into one another so that we can see where each highlight starts and ends:

body {
  --background: #38003c;
  background: var(--background);

  mark,
  ::selection,
  ::target-text,
  ::search-text {
    color: var(--background);
  }

  mark {
    /* Invert all channels */
    background: rgb(from var(--background) calc(255 - r) calc(255 - g) calc(255 - b) / 70%);
  }

  ::selection {
    /* Invert all channels but R */
    background: rgb(from var(--background) r calc(255 - g) calc(255 - b) / 70%);
  }

  ::target-text {
    /* Invert all channels but G */
    background: rgb(from var(--background) calc(255 - r) g calc(255 - b) / 70%);
  }

  ::search-text {
    /* Invert all channels but B */
    background: rgb(from var(--background) calc(255 - r) calc(255 - g) b / 70%);

    &:current {
      /* Invert all channels but B, but without transparency */
      background: rgb(from var(--background) calc(255 - r) calc(255 - g) b / 100%);
    }
  }
}

::spelling-error and ::grammar-error are excluded from all this because they have their own visual affordances (red underlines and green underlines, respectively, usually contrasted against the neutral background of an editable element such as