
Dempster’s analysis and donkeys – Statisfaction


This post is about estimating the parameter of a Bernoulli distribution from observations, in the “Dempster” or “Dempster–Shafer” approach, which is a generalization of Bayesian inference. I’ll recall what this approach is about, and describe a Gibbs sampler to perform the computation. Intriguingly, the associated Markov chain happens to be equivalent to the so-called “donkey walk” (not this one), as pointed out by Guanyang Wang and Persi Diaconis.

Denote the observations, or “coin flips”, by $(x_1,\ldots,x_N)\in\{0,1\}^N$. The model specifies that $x_n = 1(u_n < \theta)$, where the $u_n$ are independent Uniform(0,1) variables, and $\theta \in (0,1)$ is the parameter to be estimated. That is, $x_n = 1$ if the corresponding uniform lands below $\theta$, which indeed happens with probability $\theta$, and otherwise $x_n = 0$. We will call the uniform variables “auxiliary”, and denote by $N_0, N_1$ the counts of “0” and “1”, with $N_0 + N_1 = N$.
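As a quick illustration (a minimal sketch, not from the original post; the values of $\theta$ and $N$ below are arbitrary), this sampling mechanism can be simulated as follows:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 0.3   # the unknown parameter (arbitrary value, for illustration only)
N = 5         # number of coin flips (arbitrary)

u = rng.uniform(size=N)          # auxiliary variables u_1, ..., u_N
x = (u < theta).astype(int)      # observations: x_n = 1(u_n < theta)
N1, N0 = int(x.sum()), int(N - x.sum())   # counts of ones and zeros
print(x, N1, N0)
```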

In a Bayesian approach, we would specify a prior distribution on the parameter; for example a Beta prior would lead to a Beta posterior on $\theta$. The auxiliary variables would play no role, except perhaps in Approximate Bayesian Computation. In Dempster’s approach, we can avoid specifying a prior, and instead “transfer” the randomness from the auxiliary variables to a distribution over subsets of parameters; see ref [1] below. Let’s see how this works.

Given observations $(x_1,\ldots,x_N)\in\{0,1\}^N$, some configurations of the auxiliary variables $(u_1,\ldots,u_N)$ are compatible with the observations, in the sense that there exists some $\theta$ such that $\forall n\in\{1,\ldots,N\}\; x_n = 1(u_n < \theta)$. Other configurations of $(u_1,\ldots,u_N)$ are not compatible. If we denote by $\mathcal{I}_0$ the indices $n\in\{1,\ldots,N\}$ corresponding to an observed $x_n = 0$, and similarly for $\mathcal{I}_1$, we can see that a “feasible” $\theta$ exists only when $\max_{n\in\mathcal{I}_1} u_n < \min_{n\in\mathcal{I}_0} u_n$. In that case the feasible $\theta$ form the interval $(\max_{n\in\mathcal{I}_1} u_n, \min_{n\in\mathcal{I}_0} u_n)$. The following diagram illustrates this with $N_0 = 2$, $N_1 = 3$.
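Here is a minimal sketch of this compatibility check and of the feasible interval in code (the function name `feasible_interval` and the example values are mine):

```python
import numpy as np

def feasible_interval(x, u):
    """Feasible interval for theta given observations x and auxiliaries u,
    or None if u is incompatible with x."""
    x, u = np.asarray(x), np.asarray(u)
    lower = u[x == 1].max() if (x == 1).any() else 0.0   # max over I_1, or 0 if I_1 is empty
    upper = u[x == 0].min() if (x == 0).any() else 1.0   # min over I_0, or 1 if I_0 is empty
    return (float(lower), float(upper)) if lower < upper else None

# three observed ones and two zeros, with compatible auxiliary variables:
print(feasible_interval([1, 1, 1, 0, 0], [0.10, 0.25, 0.30, 0.70, 0.90]))  # (0.3, 0.7)
```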

How do we obtain the distribution of these sets $\mathcal{F}(u)$, under the uniform distribution of $u\in[0,1]^N$ conditioned on $\mathcal{F}(u)\neq\emptyset$? We could draw $N$ uniforms, sort them in increasing order, and report the interval between the $N_1$-th and the $(N_1+1)$-th values (Section 4 in [1]). But that would be no fun, so let us consider a Gibbs sampler instead (taken from [4]). We will sample the auxiliary variables uniformly, conditional upon $\mathcal{F}(u)\neq\emptyset$, proceeding by sampling the variables $u_n$ indexed by $\mathcal{I}_0$ given the variables indexed by $\mathcal{I}_1$, and vice versa. The joint distribution of all the variables has density proportional to

$$1\left(\forall n\; u_n \in (0,1) \quad \text{and} \quad \max_{n\in\mathcal{I}_1} u_n < \min_{n\in\mathcal{I}_0} u_n\right).$$

From this joint density we can work out the conditionals. We can then express the Gibbs updates in terms of the endpoints of the interval $\mathcal{F}(u)$. Specifically, writing the endpoints at iteration $t$ as $(Y^{(t)}, Z^{(t)})$, the Gibbs sampler is equivalent to the following two steps (a code sketch follows the list):

  • Sampling $Y^{(t)} = Z^{(t-1)} \times \text{Beta}(N_1, 1)$.
  • Sampling $Z^{(t)} = Y^{(t)} + (1 - Y^{(t)}) \times \text{Beta}(1, N_0)$.
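
Here is a minimal sketch of these two updates (the function name `donkey_gibbs`, the starting point, and the example counts are mine; the sketch assumes $N_1 \geq 1$ and $N_0 \geq 1$):

```python
import numpy as np

def donkey_gibbs(N1, N0, n_iter=10_000, rng=None):
    """Gibbs sampler for the endpoints (Y, Z) of the feasible interval,
    given N1 >= 1 observed ones and N0 >= 1 observed zeros."""
    rng = np.random.default_rng() if rng is None else rng
    y, z = 0.0, 1.0                        # arbitrary starting point with y < z
    intervals = np.empty((n_iter, 2))
    for t in range(n_iter):
        y = z * rng.beta(N1, 1)            # max of N1 uniforms on (0, z)
        z = y + (1 - y) * rng.beta(1, N0)  # min of N0 uniforms on (y, 1)
        intervals[t] = (y, z)
    return intervals

intervals = donkey_gibbs(3, 2)             # N1 = 3, N0 = 2, as in the example above
```

As a sanity check against the sorting approach described above, a single interval can also be drawn directly as `np.sort(rng.uniform(size=N1 + N0))[[N1 - 1, N1]]`, and the two procedures should agree in distribution once the Gibbs chain has mixed.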

This is exactly the model of Buridan’s donkey in refs [2,3] below. The idea is that the donkey, being both hungry and thirsty but unable to choose between the water and the hay, alternately takes a step toward one and then toward the other.

The donkey walk has been generalized to higher dimensions in [3], and in a sense our Gibbs sampler in [4] is also a generalization to higher dimensions… it is not clear whether these two generalizations coincide. So I’ll leave that discussion for another day.

A few remarks to wrap up.

  • It is a feature of Dempster’s approach that it yields random subsets of parameters rather than the singletons of standard Bayesian analysis. Dempster’s approach is a generalization of Bayes: if we specify a standard prior and apply “Dempster’s rule of combination”, we retrieve standard Bayes.
  • What do we do with these random intervals $\mathcal{F}(u)$, once we obtain them? We can compute the proportion of them that intersect / are contained in a set of interest, for example the set $\{\theta > 1/2\}$, and these proportions are turned into measures of agreement, disagreement or indeterminacy regarding the set of interest, as opposed to posterior probabilities in standard Bayes (a code sketch follows this list).
  • Dempster’s inferences depend on the choice of sampling mechanism and associated auxiliary variables, which is the subject of many discussions in that literature.
  • In a previous post I described an equivalence between the sampling mechanism considered in [1] when there are more than two categories and the Gumbel-max trick… it seems that Dempster’s approach has many intriguing connections.
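
As a sketch of one way to compute such proportions for the assertion $\{\theta > 1/2\}$ (the function name `three_valued` is mine, and `intervals` is assumed to be the array of sampled endpoints produced by a sampler such as the one above):

```python
import numpy as np

def three_valued(intervals, a=0.5):
    """Proportions of intervals that support, contradict, or leave
    undetermined the assertion {theta > a}."""
    lower, upper = intervals[:, 0], intervals[:, 1]
    agree = np.mean(lower > a)          # interval entirely inside {theta > a}
    disagree = np.mean(upper <= a)      # interval entirely outside {theta > a}
    dont_know = 1.0 - agree - disagree  # interval straddles a
    return float(agree), float(disagree), float(dont_know)
```

For instance, with the `intervals` array produced by the Gibbs sampler sketched earlier, `three_valued(intervals)` returns the three proportions for the set $\{\theta > 1/2\}$.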

References:

  • [1] Arthur P. Dempster, New Methods for Reasoning Towards Posterior Distributions Based on Sample Data, 1966. [link]
  • [2] Jordan Stoyanov & Christo Pirinsky, Random motions, classes of ergodic Markov chains and beta distributions, 2000. [link]
  • [3] Gérard Letac, Donkey walk and Dirichlet distributions, 2002. [link]
  • [4] Pierre E. Jacob, Ruobin Gong, Paul T. Edlefsen & Arthur P. Dempster, A Gibbs sampler for a class of random convex polytopes, 2021. [link]
