This post is about estimating the parameter of a Bernoulli distribution from observations, in the "Dempster" or "Dempster–Shafer" approach, which is a generalization of Bayesian inference. I'll recall what this approach is about, and describe a Gibbs sampler to perform the computation. Intriguingly, the associated Markov chain happens to be equivalent to the so-called "donkey walk" (not this one), as pointed out by Guanyang Wang and Persi Diaconis.
Denote the observations, or "coin flips", by $x_1, \ldots, x_n \in \{0,1\}$. The model stipulates that $x_i = \mathbb{1}(u_i \leq \theta)$ for each $i$, where $u_1, \ldots, u_n$ are independent Uniform(0,1) variables, and $\theta \in [0,1]$ is the parameter to be estimated. That is, $x_i$ equals one if the corresponding uniform lands below $\theta$, which indeed happens with probability $\theta$, and zero otherwise. We'll call the uniform variables "auxiliary", and denote by $N_0$ and $N_1$ the counts of "0" and "1" among the observations, with $N_0 + N_1 = n$.
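As a quick illustration (a sketch added here, not code from the post or from [4]), this data-generating mechanism can be simulated in a few lines; the values of `n`, `theta_star` and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20                      # number of coin flips
theta_star = 0.3            # "true" parameter, arbitrary value for illustration

u = rng.uniform(size=n)     # auxiliary Uniform(0,1) variables
x = (u <= theta_star).astype(int)   # x_i = 1 if u_i falls below theta

N1 = int(x.sum())           # count of "1"
N0 = n - N1                 # count of "0"
print(N0, N1)
```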
In a Bayesian approach, we would specify a prior distribution on the parameter; for example a Beta prior would lead to a Beta posterior on $\theta$. The auxiliary variables would play no role, except perhaps in Approximate Bayesian Computation. In Dempster's approach, we can avoid the specification of a prior, and instead "transfer" the randomness from the auxiliary variables to a distribution over subsets of parameters; see ref [1] below. Let's see how this works.
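For completeness, the standard conjugacy calculation behind that remark, in the notation above: with a Beta$(\alpha, \beta)$ prior on $\theta$,

$$p(\theta \mid x_{1:n}) \propto \theta^{N_1 + \alpha - 1} (1-\theta)^{N_0 + \beta - 1},$$

i.e. the posterior is Beta$(\alpha + N_1, \beta + N_0)$.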
Given observations $x_1, \ldots, x_n$, some configurations of the auxiliary variables $u_1, \ldots, u_n$ are compatible with the observations, in the sense that there exists some $\theta$ such that $x_i = \mathbb{1}(u_i \leq \theta)$ for all $i$. And there are other configurations of $u_1, \ldots, u_n$ that are not compatible. If we denote by $I_1$ the indices $i$ corresponding to an observed $x_i = 1$, and likewise $I_0$ for $x_i = 0$, we can see that there exists some "feasible" $\theta$ only when $\max_{i \in I_1} u_i \leq \min_{i \in I_0} u_i$. In that case the feasible $\theta$ form the interval $[\max_{i \in I_1} u_i, \min_{i \in I_0} u_i)$. The following diagram illustrates this.

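Continuing the small simulation sketch above (again my own illustration, not code from [1] or [4]), the compatibility condition and the feasible interval can be computed directly from `x` and `u`.

```python
# indices of the ones and of the zeros
I1 = np.where(x == 1)[0]
I0 = np.where(x == 0)[0]

# theta must be >= every u_i with x_i = 1, and < every u_i with x_i = 0
lower = u[I1].max() if I1.size > 0 else 0.0
upper = u[I0].min() if I0.size > 0 else 1.0

compatible = lower <= upper   # always true here, since (x, u) were generated jointly
print(compatible, (lower, upper))
```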
How do we obtain the distribution of these sets $[\max_{i \in I_1} u_i, \min_{i \in I_0} u_i)$, under the Uniform distribution of the auxiliary variables $u_1, \ldots, u_n$ and conditioning on compatibility with the observations? We could draw $n$ uniforms, sort them in increasing order, and report the interval between the $N_1$-th and the $(N_1+1)$-th values (Section 4 in [1]). But that would be no fun, so let us consider a Gibbs sampler instead (taken from [4]). We'll sample the auxiliary variables uniformly, conditional upon compatibility, and we'll proceed by sampling the variables indexed by $I_1$ given the variables indexed by $I_0$, and vice versa. The joint distribution of all the variables has density proportional to

$$\prod_{i=1}^{n} \mathbb{1}(u_i \in [0,1]) \times \mathbb{1}\Big(\max_{i \in I_1} u_i \leq \min_{i \in I_0} u_i\Big).$$

From this joint density we can work out the conditionals. We can then express the Gibbs updates in terms of the endpoints of the interval $[\max_{i \in I_1} u_i, \min_{i \in I_0} u_i)$. Specifically, writing the endpoints at iteration $t$ as $(a_t, b_t)$, the Gibbs sampler is equivalent to:
- Sampling $a_t = b_{t-1} V_t^{1/N_1}$ with $V_t \sim \text{Uniform}(0,1)$: the variables indexed by $I_1$ are conditionally i.i.d. Uniform$(0, b_{t-1})$, and the maximum of $N_1$ such variables is $b_{t-1} \times \text{Beta}(N_1, 1)$ in distribution.
- Sampling $b_t = a_t + (1 - a_t)(1 - W_t^{1/N_0})$ with $W_t \sim \text{Uniform}(0,1)$: the variables indexed by $I_0$ are conditionally i.i.d. Uniform$(a_t, 1)$, and their minimum is $a_t + (1 - a_t) \times \text{Beta}(1, N_0)$ in distribution.
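Here is a minimal sketch of this sampler (my own illustration following the notation above, not the implementation accompanying [4]); it tracks only the interval endpoints $(a_t, b_t)$, assumes $N_0 \geq 1$ and $N_1 \geq 1$, and reuses `np`, `rng`, `N0` and `N1` from the snippets above.

```python
def gibbs_feasible_interval(N0, N1, n_iter, rng):
    """Gibbs sampler over the feasible interval [a, b] for theta,
    given N1 ones and N0 zeros; this chain is the 'donkey walk'."""
    a, b = 0.0, 1.0                       # arbitrary initial endpoints
    intervals = np.zeros((n_iter, 2))
    for t in range(n_iter):
        # variables indexed by I1 are iid Uniform(0, b); their max is b * Beta(N1, 1)
        a = b * rng.uniform() ** (1.0 / N1)
        # variables indexed by I0 are iid Uniform(a, 1); their min is a + (1 - a) * Beta(1, N0)
        b = a + (1.0 - a) * (1.0 - rng.uniform() ** (1.0 / N0))
        intervals[t] = (a, b)
    return intervals

intervals = gibbs_feasible_interval(N0, N1, n_iter=10_000, rng=rng)
```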
This is exactly the model of Buridan's donkey in refs [2,3] below. The idea is that the donkey, being both hungry and thirsty but unable to choose between the water and the hay, alternately takes a step in each direction.
The donkey walk has been generalized to higher dimensions in [3], and in a sense our Gibbs sampler in [4] is also a generalization to higher dimensions… it is not clear whether these two generalizations coincide. So I'll leave that discussion for another day.
A few remarks to wrap up.
- It is a feature of Dempster's approach that it yields random subsets of parameters rather than singletons as in standard Bayesian analysis. Dempster's approach is a generalization of Bayes: if we specify a standard prior and apply "Dempster's rule of combination" we retrieve standard Bayes.
- What do we do with these random intervals $[\max_{i \in I_1} u_i, \min_{i \in I_0} u_i)$, once we obtain them? We can compute the proportion of them that intersects, or is contained in, a set of interest, for instance a set of the form $\{\theta : \theta > c\}$ for some threshold $c$, and these proportions are turned into measures of agreement, disagreement or indeterminacy regarding the set of interest, as opposed to posterior probabilities in standard Bayes (a small sketch follows these remarks).
- Dempster's estimates depend on the choice of sampling mechanism and associated auxiliary variables, which is the subject of many discussions in that literature.
- In a previous post I described an equivalence between the sampling mechanism considered in [1] when there are more than two categories, and the Gumbel-max trick… it seems that Dempster's approach has many intriguing connections.
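To make the remark about the random intervals concrete, here is a small sketch (my own, with an arbitrary threshold `c = 0.5` defining the set of interest $\{\theta : \theta > c\}$) computing the three proportions from the intervals produced by the Gibbs sampler above.

```python
c = 0.5                                   # arbitrary threshold for the set {theta > c}
a, b = intervals[:, 0], intervals[:, 1]   # endpoints sampled by the Gibbs sampler

p_for = np.mean(a > c)                    # interval entirely inside the set: "agreement"
p_against = np.mean(b <= c)               # interval entirely outside the set: "disagreement"
p_dontknow = 1.0 - p_for - p_against      # interval straddles c: "indeterminacy"
print(p_for, p_against, p_dontknow)
```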
References:
- [1] Arthur P. Dempster, New Methods for Reasoning Towards Posterior Distributions Based on Sample Data, 1966. [link]
- [2] Jordan Stoyanov & Christo Pirinsky, Random motions, classes of ergodic Markov chains and beta distributions, 2000. [link]
- [3] Gérard Letac, Donkey walk and Dirichlet distributions, 2002. [link]
- [4] Pierre E. Jacob, Ruobin Gong, Paul T. Edlefsen & Arthur P. Dempster, A Gibbs sampler for a class of random convex polytopes, 2021. [link]
