The Poisson distribution is probably the most well-known probability model for counts, i.e. non-negative integer values. Many real-world phenomena are well approximated by this distribution, including the number of German bombs that landed in 1/4 km grid squares in south London during WWII.
Formally, we say that a discrete random variable \(X\) follows a Poisson distribution with rate parameter \(\mu > 0\), abbreviated \(X \sim \text{Poisson}(\mu)\), if \(X\) has support set \(\{0, 1, 2, \dots\}\) and probability mass function
\[
p(x) \equiv \mathbb{P}(X=x) = \frac{e^{-\mu}\mu^x}{x!}.
\]
Using some clever algebra with sums, it's not too hard to show that the rate parameter, \(\mu\), is both the mean and the variance of \(X\).
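If you'd rather not grind through the algebra, a quick simulation makes the claim easy to check; the rate \(\mu = 5\), the seed, and the number of draws below are all arbitrary choices of mine:
set.seed(1234)
draws <- rpois(1e5, lambda = 5) # 100,000 draws from a Poisson(5)
c(mean = mean(draws), variance = var(draws)) # both should be close to 5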
Numerical problems? Try taking logs.
Now, suppose that we wanted to plot the pmf of a Poisson RV with rate \(\mu = 171\).
The R function for the pmf of a Poisson RV is dpois(), so we can make our plot as follows, indicating the rate parameter with a vertical line:
library(tidyverse)

tibble(x = 0:300) %>%
  mutate(p = dpois(x, 171)) %>% # Poisson(171) pmf evaluated at 0, 1, ..., 300
  ggplot(aes(x, p)) +
  geom_point() +
  geom_vline(xintercept = 171) + # vertical line at the rate parameter
  ylab('Poisson(171) pmf')
For such a large value of \(\mu\), this distribution looks decidedly bell-shaped.
And indeed, it turns out to be extremely well-approximated by a normal distribution, as we'll see below.
It's also clear that \(X\) is most likely to take on a value relatively close to 171.
We can use dpois() to calculate the exact probability that \(X = 171\) as follows; the answer is just over 3%.
dpois(171, 171)
## [1] 0.03049301
Now let's try to calculate exactly the same probability by hand, that is, by using the formula for the Poisson pmf from above.
my_dpois <- function(x, mu) {
  exp(-mu) * mu^x / factorial(x)
}
my_dpois(171, 171)
## [1] NaN
What gives?!
The abbreviation NaN stands for "not a number."
The problem in this case is that both the numerator and denominator of the fraction inside my_dpois() evaluate to infinity when mu and x are 171, and the ratio \(\infty/\infty\) is undefined.
c(numerator = exp(-171) * 171^171, denominator = factorial(171))
##   numerator denominator
##         Inf         Inf
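Indeed, R agrees that dividing one infinity by another is undefined:
Inf / Inf
## [1] NaN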
As I discussed in an earlier post, computers can only store a finite number of distinct numeric values.
It's not actually true that factorial(171) equals \(\infty\).
What's really going on here is that factorial(171) is such a large number that it can't be stored as a floating-point number.
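We can see exactly where the overflow occurs by comparing against the largest double-precision value that R can store: factorial(170) squeaks in just under the limit, but \(171! = 171 \times 170! \approx 1.24 \times 10^{309}\) does not.
.Machine$double.xmax # the largest double-precision value R can store
## [1] 1.797693e+308
factorial(170) # just barely fits
## [1] 7.257416e+306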
In this case there's a very simple fix.
If you haven't seen this trick before, it's a useful one to keep up your sleeve: if you run into numerical problems with very large or very small values, try taking logs.
The log of the Poisson pmf is simply
\[
\log p(x) = -\mu + x \log(\mu) - \log(x!).
\]
R even has a convenient, built-in function for evaluating the natural log of a factorial: lfactorial().
Now we can compute the log of our desired probability as follows:
-171 + 171 * log(171) - lfactorial(171)
## [1] -3.490258
To obtain the probability, simply exponentiate:
exp(-171 + 171 * log(171) - lfactorial(171))
## [1] 0.03049301
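Incidentally, dpois() can carry out this log-scale computation for us: it has a log argument that returns the log of the pmf directly, which is the standard way to sidestep this kind of overflow in practice.
dpois(171, 171, log = TRUE) # log of the Poisson(171) pmf at 171
## [1] -3.490258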
Of course this just passes the buck to lfactorial(). So how does this mysterious function work? The bad news is that I'm not going to tell you; the good news is that I'm going to show you something even better, namely Stirling's approximation: a way to understand how \(n!\) behaves qualitatively that turns out to give a pretty darned good approximation to lfactorial().
This may seem like an odd topic for a blog devoted to econometrics and statistics, so allow me to offer a few words of justification.
First, computations involving \(n!\) arise all the time in applied work.
Second, it can be extremely helpful for certain theoretical arguments to have good approximations to \(n!\) for large values of \(n\).
Finally, and most importantly from my perspective, the heuristic argument I'll use below relies on none other than the central limit theorem.
So even if you've seen a more traditional proof of Stirling's approximation, I hope you'll enjoy this alternative approach.
Stirling’s Approximation
The key step in our argument is to show that the pmf of a \(\text{Poisson}(\mu)\) random variable is well-approximated by the \(\text{Normal}(\mu, \mu)\) density.
This explains the bell-shaped curve that we plotted above.
To obtain this result, we'll use the central limit theorem.
But there is one fact that you will need to take on faith if you don't already know it: if \(X_1 \sim \text{Poisson}(\mu_1)\) is independent of \(X_2 \sim \text{Poisson}(\mu_2)\), then \(X_1 + X_2 \sim \text{Poisson}(\mu_1 + \mu_2)\).
Proceeding by induction, we can view a Poisson(171) random variable as the sum of 171 independent Poisson(1) random variables.
More generally, we can view a Poisson RV with rate parameter \(n\) as the sum of \(n\) iid Poisson(1) random variables.
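If you'd like some evidence for this fact short of a proof, a simulation along the following lines is reassuring; the seed and the number of replications are arbitrary:
set.seed(1234)
sums <- replicate(1e4, sum(rpois(171, lambda = 1))) # 10,000 sums of 171 Poisson(1) draws
c(mean = mean(sums), variance = var(sums)) # both should be close to 171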
By the central limit theorem, it follows that
\[
\sqrt{n}(\bar{X}_n - 1) \rightarrow_d \text{N}(0,1)
\]
since the mean and variance of a Poisson(1) RV are both equal to one.
From a practical perspective, this means that \(\sqrt{n}(\bar{X}_n - 1)\) is approximately equal to \(Z\), a standard normal random variable.
Re-arranging,
\[
X_1 + X_2 + \cdots + X_n = n\bar{X}_n = n + \sqrt{n} \times [\sqrt{n}(\bar{X}_n - 1)] \approx n + \sqrt{n} Z
\]
and \(n + \sqrt{n} Z\) is simply a \(\text{N}(n, n)\) random variable!
This is a quick way of seeing why the \(\text{Poisson}(\mu)\) distribution is well-approximated by the \(\text{N}(\mu, \mu)\) distribution when \(\mu\) is large.
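To see just how good the approximation is, we can overlay the \(\text{N}(171, 171)\) density on the pmf we plotted earlier; this plot is my own embellishment of the one above, and the styling choices are arbitrary:
tibble(x = 0:300) %>%
  mutate(Poisson = dpois(x, 171),
         Normal = dnorm(x, mean = 171, sd = sqrt(171))) %>%
  ggplot(aes(x = x)) +
  geom_point(aes(y = Poisson)) + # exact Poisson(171) pmf
  geom_line(aes(y = Normal), color = 'red') + # N(171, 171) density
  ylab('Poisson(171) pmf vs. N(171, 171) density')
The two are very nearly indistinguishable at this scale.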
Now let's run with this.
As we just saw, for large \(\mu\) the Poisson(\(\mu\)) pmf is well-approximated by the Normal(\(\mu, \mu\)) density:
\[
\frac{e^{-\mu}\mu^x}{x!} \approx \frac{1}{\sqrt{2\pi \mu}} \exp\left\{ -\frac{1}{2}\left( \frac{x - \mu}{\sqrt{\mu}}\right)^2\right\}
\]
This approximation is particularly accurate for \(x\) near the mean. That's convenient, because substituting \(\mu\) for \(x\) greatly simplifies the right-hand side:
\[
\frac{e^{-\mu}\mu^\mu}{\mu!} \approx \frac{1}{\sqrt{2\pi\mu}}
\]
Re-arranging, we obtain
\[
\mu! \approx \mu^\mu e^{-\mu} \sqrt{2 \pi \mu}
\]
Taking logs of both sides gives:
\[
\log(\mu!) \approx \mu \log(\mu) - \mu + \frac{1}{2} \log(2 \pi \mu)
\]
Writing this with \(n\) in place of \(\mu\) gives the following:
\[
\log(n!) \approx n \log(n) - n + \frac{1}{2} \log(2 \pi n)
\]
This is called Stirling's Approximation. The usual way of writing it excludes the \(\log(2\pi n)/2\) term, yielding \(\log(n!) \approx n\log(n) - n\), which is fairly easy to remember. Including the extra term, however, gives increased accuracy for smaller values of \(n\).
While I haven't formally proved this, it turns out that
\[
\log(n!) \sim n \log(n) - n + \frac{1}{2} \log(2 \pi n)
\]
as \(n \rightarrow \infty\). In other words, the ratio of the LHS and RHS tends to one in the large \(n\) limit.
Perhaps surprisingly, this approximation is extremely accurate even for fairly small values of \(n\), as we can see by comparing it against lfactorial().
stirling1 <- function(n) n * log(n) - n # usual version
stirling2 <- function(n) n * log(n) - n + 0.5 * log(2 * pi * n) # with correction term

tibble(n = 1:20) %>%
  mutate(Stirling1 = stirling1(n),
         Stirling2 = stirling2(n),
         R = lfactorial(n)) %>%
  knitr::kable(digits = 3)
| n | Stirling1 | Stirling2 | R |
|---|-----------|-----------|---|
| 1 | -1.000 | -0.081 | 0.000 |
| 2 | -0.614 | 0.652 | 0.693 |
| 3 | 0.296 | 1.764 | 1.792 |
| 4 | 1.545 | 3.157 | 3.178 |
| 5 | 3.047 | 4.771 | 4.787 |
| 6 | 4.751 | 6.565 | 6.579 |
| 7 | 6.621 | 8.513 | 8.525 |
| 8 | 8.636 | 10.594 | 10.605 |
| 9 | 10.775 | 12.793 | 12.802 |
| 10 | 13.026 | 15.096 | 15.104 |
| 11 | 15.377 | 17.495 | 17.502 |
| 12 | 17.819 | 19.980 | 19.987 |
| 13 | 20.344 | 22.546 | 22.552 |
| 14 | 22.947 | 25.185 | 25.191 |
| 15 | 25.621 | 27.894 | 27.899 |
| 16 | 28.361 | 30.667 | 30.672 |
| 17 | 31.165 | 33.500 | 33.505 |
| 18 | 34.027 | 36.391 | 36.395 |
| 19 | 36.944 | 39.335 | 39.340 |
| 20 | 39.915 | 42.331 | 42.336 |
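As a final sanity check, we can close the loop with our original example by plugging stirling2() in for lfactorial() in the log-pmf calculation from above:
exp(-171 + 171 * log(171) - stirling2(171)) # compare with dpois(171, 171) = 0.03049301
This agrees with the exact answer to roughly three significant figures, even though we derived the approximation with little more than the central limit theorem.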
Epilogue
I have a bad habit of trying to add a "moral" or "lesson" to the end of my posts, but I suppose there's no point trying to break the habit today! While there are easier ways to derive Stirling's approximation, there are two things I enjoy about this one. First, we get a more accurate approximation than \(n \log(n) - n\) with almost no effort. Second, making unexpected connections between facts that we already know both deepens our understanding and helps us "compress" information. If you ever forget Stirling's approximation, now you know how to very quickly re-derive it on the spot!
