Here's a slightly unusual exercise on the subject of Bayes' Theorem for those of you teaching or studying introductory probability. Imagine that you're developing a diagnostic test for a disease. The test is very simple: it either comes back positive or negative. You have a choice between slightly increasing either your test's sensitivity or its specificity. If your goal is to maximize the positive predictive value (PPV) of your test, i.e. the probability that a patient has the disease given that the test comes back positive, which test characteristic should you choose to improve?
An Open Invitation
If you're still hungry for more Bayes' Theorem after reading this post, then why not join the Summer of Bayes 2024 online reading group? If you'd like to be added to the mailing list, just send an email to bayes [at] person.despatched.as. Recordings of past sessions along with slides and other materials are available to group members via the Summer of Bayes discussion board. And now back to your regularly-scheduled blog content…
Odds aren’t so odd!
While I give you a few minutes to pause and ponder this question, here's a brief rant on the subject of odds. If you're anything like me, the first time you encountered odds, you thought to yourself
“What is this $*@%^!? Why would anyone want to ruin a perfectly good probability by dividing it by one minus itself?”
But it's time to take the red pill and see the world as it really is: the only reason you prefer to think in terms of probabilities rather than odds is that you've been brainwashed by the educational system. Of course I exaggerate slightly, but the point is that odds are just as natural as probabilities; we're just not as accustomed to working with them. In many situations in probability, statistics, and econometrics, it turns out that working with odds (or their logarithm) makes life much simpler, as I'll try to convince you with a simple example.
First we need to define odds. Consider some event \(A\) with probability \(p\) of occurring. Then we say that the odds of \(A\) are \(p/(1 - p)\). For example, if \(p = 1/3\) then the event \(A\) is equivalent to drawing a red ball from an urn that contains one red and two blue balls: the probability gives the ratio of red balls to total balls. The odds of \(A\), on the other hand, equal \(1/2\): odds give the ratio of red balls to blue balls. Since probabilities are between 0 and 1, odds are between 0 and \(\infty\). Odds of 0 mean that the event is impossible, while odds of \(\infty\) mean that the event is certain. Odds of 1 mean that the event is just as likely to occur as not to occur.
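To make the definition concrete, here is a minimal Python sketch of the conversion between probabilities and odds. The function names `prob_to_odds` and `odds_to_prob` are illustrative choices, not part of any library.

```python
import math


def prob_to_odds(p: float) -> float:
    """Convert a probability in [0, 1] to odds in [0, inf]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p == 1.0:
        return math.inf  # a certain event has infinite odds
    return p / (1.0 - p)


def odds_to_prob(odds: float) -> float:
    """Convert odds in [0, inf] back to a probability in [0, 1]."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    if math.isinf(odds):
        return 1.0  # infinite odds correspond to a certain event
    return odds / (1.0 + odds)


# The urn example: one red ball and two blue balls.
print(prob_to_odds(1 / 3))  # approximately 1/2
print(odds_to_prob(0.5))    # approximately 1/3
```

The two functions are inverses of one another, so nothing is lost by moving between the two scales.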
Now here's an example that you've surely seen before:
One in 100 women has breast cancer (\(B\)). If you have breast cancer, there is a 95% chance that you will test positive (\(+\)); if you do not have breast cancer (\(B^C\)), there is a 2% chance that you will nevertheless test positive (\(+\)). We know nothing about Alice other than the fact that she tested positive. How likely is it that she has breast cancer?
It's easy enough to solve this problem using Bayes' Theorem, as long as you have pen and paper handy:
\[
\begin{aligned}
P(B | +) &= \frac{P(+|B)P(B)}{P(+)} = \frac{P(+|B)P(B)}{P(+|B)P(B) + P(+|B^C)P(B^C)}\\
&= \frac{0.95 \times 0.01}{0.95 \times 0.01 + 0.02 \times 0.99} \approx 0.32.
\end{aligned}
\]
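For readers who would rather check the arithmetic by machine, the same calculation can be sketched in a few lines of Python. The function name `posterior_prob` and its argument names are illustrative assumptions.

```python
def posterior_prob(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(B | +) from Bayes' Theorem with the law of total probability."""
    true_positive_mass = sensitivity * prior                   # P(+|B) P(B)
    false_positive_mass = false_positive_rate * (1.0 - prior)  # P(+|B^C) P(B^C)
    return true_positive_mass / (true_positive_mass + false_positive_mass)


# Numbers from the breast cancer example above.
p = posterior_prob(prior=0.01, sensitivity=0.95, false_positive_rate=0.02)
print(f"{p:.2f}")  # 0.32
```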
But what if I asked you how the result would change if only one in a thousand women had breast cancer? What if I changed the sensitivity of the test from 95% to 99% or the specificity from 98% to 95%? If you're anything like me, you'd struggle to do these calculations in your head. That's because \(P(B|+)\) is a highly non-linear function of \(P(B)\), \(P(+|B)\), and \(P(+|B^C)\).
In contrast, working with odds makes this problem a snap. The key point is that \(P(B|+)\) and \(P(B^C|+)\) have the same denominator, namely \(P(+)\):
\[
P(B | +) = \frac{P(+|B)P(B)}{P(+)}, \quad
P(B^C | +) = \frac{P(+|B^C)P(B^C)}{P(+)}.
\]
Notice that \(P(+)\) was the “complicated” term in \(P(B|+)\); the numerator was simple. Since the odds of \(B\) given \((+)\) are defined as the ratio of \(P(B|+)\) to \(P(B^C|+)\), the denominator cancels and we're left with
\[
\text{Odds}(B|+) \equiv \frac{P(B|+)}{P(B^C|+)} = \frac{P(+|B)}{P(+|B^C)} \times \frac{P(B)}{P(B^C)}.
\]
In other words, the posterior odds of \(B\) equal the likelihood ratio, \(P(+|B)/P(+|B^C)\), multiplied by the prior odds of \(B\), \(P(B)/P(B^C)\):
\[
\text{Posterior Odds} = \text{(Likelihood Ratio)} \times \text{(Prior Odds)}.
\]
Now we can easily solve the original problem in our head. The prior odds are 1/99 while the likelihood ratio is 95/2. Rounding these to 0.01 and 50 respectively, we find that the posterior odds are around 1/2. This means that Alice's chance of having breast cancer is roughly equal to the chance of drawing a red ball from an urn with one red and two blue balls. There's no need to convert this back to a probability since we can already answer the question: it's considerably more likely that Alice does not have breast cancer. But if you insist, odds of 1/2 give a probability of 1/3, so despite rounding and calculating in our heads we're within about one percentage point of the exact answer (0.333 versus 0.324)!
Repeat after me: odds are on a multiplicative scale. This is their key virtue and the reason why they make it so easy to explore variations on the original problem. If one in a thousand women has breast cancer, the prior odds become 1/999 so we simply divide our previous result by 10, giving posterior odds of around 1/20. If we instead changed the sensitivity from 95% to 99% and the specificity from 98% to 95%, then the likelihood ratio would change from \(95/2 \approx 50\) to \(99/5 \approx 20\).
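The variations just described can be sketched in Python using the odds form of Bayes' Theorem. The function name `posterior_odds` and the scenario labels are illustrative assumptions; the numbers are the ones from the text.

```python
def posterior_odds(prior: float, sensitivity: float, specificity: float) -> float:
    """Posterior odds = likelihood ratio * prior odds."""
    prior_odds = prior / (1.0 - prior)
    likelihood_ratio = sensitivity / (1.0 - specificity)
    return likelihood_ratio * prior_odds


# (prior, sensitivity, specificity) for each variation of the problem.
scenarios = {
    "original problem": (0.01, 0.95, 0.98),
    "one in a thousand": (0.001, 0.95, 0.98),
    "sensitivity 99%, specificity 95%": (0.01, 0.99, 0.95),
}

for label, (prior, sens, spec) in scenarios.items():
    odds = posterior_odds(prior, sens, spec)
    prob = odds / (1.0 + odds)  # convert back to a probability
    print(f"{label}: odds = {odds:.3f}, probability = {prob:.3f}")
```

Each variation only rescales one of the two factors, which is why the mental arithmetic stays easy.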
The Answer
Have I given you enough time to come up with your own solution? Fantastic! In case you hadn't already guessed, that little digression about odds served an important purpose: my solution will use odds rather than probabilities. Our goal is to increase the positive predictive value (PPV) of the test, namely
\[
\text{PPV} \equiv P(\text{Has Disease}|\text{Test Positive}),
\]
by as much as possible, either by improving the test's sensitivity
\[
\text{Sensitivity} \equiv P(\text{Test Positive} | \text{Has Disease})
\]
or its specificity
\[
\text{Specificity} \equiv P(\text{Test Negative} | \text{Doesn't Have Disease}).
\]
To answer this question, we'll start by substituting these definitions into the odds form of Bayes' Theorem introduced above, yielding
\[
\text{Posterior Odds} = \frac{\text{PPV}}{1 - \text{PPV}} = \frac{\text{Sensitivity}}{1 - \text{Specificity}} \times \text{Prior Odds}.
\]
This expression makes it clear that increasing either the sensitivity or specificity of the test increases the posterior odds. And since the PPV is a strictly increasing function of the posterior odds, namely
\[
\text{PPV} = \frac{\text{Posterior Odds}}{1 + \text{Posterior Odds}},
\]
this also increases the PPV. So now the question is: which of these two possibilities gives us the most bang for our buck? A natural idea would be to compare the marginal effect of increasing sensitivity by a small amount to the marginal effect of increasing specificity by the same amount. We can do this by comparing the partial derivatives of the PPV with respect to sensitivity and specificity. But, again, the PPV is an increasing function of the posterior odds, so we can simplify our task by comparing the derivatives of the posterior odds with respect to sensitivity and specificity. By the chain rule, any claim about the relative magnitudes of these derivatives computed for the odds will also hold for the PPV.
But why stop with the odds? We can simplify our task even further by comparing the derivatives of the logarithm of the posterior odds with respect to sensitivity and specificity. This is because the logarithm is, again, an increasing transformation of the odds.
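For readers who want the chain-rule step spelled out, here is one way to write it, using \(x\) as shorthand (introduced here for illustration) for either the sensitivity or the specificity:

\[
\frac{\partial \text{PPV}}{\partial x} = \frac{1}{(1 + \text{Posterior Odds})^2} \cdot \frac{\partial \text{Posterior Odds}}{\partial x} = \frac{\text{Posterior Odds}}{(1 + \text{Posterior Odds})^2} \cdot \frac{\partial \log(\text{Posterior Odds})}{\partial x}.
\]

The leading factor is positive and is the same whichever characteristic \(x\) stands for, so it cancels when we take the ratio of the two derivatives. Comparing derivatives of the log odds is therefore equivalent to comparing derivatives of the PPV.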
Since
\[
\log(\text{Posterior Odds}) = \log(\text{Sensitivity}) - \log(1 - \text{Specificity}) + \log(\text{Prior Odds}),
\]
our required derivatives are
\[
\frac{\partial \log(\text{Posterior Odds})}{\partial \text{Sensitivity}} = \frac{1}{\text{Sensitivity}} \quad \text{and} \quad \frac{\partial \log(\text{Posterior Odds})}{\partial \text{Specificity}} = \frac{1}{1 - \text{Specificity}}.
\]
Now for the punchline: the ratio of the derivative with respect to specificity divided by that with respect to sensitivity is
\[
\frac{\partial \log(\text{Posterior Odds})/\partial \text{Specificity}}{\partial \log(\text{Posterior Odds})/\partial \text{Sensitivity}} = \frac{1/(1 - \text{Specificity})}{1/\text{Sensitivity}} = \frac{\text{Sensitivity}}{1 - \text{Specificity}}
\]
and this is precisely the likelihood ratio from the odds form of Bayes' Theorem! Hence, whenever the likelihood ratio is greater than one we'd prefer to increase the test's specificity; whenever it's less than one we'd prefer to increase the sensitivity. If the likelihood ratio is equal to one, then it doesn't matter which we choose.
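This result can be checked numerically with a rough finite-difference sketch in Python. The function name `ppv`, the step size, and the choice of the breast cancer numbers from earlier in the post are all assumptions made for illustration.

```python
def ppv(prior: float, sensitivity: float, specificity: float) -> float:
    """Positive predictive value computed via the odds form of Bayes' Theorem."""
    odds = sensitivity / (1.0 - specificity) * prior / (1.0 - prior)
    return odds / (1.0 + odds)


prior, sens, spec = 0.01, 0.95, 0.98
step = 1e-6  # small bump applied to one characteristic at a time

base = ppv(prior, sens, spec)
gain_from_sensitivity = ppv(prior, sens + step, spec) - base
gain_from_specificity = ppv(prior, sens, spec + step) - base

# The ratio of the two gains should be close to the likelihood ratio.
gain_ratio = gain_from_specificity / gain_from_sensitivity
likelihood_ratio = sens / (1.0 - spec)
print(gain_ratio, likelihood_ratio)
```

With these inputs the likelihood ratio is 47.5, so a tiny improvement in specificity raises the PPV roughly 47.5 times as much as the same improvement in sensitivity.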
Case closed, right? Well, not quite. We can say a bit more by thinking about what it means for the likelihood ratio to be greater than or less than one. Examining the odds form of Bayes' Theorem from above, we see that a likelihood ratio less than one means that our posterior probability that a person is sick falls when she tests positive. In other words, this corresponds to a test that's worse than useless: it's actually misleading. In contrast, a likelihood ratio greater than one means that the test is informative: a positive test result increases our belief that the person is sick. Any real-world diagnostic test will have a likelihood ratio greater than one. Indeed, if we had such an actively misleading test, we could easily convert it into an informative one by simply reversing the test's outcome: if someone tests positive, we tell them they're negative, and vice versa. This reversal would result in a likelihood ratio greater than one. Therefore, in all cases, whether we start with an informative test or reverse a misleading one, we should prefer to increase the test's specificity.
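The reversal argument can be sketched in Python as well. Relabeling the outcomes turns the old false negative rate into the new sensitivity and the old false positive rate into the new specificity. The function names and the example numbers (sensitivity 0.30, specificity 0.40) are hypothetical, chosen only to give a likelihood ratio below one.

```python
def likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """P(+ | disease) / P(+ | no disease)."""
    return sensitivity / (1.0 - specificity)


def flip_test(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Relabel outcomes: an old negative becomes a positive and vice versa."""
    new_sensitivity = 1.0 - sensitivity  # P(old negative | disease)
    new_specificity = 1.0 - specificity  # P(old positive | no disease)
    return new_sensitivity, new_specificity


# Hypothetical misleading test: a positive result makes disease less likely.
sens, spec = 0.30, 0.40
lr_before = likelihood_ratio(sens, spec)
lr_after = likelihood_ratio(*flip_test(sens, spec))
print(lr_before, lr_after)
```

Here the likelihood ratio moves from below one to above one, as the argument in the text predicts.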
Epilogue
Of course, this exercise is based upon the assumption that we want to maximize the PPV and that we can freely adjust both the test's sensitivity and its specificity. In practice, one or more of these assumptions might not hold. Indeed, PPV isn't the be-all and end-all of diagnostic testing. A full accounting would need to consider the relative costs of false positives and false negatives along with the prevalence of the disease. Nonetheless, I hope this exercise gives you a flavor of the power of odds for simplifying complex problems in probability and statistics.
