Tuesday, November 4, 2025

Lessons from the Oxford Vaccination Survey


Back in November a colleague pointed me to a web page describing the latest COVID-19 Student Vaccination Survey carried out by my employer, the University of Oxford. At the time I briefly tweeted my concerns at the University but never received a response. In this post, armed with far more than 280 characters, I’ll explain what went wrong in the Oxford Vaccination Survey and suggest some ways of doing better next time.

While the university website doesn’t provide detailed information on the survey’s methodology, it does allow us to determine a few key facts. First: this was not in fact a survey; it was a census.

All students were invited to complete the COVID-19 Student Vaccination Survey in 3rd and 4th Weeks of Michaelmas term.
This very short form asked students to confirm their COVID-19 vaccination status.

Surveys use a sample to learn about a population.
Pollsters don’t ask all American voters if they approve of Biden; they ask a small subset of voters, using a carefully designed sampling scheme.
A census, on the other hand, attempts to reach every person in the target population.
This is how the Oxford Vaccination “Survey” was conducted.

Second: roughly half of Oxford students chose not to respond:

The response rate was 49.3%, and there were virtually no differences in vaccination rates between different colleges and departments.

Third: among those who did complete the questionnaire, a very high proportion indicated that they were vaccinated:

A total of 98% of respondents reported that they were vaccinated (95% fully and 3% partially).

Their headline conclusion was that “the survey indicated that the vast majority of students are now vaccinated.” Further down, under the FAQs, they added the following clarification:

Given that the response rate was 50%, how can you be sure that the vaccination rate reflects the whole student population?

50% is considered a high response rate in surveys of this kind. It allows for a reliable and powerful statistical test to be conducted on the response data. Based on our analysis, we can be 95% certain that the true vaccination level among all Oxford students (based on those who responded) is between 97.8% and 98.3%. We can also be 99% certain that the true vaccination level among all Oxford students is between 97.7% and 98.4%. In light of this, we can be extremely confident that the vast majority of Oxford students are either fully or partially vaccinated.

This is wrong. First, the response rates for other “surveys of this kind” are irrelevant: what matters is what we can learn from this dataset given the observed response rate and the reasons why students chose not to respond. It seems quite implausible that the vaccination rate among respondents reflects the whole student population. If, as I suspect, conscientious students are both more likely to be vaccinated and more likely to respond to a questionnaire, the 98% figure is almost certainly too high. Second, statistical tests and confidence intervals are designed to quantify sampling error: non-systematic differences in vaccination rates in the sample compared to the population that arise when a sample is drawn at random. But the problem here is non-sampling error: students who respond are likely to differ in their rate of vaccination from those who don’t. This kind of systematic difference between sample and population is not accounted for in a confidence interval or statistical test. In light of this, we cannot conclude that “the vast majority of Oxford students are either fully or partially vaccinated.”
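To see what the quoted intervals do and do not measure, here is a minimal sketch of a textbook normal-approximation (Wald) interval applied to the respondent counts used later in this post (12475 vaccinated out of 12729 respondents). The function name and the choice of the Wald formula are assumptions made for illustration; the University does not say which method it used.

```python
import math

def wald_interval(successes, n, z=1.96):
    """Normal-approximation interval for a proportion.

    This quantifies sampling error only: it treats the n observations
    as a simple random sample from the population of interest.
    """
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * standard_error, p_hat + z * standard_error

# Respondent counts taken from this post; z = 1.96 corresponds to 95%.
ci_low, ci_high = wald_interval(12475, 12729)
print(f"95% interval: {ci_low:.3f} to {ci_high:.3f}")
```

The result is a narrow interval around 98%, close to the figures in the FAQ. Nothing in the formula involves the 13091 students who did not respond, which is exactly why it cannot speak to non-sampling error.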

So what can we conclude? It depends on what we’re willing to assume. I’ll start off by considering the worst case: suppose we’re not willing to assume anything about the reasons why some students chose not to respond.

I have a bowl containing 100 balls; \(W\) of them are white and the rest are black. I remove 50 balls from the bowl and lay them on the table so you can see them. Of these, 49 are white and 1 is black. What can you conclude about \(W\)? The answer depends crucially on how I chose which balls to show you.

If I drew the balls at random, then you can use standard statistical tools (hypothesis testing, confidence intervals, a Bayesian posterior distribution) to make inferences about \(W\). Random sampling ensures that the balls you see before you on the table are representative of the balls that remain in the bowl. This creates an inferential link between what you can observe and what you can’t, and allows you to quantify uncertainty about \(W\) using the language of probability.

But what if you know nothing about how I chose which balls to show you? Perhaps I decided to show you all of the white balls in the bowl; or perhaps I decided to show you all of the black balls: you have no idea. In this case the inferential link between what you can see on the table and what remains in the bowl is broken and the familiar tools of statistical inference cannot be directly applied. Unless you’re willing to take a stand of some kind on how I chose which balls to remove from the bowl, all that you can conclude about \(W\) is that it must be at least 49 and no more than 99. Based on an observed fraction of 49/50 white balls on the table, you can conclude only that between 49/100 and 99/100 of the balls in the bowl are white.

Now replace 100 balls in the bowl with 25820 students in the university, 50 balls on the table with 12729 respondents to the questionnaire, and 49 white balls on the table with 12475 vaccinated students among those who responded.
Suppose for simplicity that everyone who responds to the questionnaire does so truthfully. (We’ll revisit this below.)
Unless we know something about how and why respondents decided to respond, all we can infer from this information is that no fewer than 12475 and no more than 25566 Oxford students have been vaccinated. Expressed as a percentage, this works out to between 48% and 99% of students at Oxford with at least one COVID-19 vaccination.
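The arithmetic behind these bounds is simple enough to check in a few lines. The sketch below uses a hypothetical helper, `worst_case_bounds`, which is not from any library: the lower bound assumes that no non-respondent is vaccinated and the upper bound assumes that every one of them is.

```python
def worst_case_bounds(population, respondents, vaccinated_respondents):
    """Bounds on the vaccinated share with no assumptions about non-response."""
    non_respondents = population - respondents
    # Lower bound: every non-respondent is unvaccinated.
    lower = vaccinated_respondents / population
    # Upper bound: every non-respondent is vaccinated.
    upper = (vaccinated_respondents + non_respondents) / population
    return lower, upper

# Balls in the bowl: 100 balls, 50 shown, 49 of those white.
ball_lower, ball_upper = worst_case_bounds(100, 50, 49)

# Oxford figures from this post.
lower, upper = worst_case_bounds(25820, 12729, 12475)
print(f"balls:  [{ball_lower:.2f}, {ball_upper:.2f}]")
print(f"Oxford: [{lower:.2f}, {upper:.2f}]")
```

Note that the width of the interval equals the non-response rate: the only way to tighten it without further assumptions is to get more people to respond.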

While this result, [0.48, 0.99], looks superficially similar to a confidence interval, it’s a very different animal. A confidence interval contains the values of an unknown quantity that are “consistent with the data” in a precise sense. As explained above, confidence intervals quantify uncertainty arising from sampling error. But our uncertainty in the vaccine example doesn’t come from random sampling: it comes from missing data. In effect, we have a complete census of the kind of Oxford student who fills out COVID-19 vaccination questionnaires. What we lack is any data whatsoever on the kind of Oxford student who doesn’t respond. Each point in the interval [0.48, 0.99] corresponds to a different assumption about the relative vaccination rates of respondents compared to non-respondents (the fraction of white balls on the table compared to the fraction remaining in the bowl).
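One way to make that last point concrete is a short decomposition. The symbols here are introduced purely for illustration and do not appear in the University’s materials: let \(r\) be the response rate, \(p_R\) the vaccination rate among respondents, \(p_N\) the unknown rate among non-respondents, and \(p\) the overall rate.

```latex
\begin{align*}
p &= r\,p_R + (1 - r)\,p_N \\
  &= \frac{12475}{25820} + \frac{13091}{25820}\,p_N \\
  &\approx 0.483 + 0.507\,p_N, \qquad 0 \le p_N \le 1.
\end{align*}
```

Setting \(p_N = 0\) gives the lower bound of about 0.48 and \(p_N = 1\) gives the upper bound of about 0.99, while assuming \(p_N = p_R\) recovers the respondents’ own rate of about 0.98.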

Besides reporting the rather pessimistic bounds [0.48, 0.99], is there anything more that we can say about the share of vaccinated Oxford students? Yes, but only if we’re willing to make some assumptions.

Let’s start with the strongest possible assumption: data that are missing completely at random (MCAR). In the vaccination example, MCAR amounts to assuming that students who respond are representative of students who don’t respond. In the language of my balls-and-bowl example, MCAR would hold if the 50 balls on the table were drawn from the bowl at random. Under MCAR, we can effectively ignore the problem of non-response and report an estimate of 98% vaccine coverage among Oxford students. But while it’s logically possible for the MCAR assumption to hold in this example, it’s not very plausible. As I mentioned above, it seems likely that more conscientious students will be more likely both to get vaccinated and to respond to a questionnaire. If so, MCAR fails.

More plausible than MCAR is the assumption of data that are missing at random (MAR). This idea is best illustrated with an example. Suppose that there are four kinds of balls in the bowl: large white balls, large black balls, small white balls, and small black balls. We’re not interested in the size of the balls; we only want to know how many are white and how many are black. As before, I draw 50 balls from the bowl at random and lay them on the table. Unfortunately, small balls tend to sink to the bottom of the bowl so I’m disproportionately likely to draw a large ball. In this case, the balls you can see on the table are not representative of the balls in the bowl: there are probably too many large balls on the table.

To make progress we need an assumption. Suppose that, after accounting for size, the chance that I draw a particular ball doesn’t depend on its color. Under this assumption the large balls on the table are representative of the large balls that remain in the bowl. Similarly, the small balls on the table are representative of the small balls that remain in the bowl. This is precisely the MAR assumption: conditional on something we know (the size of a ball), the data we can observe (balls on the table) are representative of the missing data (balls in the bowl). MCAR doesn’t hold because the 50 balls on the table are not representative of the 50 balls that remain in the bowl (large balls are disproportionately likely to be drawn). They only become representative after we account for size.

So how might the MAR assumption allow us to learn more in the vaccination example? From the figures on page 85 of this HSA report we see that, within each UK age group, females are more likely to be vaccinated than males. Suppose that this holds among Oxford students as well. If there is any difference between survey response rates for male and female students, MCAR fails and the 98% estimated vaccination rate will be unreliable. But suppose we’re willing to assume that female respondents are representative of female non-respondents. In this case, the share of vaccinated female respondents is a good estimate of the share of vaccinated female students, and similarly for male respondents. Because we know that 49% of Oxford students are female, we can use this information to calculate an estimate of the overall share of vaccinated students. If MAR conditional on sex seems like too strong an assumption, perhaps you’d be willing to assume that female undergraduates from the UK who respond are representative of all female undergraduates from the UK. After all: overseas students are almost certainly vaccinated, given government travel restrictions. In this case we’d form eight groups (male/female \(\times\) overseas/home \(\times\) undergrad/post-grad) and estimate the share of vaccination separately for each group using the data for respondents. To obtain an overall estimate for the share of vaccinated students, we’d simply average the estimates for each “type” of student, weighting by their respective shares in the Oxford student body. This technique of re-weighting estimates for particular groups is called post-stratification and is widely used in political polling.
The assumption that underlies it is MAR: conditional on characteristics we can observe, whether a person responds to the survey or not is “as good as random,” i.e. unrelated to the question we ask in our poll.
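Here is a minimal sketch of post-stratification by sex. The 49% female share comes from this post; every respondent count below is HYPOTHETICAL and chosen only to show the mechanics, since the University has not published a breakdown. The function name `post_stratify` is likewise just an illustration.

```python
def post_stratify(strata):
    """Weight each group's respondent rate by its known population share."""
    total_share = sum(s["population_share"] for s in strata)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("population shares must sum to 1")
    return sum(
        s["population_share"] * s["vaccinated"] / s["respondents"]
        for s in strata
    )

# HYPOTHETICAL respondent data: women are over-represented among
# respondents (80%) relative to their population share (49%).
strata = [
    {"group": "female", "population_share": 0.49, "respondents": 800, "vaccinated": 792},
    {"group": "male", "population_share": 0.51, "respondents": 200, "vaccinated": 180},
]

# Unweighted rate, which is only valid under MCAR.
naive = sum(s["vaccinated"] for s in strata) / sum(s["respondents"] for s in strata)
# Re-weighted rate, valid under MAR conditional on sex.
weighted = post_stratify(strata)
print(f"naive: {naive:.3f}, post-stratified: {weighted:.3f}")
```

With these made-up numbers the unweighted rate overstates coverage because the group with the higher vaccination rate is also the group that responds more. Extending the sketch to the eight groups described above only requires adding rows, provided the population share of each group is known.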

So does MAR hold in our vaccination example? Perhaps not. Getting vaccinated is widely viewed as a pro-social act. The phenomenon of social desirability bias suggests that people who are not vaccinated may be less likely to respond even after conditioning on their observed characteristics. If so, MAR won’t hold. Even so, I’d put much more faith in results that used post-stratification to adjust for observed characteristics like sex and home/overseas that are plausibly related to both response rates and vaccination status. Depending on the precise nature of the data that Oxford collected, it’s possible that this analysis could still be carried out.

There’s an old saying (often attributed to Fisher) that calling in a statistician after you’ve collected your data is like calling in a doctor after a loved one has died: all she can do is perform a post-mortem; it’s too late to save the patient. So how could Oxford do a better job the next time that they want to carry out a survey?

One of the key lessons of statistics is that a relatively small sample can still provide reliable inferences about the population provided that the sample is drawn at random. Rather than contacting all students, choose a random sample and focus on maximizing the response rate. There are several ways to do this. First is multi-mode surveys: send an email in addition to a letter in addition to a text message. Second is incentives: perhaps offer a gift card upon receipt of the completed survey. Despite your best efforts, inevitably some people still won’t respond. This is where two-stage sampling can help: take a second random sample of the non-respondents and contact these people a second time. As long as you reach a representative sample of non-respondents, it doesn’t matter whether you reach all of them. If some non-response remains, try to adjust for it using post-stratification.
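A rough sketch of how the follow-up sample would enter the final estimate is given below. All counts are HYPOTHETICAL, the function name `two_stage_estimate` is made up for this example, and the sketch assumes that everyone in the follow-up sample is eventually reached, so that they stand in for all first-round non-respondents.

```python
def two_stage_estimate(n_sampled, n_responded, vaccinated_responded,
                       n_followup, vaccinated_followup):
    """Combine first-round respondents with a follow-up sample of non-respondents."""
    response_rate = n_responded / n_sampled
    rate_respondents = vaccinated_responded / n_responded
    # The follow-up sample estimates the rate among ALL first-round non-respondents.
    rate_non_respondents = vaccinated_followup / n_followup
    return (response_rate * rate_respondents
            + (1 - response_rate) * rate_non_respondents)

# HYPOTHETICAL numbers: 1000 students sampled, 600 respond in round one,
# and 100 of the 400 non-respondents are followed up at random.
estimate = two_stage_estimate(1000, 600, 588, 100, 85)
print(f"estimated vaccination rate: {estimate:.3f}")
```

In this made-up example the first-round respondents alone would suggest 98%, while the combined estimate is noticeably lower because the followed-up non-respondents are vaccinated at a lower rate.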

The whole discussion so far has assumed that respondents answer truthfully. But social desirability bias could also lead people to misrepresent their true vaccination status. This is a much harder problem to solve, but ensuring privacy can help. Posting a detailed privacy policy is a helpful first step, but students may still be suspicious when a government regulator asks their university to gather potentially sensitive data about them. Randomized response methods address this concern by designing surveys in which it’s impossible for researchers or anyone with whom they share data to infer individual responses. How does this work? In the simplest possible example, I give you a coin and tell you to flip it in secret. I instruct you to answer the survey truthfully if you flip heads, and to simply check “NO I AM NOT VACCINATED” if you flip tails. When I read the survey, I have no way of knowing if you haven’t been vaccinated or merely flipped tails. Despite this, it’s still possible to construct a reliable estimator of the share of people who are vaccinated.
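To see why an estimator still exists, note that under the coin-flip design a “yes” can only come from a vaccinated person who flipped heads, so the expected share of “yes” answers is one half of the true vaccination rate. Doubling the observed share of “yes” answers therefore estimates the true rate. The simulation below is a minimal sketch under the assumptions that the coin is fair and that everyone follows the instructions; the function names and the true rate of 0.9 are hypothetical.

```python
import random

def simulate_answers(true_rate, n, rng):
    """Simulate n survey answers under the coin-flip design (True means 'yes')."""
    answers = []
    for _ in range(n):
        vaccinated = rng.random() < true_rate
        heads = rng.random() < 0.5
        # Heads: answer truthfully. Tails: always answer "no".
        answers.append(vaccinated if heads else False)
    return answers

def estimate_rate(answers, p_heads=0.5):
    """Share of 'yes' answers divided by the probability of a truthful answer."""
    yes_share = sum(answers) / len(answers)
    return yes_share / p_heads

rng = random.Random(0)  # fixed seed so the run is reproducible
answers = simulate_answers(true_rate=0.9, n=100_000, rng=rng)
estimate = estimate_rate(answers)
print(f"estimated vaccination rate: {estimate:.3f}")
```

The price of privacy is precision: half of the answers carry no information, so the estimate is noisier than a direct question would give for the same number of respondents.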

Numbers are only valuable when they tell us something meaningful about the world. If we care enough to ask the question, we should care enough to do a good job answering it. While statistical inference and modeling can be extremely valuable tools for learning about the world, what matters most is collecting good, clean data. May you be blessed with 100% response rates in 2022. If you’re not, make it your New Year’s resolution to do something about it!


