Photo: Baltimore County Government

While countries around the world are speeding up their vaccination rollout plans, some people are still hesitant to get a Covid-19 vaccine. How effective are Covid-19 vaccines? Are efficacy claims absolutely certain?


We sat down with Bart de Langhe, associate professor of marketing at Esade, to talk about the challenges of big data and the dangers of focusing on vaccine efficacy results. “It’s dangerously easy to misinterpret data, especially when it’s reported in percentages rather than absolute numbers,” warns de Langhe in Harvard Business Review.

Do Better: How do we know that Covid-19 vaccines are really effective?

Bart de Langhe: When Pfizer first announced that their vaccine was more than 90% effective, I presented their data to my MSc in Marketing students, who had received some background about randomised controlled trials. I noticed they were paying a lot of attention to the fact that there were more than 43,000 participants in the study and that they had diverse backgrounds.

The larger the sample the better, right?

In general, yes. But what my students failed to realise is that these randomised controlled trials were designed to examine a low-probability event. This means that even if you have large amounts of data, when you are dealing with a low-probability event – for instance, the likelihood of someone getting Covid-19 – your effective data becomes much smaller. In the end, the data you should care about is not the more than 43,000 people participating in the study, but the number of people in the study who contracted Covid-19. And at the time of Pfizer’s press release, only 94 participants had tested positive.

Bart de Langhe at Esade Business School (Photo: Esade)

That doesn’t seem like enough data to use this vaccine throughout the world

The first mistake is thinking that more than 43,000 participants is a lot without realising that the crucial data is in the small subset of participants who test positive. The second mistake is to think that the small subset is too small – it turns out it’s not. If you have a randomised controlled trial, like Pfizer’s, and you observe that the large majority of people with Covid-19 are in the placebo group, you can conclude that the vaccine works. Large differences in confirmed cases between groups in a randomised trial are extremely unlikely to happen due to chance, or any reason other than the vaccine.

What does it not allow you to conclude?

It does not allow you to conclude that it works better for old people than for young people. Or that it works better for white people than for black people. Or that it works better for people who have diabetes versus people who don’t. It allows us to say: overall, this vaccine seems to be highly effective, but you can’t make more fine-grained comparisons than that. Pfizer didn’t make those comparisons, and that’s perfectly fine. Their announcement was reasonable, although the claim that their vaccine was “more than 90% effective” was a bit deceptive.

The first mistake is thinking that more than 43,000 participants is a lot without realising that the crucial data is in the small subset of participants who test positive

Why is that deceptive?

Even though the study involved more than 43,000 participants, only 8 people in the vaccinated group developed Covid-19, compared to 86 in the placebo group. This gives you an efficacy rate – a point estimate – of 90.7%. But there is some uncertainty around that point estimate. We can’t really be sure that it’s 90.7%. Maybe it’s lower than that. Or higher.
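The arithmetic behind that 90.7% point estimate can be sketched in a few lines (an illustrative calculation, assuming the two trial arms were of roughly equal size, so the risk ratio reduces to the ratio of case counts):

```python
# Vaccine efficacy point estimate from the case split discussed above,
# assuming equal-size vaccine and placebo arms.
cases_vaccine = 8    # confirmed Covid-19 cases in the vaccinated group
cases_placebo = 86   # confirmed Covid-19 cases in the placebo group

# Efficacy = 1 - (risk in vaccine arm / risk in placebo arm);
# with equal arms, the risk ratio is just the ratio of case counts.
risk_ratio = cases_vaccine / cases_placebo
efficacy = 1 - risk_ratio

print(f"Point estimate of vaccine efficacy: {efficacy:.1%}")  # 90.7%
```

The point estimate is exact arithmetic; the uncertainty around it is what the single number hides.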

How so?

At the time of Pfizer’s press release, there was not enough data to say with confidence that vaccine efficacy was higher than 90%. I don’t know exactly what their confidence interval was. I don’t think they announced it, but I’m guessing the low end might have been 75% or so. They didn’t have the data at that time to justify such a precise claim without an interval. The other pharmaceutical companies that followed in the weeks afterwards pushed this precision even further. And you may wonder why such specificity.
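To see how an interval around the point estimate could be constructed, here is one simple approach (an illustrative sketch, not Pfizer’s actual statistical method: it assumes equal arm sizes and puts a Wilson score interval on the share of all cases that occurred in the vaccine group):

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score 95% interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

cases_vaccine, cases_total = 8, 94   # 8 vaccine cases out of 94 total cases
lo, hi = wilson_interval(cases_vaccine, cases_total)

# With equal arms, efficacy = 1 - theta / (1 - theta), where theta is
# the share of all confirmed cases that occurred in the vaccinated group.
ve_hi = 1 - lo / (1 - lo)   # fewer vaccine cases -> higher efficacy
ve_lo = 1 - hi / (1 - hi)
print(f"95% interval for efficacy: roughly {ve_lo:.0%} to {ve_hi:.0%}")
```

Under these simplifying assumptions the interval comes out roughly 81% to 95% – already much wider than a single headline number suggests, and more conservative methods or unequal follow-up between arms could widen it further.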

Pfizer vaccine
At the time of Pfizer’s press release, there was not enough data to say with confidence that vaccine efficacy was higher than 90% (Photo: Marco Verch/Flickr)

Right, why?

I think this precision is meant to be persuasive. When someone gives you a precise number, you trust it more. But the problem is that precision undermines accuracy. We can’t observe it, but somewhere there is a true vaccine efficacy percentage that pharmaceutical companies are trying to estimate with their trials. You can approximate it – but the more precise you are with your estimates, the less likely you are to be accurate. I’m accurate if I say vaccine efficacy is between 0% and 100%.

Extremely accurate

But I’m neither precise nor informative. If you go to the other extreme, like Moderna’s claim of 94.5% efficacy, you are very precise, and this gives an illusion of information. Yet it may be highly inaccurate: based on the data collected at the time, true efficacy could have been as low as 75%.

It’s easy to misinterpret numbers...

I think this is dangerous, especially in the context of something like vaccines, where a sizeable group of people are sceptical about the efficacy and the risks. By making these comparisons, people might conclude that they don’t want vaccines with lower efficacy rates. Not that they will be able to choose which vaccine they receive, but it could decrease their willingness to be vaccinated. So there’s definitely a danger.

The more precise you are with your estimates, the less likely you are to be accurate

The problem with precision is something that you see everywhere in business and marketing analytics. We tend to communicate in precise ways, and not in terms of uncertainty. We should do much more to quantify the uncertainty that we have around our estimates. Ultimately, if you want to make good plans and prepare for the future in a world that is fairly complicated, it’s about envisioning alternative scenarios. If you are too precise and ignore uncertainty, you are less likely to be able to prepare for the future.

Back to vaccines, what would be your practical advice for people to avoid misinterpreting data?

Whenever you see an estimate that is precise, remember that what people are trying to do is estimate an uncertain quantity. Ask yourself: if I hadn’t seen this estimate and I wanted to estimate this uncertain quantity myself, what are all the pieces of information I would need? This question highlights the complexity of it all – there is no way you can pin down such an uncertain quantity with that much precision!

It seems impossible...

Another example is the attempt to estimate global mortality rates due to Covid-19 – this is an incredibly complex endeavour. Scientists are not yet able to do that. You need to realise that you can’t take big data at face value. Big data is often small data in disguise.

We should do much more to quantify the uncertainty that we have around our estimates

The big numbers involved in social media are another example. The big players may have millions of followers, but the truth is that only a tiny fraction of those millions will see any given post. And of that small fraction, an even smaller fraction will engage. So technically you start with over two billion active users on Facebook, but in the end, you end up posting for five people. I’m not saying that companies shouldn’t care about social media – they should – but it’s easy to be misguided by big data.
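The funnel de Langhe describes can be made concrete with back-of-the-envelope arithmetic (the follower count, reach rate, and engagement rate below are hypothetical, chosen only to show how quickly big numbers shrink):

```python
followers = 1_000_000     # hypothetical follower count
reach_rate = 0.05         # assumed: fraction of followers who see a given post
engagement_rate = 0.01    # assumed: fraction of viewers who engage with it

# Each stage of the funnel multiplies away most of the audience.
viewers = followers * reach_rate
engaged = viewers * engagement_rate
print(f"{followers:,} followers -> {viewers:,.0f} see the post "
      f"-> {engaged:,.0f} engage")
```

A million followers becomes five hundred engaged readers under these assumptions: the headline number and the number that matters differ by more than three orders of magnitude.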

The conclusion is: don’t be persuaded by big numbers and don’t underestimate the value of small numbers. Big data is often precisely wrong. And small data is often vaguely right. In many circumstances, it’s better to be vaguely right than precisely wrong.

Related content: Covid-19 vaccine trials are a case study on the challenges of data literacy (Harvard Business Review)

In your HBR article, you mention a lesson linked to AstraZeneca’s announcement. What happened there?

When AstraZeneca announced that its vaccine was “only 70% effective”, they also claimed a 90% efficacy rate based on a small subset of 2,741 participants who were administered a half-dose regimen. When you start to look at subsets of subsets of large groups, the data gets too small. The smaller the subsets of people you’re comparing, the more noise and uncertainty there is, and the more cautious you should be when making comparisons.

If AstraZeneca had said in advance: we’re going to run a study and vary the dosing regimen to compare efficacy for the low versus the high dose, then you would have more confidence in their conclusion that the low dose works better.
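The noise that creeps into small subgroups can be shown with a quick simulation (purely illustrative: the attack rate, arm sizes, and true efficacy below are assumed values, not AstraZeneca’s data). Trials of the same vaccine are simulated many times at two sizes, and the spread of the efficacy estimates is compared:

```python
import random

random.seed(42)

def simulated_efficacy(n_per_arm: int, true_ve: float = 0.70,
                       attack_rate: float = 0.01) -> float:
    """Simulate one trial and return the estimated vaccine efficacy.

    Each placebo participant is infected with probability attack_rate;
    each vaccinated participant with probability attack_rate * (1 - true_ve).
    """
    placebo = sum(random.random() < attack_rate for _ in range(n_per_arm))
    vaccine = sum(random.random() < attack_rate * (1 - true_ve)
                  for _ in range(n_per_arm))
    if placebo == 0:
        return float("nan")
    return 1 - vaccine / placebo

def spread(estimates: list) -> float:
    return max(estimates) - min(estimates)

# 200 simulated trials each: a large trial vs. a small subgroup
# of roughly the half-dose subset's size (~1,370 per arm).
big = [simulated_efficacy(10_000) for _ in range(200)]
small = [simulated_efficacy(1_370) for _ in range(200)]

print(f"Spread of estimates, large trial:   {spread(big):.2f}")
print(f"Spread of estimates, small subgroup: {spread(small):.2f}")
```

The true efficacy is 70% in every simulated trial, yet the small-subgroup estimates scatter far more widely than the large-trial ones: exactly why a 90% figure from a 2,741-person subset deserves caution.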

When you start to look at subsets of subsets of large groups, the data starts to get too small

Why should we not trust it now?

Because they didn’t plan it in advance. You need to distinguish prediction from post-diction. AstraZeneca came up with the hypothesis that the dosage regimen might matter while analysing the data. When you slice your data in many ways, you’ll find differences. Everybody can make post-dictions. It’s easy to find differences and come up with potential explanations. But when you’re post-dicting, your explanation is more likely to be just a story. There’s an explanatory component to stories, but they are not real explanations of why things might have happened the way they did. You have to be careful with post-diction. If you really understand the world, you should be able to make predictions.
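The “slice your data and you’ll find differences” point can be demonstrated with a small simulation (illustrative only: the outcome here is pure random noise and no attribute actually matters, yet some subgroup comparison always shows a gap):

```python
import random

random.seed(7)

# 1,000 simulated participants with a noise outcome and random attributes.
participants = [
    {
        "outcome": random.random(),          # pure noise, no real effect
        "age_group": random.choice(["young", "old"]),
        "dose": random.choice(["low", "high"]),
        "site": random.choice(["A", "B", "C", "D"]),
    }
    for _ in range(1000)
]

def mean_outcome(attr: str, value: str) -> float:
    xs = [p["outcome"] for p in participants if p[attr] == value]
    return sum(xs) / len(xs)

# Slice the data every way we can and record the biggest gap per attribute.
gaps = {}
for attr, values in [("age_group", ["young", "old"]),
                     ("dose", ["low", "high"]),
                     ("site", ["A", "B", "C", "D"])]:
    means = [mean_outcome(attr, v) for v in values]
    gaps[attr] = max(means) - min(means)

biggest = max(gaps, key=gaps.get)
print(f"Largest gap found by slicing: {gaps[biggest]:.3f} (in {biggest!r})")
```

Some slice always produces a difference you could tell a story about, which is why a hypothesis generated while looking at the data needs to be confirmed by a new, pre-registered test.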

So, the recommendation here is: whenever someone makes a prediction about the future, write it down, keep track of it, and when the future has happened, check if it was accurate. Because there are too many “gurus” out there predicting what will happen and they are not held accountable.

Because nobody follows up...

We should do that as organisations. We need to develop some kind of prediction accounting system to see whether our predictions pan out. The overall lesson is: be sceptical. Don’t take big data at face value, don’t take precise data at face value, and be very careful when people tell you what happened: it’s not the same as people telling you what will happen and then showing that it actually happened.

Are you going to get a vaccine shot anytime soon?

At this point there’s enough evidence that the current vaccines work. Even though the AstraZeneca vaccine performed less well, 70% efficacy is similar to regular influenza vaccines and higher than the WHO target rate. So, it’s a good vaccine. I don’t know when the vaccine will become available, but I will take one.

All written content is licensed under a Creative Commons Attribution 4.0 International license.