(Errors in) Communicating Statistics: Base rate neglect


When the results came in on June 24th last year, most of those watching were surprised or even shocked that a majority of British voters had just voted to leave the EU. Even among those who cast their vote for “Leave”, many said they had not expected the vote to go in their favour, and some even stated they had “not wanted it to happen”.

Thus, when the tweet below first appeared on social media, it was quickly tied to quotes like the one above, and a popular narrative emerged: “Britons had no idea what they were voting on!”

Google Trends tweeting about increase in searches for term “What happens if we leave the EU?”

The narrative, as wrong as it was, went viral.

An important characteristic of Google Trends is the fact that it only shows relative changes in the number of times a certain search phrase is used compared to all searches made.
In other words, if a search phrase was used by 30,000 users in one week and by 75,000 the next, then, all other things being equal, Google Trends will show the phrase rising to 250% of its starting level between those weeks, which is an increase of 150%.

However, exactly the same relative increase will show if the numbers are 300 for the first week and 750 for the second:

Both keywords show an increase from 100% to 250% over the respective time period.
Notice how the green line is barely visible although it increases by the same percentage as the red line. This is due to its lower base rate.
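
To make the arithmetic behind the two lines concrete, here is a minimal sketch in Python. It uses only the example numbers from above; the function names are made up for illustration, and indexing each series to its first value is a simplification for this example, not a description of how Google Trends actually scales its data.

```python
def relative_index(counts):
    """Express each count as a percentage of the first (base) value."""
    base = counts[0]
    return [100 * c / base for c in counts]


def percent_increase(old, new):
    """Relative change from old to new, in percent."""
    return 100 * (new - old) / old


# Example numbers from the text: searches in week one and week two.
red = [30_000, 75_000]
green = [300, 750]

for name, series in (("red", red), ("green", green)):
    print(
        name,
        "index:", relative_index(series),
        "relative increase (%):", percent_increase(series[0], series[1]),
        "absolute increase:", series[1] - series[0],
    )
```

Both series produce the same index and the same relative increase, while the absolute increase differs by a factor of 100 (45,000 versus 450 searches). The relative view alone cannot tell the two apart.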

The number from which an increase is calculated, 300 for green and 30,000 for red in our example, is called the base rate. Neglecting it can drastically alter the story. In the case of Brexit, the reported +250% in usage of the search term “What is the EU/what happens if we leave the EU?” sounds impressive and meaningful only until we learn that the base rate was below a mere 300 searches per day.

The fact that it almost tripled still means little, because the absolute number of Britons who googled it, below 1,000 per day, remains tiny against a UK population of roughly 60 million people. There simply is no story here, as our example illustrates with the barely visible green line in the plot of the absolute increase:

A huge relative increase of a tiny number still gives you a tiny number.
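
As a rough back-of-the-envelope check, the figures from the text can be put in proportion with a few lines of Python. This is only a sketch: it treats the upper bounds of 300 and 1,000 searches per day as exact, and it assumes, generously, that every search comes from a different person. The function name is hypothetical.

```python
def share_of_population(count, population):
    """Return count as a percentage of population."""
    return 100 * count / population


UK_POPULATION = 60_000_000   # "roughly 60 million", as stated in the text
SEARCHES_BEFORE = 300        # upper bound of the base rate per day
SEARCHES_AFTER = 1_000       # upper bound after the increase per day

for label, count in (("before", SEARCHES_BEFORE), ("after", SEARCHES_AFTER)):
    share = share_of_population(count, UK_POPULATION)
    print(f"{label}: {count} searches/day = {share:.4f}% of the population")
```

Even under these generous assumptions, the share of the population stays below two thousandths of one percent after the increase.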

In times of clickbait headlines and sensationalist journalism, it is tempting to merely report such stories instead of actually researching them, lest the great story fall apart.

And some might say that in terms of ethics, base rate neglect isn’t so bad, since no one really gets hurt. But this is not always the case.

Every couple of years, some UK outlets report the well-known fact that among women who take second- or third-generation combined birth control pills, the risk of thrombosis is several times as high as for the general population.

But of course, base rate neglect is also at play here. Depending on the specific study cited in those articles, the reported increase ranges from 200% to 400%, with a base rate of only 1-2 cases per 7,000 women in the most famous instance of the “pill scare”. The risk, although real, is in fact so small (both before and after the increase) that it is smaller than the risk of a thrombosis during an actual pregnancy.
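
The gap between relative and absolute risk can be sketched in a few lines of Python. The base rates are the ones quoted above. How the cited studies define their percentages is not stated here, so the multipliers 2 to 5 are an assumption chosen to cover both readings (the risk rising to 200%-400% of the base rate, or rising by 200%-400%). The function name is invented for illustration.

```python
from fractions import Fraction


def risk_after(base_cases, per_women, multiplier):
    """Absolute risk after the base rate is multiplied by `multiplier`."""
    return Fraction(base_cases, per_women) * multiplier


PER_WOMEN = 7_000  # denominator of the base rate quoted in the text

for base_cases in (1, 2):             # base rate: 1-2 cases per 7,000 women
    for multiplier in (2, 3, 4, 5):   # assumed range, see the note above
        risk = risk_after(base_cases, PER_WOMEN, multiplier)
        cases = float(risk * PER_WOMEN)
        print(
            f"base {base_cases}/{PER_WOMEN}, x{multiplier}: "
            f"{cases:.0f} per {PER_WOMEN} women = {float(risk):.3%}"
        )
```

Even the most pessimistic combination in this sketch (2 cases per 7,000 women, five times as high) comes to 10 cases per 7,000 women, or about 0.14%.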

As a result, many women (temporarily) stop taking their contraceptives out of worry for their personal health, resulting in what is regularly estimated to be thousands of unwanted pregnancies and subsequent abortions.

An abortion can mean an invasive medical procedure which, unlike a poorly researched clickbait article, carries inherent risks of complications. In this case, base rate neglect led to unnecessary physical harm to women who were scared by a medical risk so tiny that it was but a chimera.

Luis Dreisbach

Associate (Data Science)
