Cambridge Analytica: Beyond the Hype

The story immediately went viral: Big Data company Cambridge Analytica and its sophisticated psychographic models had helped Donald Trump win the 2016 presidential election. The story played to all the prevalent fears of the big data age: loss of privacy, microtargeting, behavioural steering. But now, with far less media buzz, the company admits that its psychographic models were never really used in the Trump campaign. What can we learn from this ‘scam’?

It’s no secret that data plays a large role in political campaigns. This is especially true in the US, where campaigns have access to vast voter databases. Campaign operatives, Democrats and Republicans alike, constantly look for new ways to segment the voter database and to predict voting preferences and candidate affiliation, so that they can allocate their scarce resources accordingly. But every approach has its pitfalls and weaknesses.

Enter Cambridge Analytica.

The company claimed, and made this claim the centrepiece of its PR activities, to have found the “magic sauce”: the overarching methodology, the holy grail that enabled the Trump campaign to build fine-grained psychological profiles of every eligible voter and, ultimately, to win the election.

Some outlets even managed to interview the “mastermind” behind the methodology, a researcher from the University of Cambridge who, saddened by the presumed impact of his research, is quoted as saying he had only “shown that the bomb exists.” Now, with Trump’s election already a distant memory, news reports are surfacing that question how much impact Cambridge Analytica actually had within the Trump operation. “Cambridge’s data and models were slightly less effective than the existing Republican National Committee system”, writes the New York Times. All indicators point to a huge publicity scam.

However, the simple fact that Cambridge Analytica managed to attract so much media attention with its wildly exaggerated claim points to two common misconceptions in the age of data abundance.

Big Data, Big Solutions?

Everyone who has ever run an ad on Facebook is familiar with the variety of microtargeting options. You want to reach only men aged 18 to 25 who studied at the Technical University of Munich and like ‘data science’? No problem.

We leave countless traces of our personality and identity online, on Facebook and on other social media platforms alike. This invites the staggering assumption that aggregating all this data and connecting the dots across many sources would, given the appropriate analytical tools, yield ever more insight, potentially enough to change the outcome of an election.

However, what matters is the quality of the data and the careful selection of data sources, not the wholesale integration of every data source available. Indeed, “more than half the Oklahoma voters whom Cambridge had identified as […] supporters actually favored other candidates”, writes the New York Times about a field test of the Cambridge Analytica solution. Big Data does not automatically produce big solutions.

Evaluating the Return on Data

Closely tied to the perceived impact of ‘big data solutions’ for political campaigns is a skewed understanding of the ‘return on data’. In a fully digital environment, it is fairly easy to measure the impact of a new targeting model (on Facebook, for instance, by monitoring click-through rates). It becomes much harder when the goal is to get people to do something offline, such as voting, where direct causality can hardly be established.

Yet every data science project reaches a point, usually fairly quickly and rightly so, where the monetary impact of developing the respective algorithm must be reliably quantified. What return on data can be expected? What investment does that justify?

Cambridge Analytica’s claims held up for a fair amount of time precisely because they could not be tested. In such a setting, sticking with established targeting methods may well be the more responsible choice from a budgetary perspective. Then again, in a competitive pre-election landscape the promise of new analytics is tempting, and under the current system of campaign financing, money is often not a constraint.

Whether the new microtargeting methods really are a leap forward and deliver an adequate return on data can thus only be answered credibly once they are broadly adopted in a free market setting. Until then, healthy scepticism is always advisable, whether when exploring new big data solutions or simply when reading shady reports about their impact on political campaigns.