4 July 2018

It’s time to put Paul the Octopus on the barbecue

By Matt Singh

Bashing forecasts and forecasting seems to be in vogue lately. Not only have recent years seen a number of political upsets, but the current World Cup has seen similarly notable
surprises (and not just England winning a penalty shootout).

But is the criticism of forecasting fair? Should we junk forecasts altogether and bring back Paul the Octopus?

Or is the real problem with the way forecasts are being interpreted? To answer that, we need to think about what a forecast or a prediction is. Essentially, it’s an assessment of the likelihood of an outcome (or outcomes) at some point in the future. And how those estimates are communicated and reported can have a big impact on how they are perceived.

Sometimes this can be harmless fun – one need only think back to Paul the Octopus, whose football “predictions” brought him global fame as some kind of marine oracle.

Quite often the outcomes will not be binary. To take the World Cup as an example, ahead of the quarter finals, there are eight possible outcomes for the tournament victor, corresponding to the eight remaining teams. Suppose a model gives World Cup win probabilities for each team of Brazil 25 per cent, France 20 per cent, and so on.

In this case, Brazil is favourite because 25 per cent is the highest probability. But if there’s a 25 per cent chance of something happening, there’s a 75 per cent chance of it not happening.
This means, perhaps counterintuitively, that Brazil are favourites to win the World Cup and also that Brazil probably will not win the World Cup. The two statements are not contradictory: where no team has a greater than even probability of victory, the favourite is really just the least unlikely outcome.

So when the UBS model made Germany favourites, and the predictions were in one case headlined as “Germany will win…”, that was an overstatement. “Will” implies certainty, yet the UBS model did not even make Germany likely winners; it merely gave them the shortest odds, a 24 per cent probability. In this case, the problem is clearly one of interpretation.

What about a binary outcome, such as a knockout-stage match or a referendum result? In this case, with only two possible outcomes, the favourite necessarily has a probability of more than 50 per cent and is therefore likelier to happen than not. But how likely is likely? The issue of how forecasts are expressed is perhaps the statistical equivalent of the Müller-Lyer illusion (the optical illusion where arrows of the same length appear to be different lengths depending on the direction of the fins).

Suppose it’s EU referendum night and two on-the-day polls have put Remain on an average of 53 per cent, excluding refusers and non-voters. Based on history, we might assume that the result is approximately normally distributed around the polled figure, and that 95 per cent of the time it should fall within plus or minus 8 percentage points of what the polls say.

You might choose to express this as a 95 per cent chance that Remain gets between 45 per cent and 61 per cent, which sounds massively hedged. Alternatively, you might say that there is a 77 per cent chance of Remain winning, which sounds like a confident prediction.

Yet these two ways of expressing a probability are actually saying the same thing. They sound very different largely because not all points within the range are equally likely (something that is often overlooked), and because people seem to misinterpret probabilities of between about 60 and 90 per cent: likelier than not, but nowhere near certain, yet often treated as though near certain.
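For those who want to check the arithmetic, the conversion between the two framings is a short calculation under the stated assumptions. Below is a minimal sketch in Python; the 53 per cent poll average and the 8-point band come from the example above, while the normal distribution and the 1.96 multiplier for a 95 per cent interval are assumptions of this illustration rather than anything specified by a particular forecaster.

```python
# Minimal sketch: converting a poll average plus an error band
# into (a) a 95 per cent interval and (b) a win probability,
# assuming normally distributed polling error.
from math import erf, sqrt

poll_average = 53.0        # Remain share in the two on-the-day polls
margin_95 = 8.0            # 95 per cent of results within +/- 8 points
sigma = margin_95 / 1.96   # implied standard deviation, about 4.1 points

# Framing 1: the 95 per cent interval for the Remain share
low, high = poll_average - margin_95, poll_average + margin_95
print(f"95% chance Remain gets between {low:.0f}% and {high:.0f}%")

# Framing 2: the probability that Remain finishes above 50 per cent
z = (poll_average - 50.0) / sigma
p_win = 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z
print(f"Chance of a Remain win: {p_win:.0%}")
```

Run on those inputs, it reproduces both the 45-to-61 range and the roughly 77 per cent win probability; the same underlying distribution, framed two ways.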

Empirically, the probability of a penalty being scored is right in the middle of this range, at 70 to 80 per cent depending on whether or not it’s in a shootout and which competition it’s
in. This sort of probability is often treated as a very confident prediction of what might happen. But talk in terms of things that actually occur that proportion of the time, and it sounds very different. For example, you don’t need to be Carlos Bacca, or to have looked at the data, to know that a great many penalties are in fact missed.

It’s clear, then, that a lot of the problems with forecasting can be explained by communication and reporting issues, rather than methodology. There is, of course, still a question mark over forecast performance in terms of the number of ostensibly low-probability events that have occurred recently. But that doesn’t mean an individual forecast is necessarily flawed simply because a low-probability event happens.

Forecasts that don’t provide any indication of the range of uncertainty, and only give midpoints, such as (usually) forecasts of the temperature later this week or GDP growth for
the rest of the year, are much harder to evaluate. Without further information, it’s hard to say much beyond whether they consistently miss one way or the other.

More generally, forecasts themselves can be improved, but the way they are communicated (both by forecasters and by the media) can be improved substantially. Forecasts are not useless, provided their limitations are properly understood. End users should, as always, exercise due scepticism, but not dismiss forecasting out of hand.

Either way, it’s time to put the octopus on the barbecue.

Matt Singh is the founder of Number Cruncher Analytics