The Tiger That Isn't


by Andrew Dilnot


  So it is not that uncertainty means absolute ignorance, nor that the numbers offer certainty, rather that they can narrow the scope of our ignorance. This partial foretelling of fate is an extraordinary achievement. But we need to keep it in proportion, and we certainly need to get the right way round the degree of risk, whether likely or not. The overwhelming evidence is that we are more likely to judge this correctly if we use natural frequencies and count people, as people do, rather than use percentages.

  What generally matters is not whether a number is right or wrong (they are often wrong), but whether numbers are so wrong as to be misleading. It is standard practice among statisticians to say how wrong they think their numbers might be, though we might not even know in which direction – whether too high or too low. Putting an estimate on the potential size of the error, which is customarily done by saying how big the range of estimates needs to be before we can be 95 per cent sure it covers the right answer (known as a confidence interval), is the best we can do by way of practical precaution against the number being bad. Though even with a confidence interval of 95 per cent there is still a 5 per cent chance of being wrong. This is a kind of modesty the media often ignore. The news often doesn't have time, or think it important, to tell you that there was a wide range of plausible estimates and that this was just one, from somewhere near the middle. So we don't know, half the time, whether the number is the kind to miss a barn door, enjoying no one's confidence, or if it is a number strongly believed to hit the mark.
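  The idea of a confidence interval can be made concrete with a small sketch. The survey numbers below are invented for illustration, and the normal-approximation formula shown is one standard way of building a 95 per cent interval around a surveyed proportion, not necessarily the method behind any figure discussed in this book.

```python
import math

# A 95 per cent confidence interval for a surveyed proportion, using the
# normal approximation. The survey figures are invented: 120 "yes"
# answers out of 1,000 people asked.
def confidence_interval_95(successes, n):
    p = successes / n                  # the point estimate
    se = math.sqrt(p * (1 - p) / n)    # standard error of the proportion
    margin = 1.96 * se                 # 1.96 standard errors covers ~95%
    return p - margin, p + margin

low, high = confidence_interval_95(120, 1_000)
print(f"95% CI: {low:.3f} to {high:.3f}")   # roughly 0.100 to 0.140
```

  A larger sample shrinks the interval: ask 10,000 people instead of 1,000 and the same formula gives a range roughly a third as wide, which is the sense in which a narrow interval signals a more trustworthy number.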

  We accuse statisticians of being overly reductive and turning the world into numbers, but statisticians know well enough how approximate and fallible their numbers are. It is the rest of us who perform the worst reductionism whenever we pretend the numbers give us excessive certainty. Any journalist who acts as if the range of uncertainty does not matter, and reports only one number in place of a spread of doubt, conspires in a foolish delusion for which no self-respecting statistician would ever fall. Statistics is an exercise in coping with, and trying to make sense of, uncertainty, not in producing certainty. It is usually frank in admission of its doubt and we should be more willing to do the same.

  If ever you find yourself asking, as you contemplate a number, 'How can they be so precise?' the answer is that they probably can't, and probably weren't, but the reporting swept doubt under the carpet in the interests of brevity. If, somewhere along the line, the uncertainty has dropped out of a report, it will probably pay to find out what it had to say.

  If we accept that numbers are not fortune tellers and will never tell us everything, but can tell us something, then they retain an astonishing power to put a probability on our fate. The presentation of the numbers might have left a lot to be desired, but the very fact that we can know – approximately – what effect drinking regularly will have on the chance of breast cancer, is remarkable. Picking out the effect of alcohol from all the other lifetime influences on health is a prodigious undertaking, and the medical surveys that make it possible are massive data-crunching exercises. Having gone to all that effort, it is a scandal not to put it to proper use and hear clearly what the numbers have, even if with modesty, to say.

  8

  Sampling: Drinking from a Fire Hose

  Counting is often bluff. It is, in truth, counting-lite. Many of the hundreds of numbers printed and broadcast every day have routinely, necessarily, skimped on the job.

  To know what they are worth, you need to know how they are gathered. But few know the shortcuts taken to produce even respected numbers: the size of the economy or trade, the profit companies make, how much travel and tourism there is, UK productivity, the rate of inflation, the level of employment … as well as controversial numbers like Iraq war dead, HIV/Aids cases, migrants, and more.

  Their ups and downs are the bread and butter of news, but none is a proper tally. Instead, only a few of each are counted, assuming they are representative of the rest, then multiplied to the right size for the whole country.

  This is the sample, the essence of a million statistics, like the poet's drop of water containing an image of the world in miniature – we hope. But the few that are counted must mirror the others or the whole endeavour fails; so which few? Choose badly and the sample is skewed, the mirror flawed, and for a great many of the basic facts about us, our country and our economy, only error is multiplied.

  There were stark warnings – don't touch the door handles, don't shake hands, don't go out, scrub and scrub again – and lurid images: of sickness, deserted workplaces, closed hospital wards. And the numbers were huge.

  The British media was in the grip of an epidemic, its pages covered in vomit, or at least reports of it. The condition was first sighted in the Daily Telegraph, and proved so virulent that within days every other newspaper and broadcaster had succumbed, sick with the same vile imaginings.

  In these dire circumstances, only one cure is possible: sufferers are strongly advised to check the sources of their data. Strangely, none did, so the small matter of whether the public itself faced a real epidemic became almost irrelevant.

  It was the winter of 2007–8 and the sickness was norovirus, also known as winter flu or winter vomiting disease, and it seemed that a shocking number of people had fallen victim. The Daily Telegraph said 3 million. The Daily Express soon said 3.5 million. The Sun said 4 million. From those bold numbers, you might imagine that there were more officials with clip-boards, this time stationed outside every bathroom window in the land, recording how many of us were throwing up.

  Clearly, the number of cases is not counted in any proper sense of the word. Only a tiny proportion of those sick with norovirus go to the doctor. Fewer cases still are confirmed by a lab test. Norovirus passes (there is no cure) in a couple of days. In truth, no one knew, nor could know, how many were affected; the alarming totals were arrived at on the basis of a sample from which the media simply extrapolated.

  Samples have to be large enough to be plausibly representative of the rest of the population. So no one, surely, would extrapolate from a sample of one, for example. The sample on this occasion – the only data we have about the incidence of norovirus, in fact – was the 2,000 cases occurring in October, November and December of 2007 that had been confirmed in the laboratory. From 2,000 confirmed cases to 3 or 4 million is a big leap, but, for every recorded case, the media reported that there were 1,500 more in the community, a figure obtained from the Health Protection Agency (HPA). This made the arithmetic apparently straightforward: 2,000 confirmed cases × 1,500 = 3 million.
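  The extrapolation the newspapers performed amounts to nothing more than this (a sketch of the arithmetic reported above, not of any actual HPA calculation):

```python
# The headline arithmetic: scale up lab-confirmed cases by the reported
# ratio of community cases to confirmed cases.
confirmed_cases = 2_000        # lab-confirmed norovirus cases, Oct-Dec 2007
cases_per_confirmed = 1_500    # the HPA-derived multiplier reported in the press

estimated_total = confirmed_cases * cases_per_confirmed
print(f"{estimated_total:,}")  # 3,000,000 - the 3 million in the headlines
```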

  But the HPA also urged caution, saying the ratio of confirmed cases to the total should not be taken literally, and might be changing anyway as, for example, diagnostic technology became more sensitive. People's readiness to go to the doctor with this illness might also have been changing. So the ratio of 1:1500 is unreliable from the start; how unreliable, we'll now find out.

  It originated in what is known as the IID study (Infectious Intestinal Diseases), conducted between 1993 and 1996. Such studies are normally careful about the claims they make. Researchers recognise that there is a good deal of uncertainty around the numbers they find in any one community, and that these may vary from place to place and time to time. They put around the numbers the confidence intervals described in the last chapter. As the name suggests, these aim to give some sense of whether this is the kind of number that has their confidence, or the kind, as we say, that they wouldn't trust to hit a barn door. A rough rule of thumb is that wide confidence intervals indicate that the true figure is more uncertain, while narrow confidence intervals suggest more reliability.

  The estimate, remember, was that every case recorded equalled 'about' 1,500 in total. So how 'about' is 'about'? How wide were the confidence intervals for the norovirus? To be 95 per cent sure of having got the number right, the confidence intervals in the IID study said that one lab case might be equal to as few as 140 cases in the community… or as many as 17,000. That is some uncertainty.

  These numbers imply that for the 2,000 cases confirmed in the laboratory in winter 2007, the true number in the community could be anywhere between 280,000 and 34 million (more than half the entire population of the UK), with a 5 per cent chance, according to this research, that the true value lies outside even these astonishingly wide estimates. As the authors of the study said when reporting their findings in the restrained language of the British Medical Journal: 'There was considerable statistical uncertainty in this ratio.' Let's put it more bluntly: they hadn't the foggiest idea. And for good reason: the number of laboratory cases in that study, the number of people whose norovirus had been confirmed, and thus the sample from which the 1:1500 ratio was calculated, was in fact … 1.
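  Carrying the study's confidence interval through the same multiplication, instead of the single ratio of 1,500, shows how little the headline figure was worth (again just a sketch of the arithmetic in the text):

```python
# The same extrapolation, but using the IID study's 95 per cent
# confidence interval for the multiplier rather than its point estimate.
confirmed_cases = 2_000
ratio_low, ratio_high = 140, 17_000   # 95% CI: community cases per lab case

low = confirmed_cases * ratio_low     # 280,000
high = confirmed_cases * ratio_high   # 34,000,000 - over half the UK population
print(f"between {low:,} and {high:,}")
```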

  This was a 'wouldn't-hit-a-barn-door-number' if ever there was one. What's more, three quarters of the 2,000 recorded cases in the winter of 2007 'epidemic' were from patients already in hospital wards, where the illness is known to spread quickly in the contained surroundings of a relatively settled group of people. That is, these cases, the majority, might not have been representative of more than a handful of others in the outside world, if any. If ten patients in a ward all come down with the bug, does that mean there are 15,000 more new cases out there in the community? It may mean there are the ten in the ward and that's it.

  By mid January 2008, all talk of an epidemic had disappeared, as the apparent rise in cases evaporated (2,000 in the last three months of 2007 had been twice the figure for the corresponding period for the previous year, but there weren't really twice as many recorded cases, the peak had just arrived earlier – and also passed earlier – making the seasonal total much like any other). None of this makes much difference to the more fundamental problem – that we have only the vaguest idea how the sample relates to the rest of the population, and none at all about whether that relationship is changing, and are therefore unable to say – without some other source of overwhelming evidence – whether there was a comparative torrent of sickness, or a trickle.

  Vomit probably does not count among the most hotly debated subjects, but many that do rely on samples. For example, immigration statistics in Britain, in summer 2007, became a subject of ridicule, repeatedly revised, always up. Press, politicians and presumably some members of the public were scandalised that the statistics were a) inaccurate and b) not actually counted, but sampled.

  Neither should be surprising. To see why, follow us for a brief excursion to and fro across the English Channel, where the numbers for sea crossings begin with a huddle of officials gathered at Dover docks in the cold early morning. Their job is to count, and much is at stake: what they jot on clipboards will lead others to bold conclusions, some saying we need more workers if Britain is to prosper, some that this imperils the British way of life. They count, or rather sample, migrants.

  All the public normally hears and knows of the statistical bureaucracy at our borders is the single number that hits the headlines: in 2005 net immigration of about 180,000, or about 500 a day. This is sometimes enriched with a little detail about where they came from, how old they are, whether single or with children, and so on.

  Officials from the International Passenger Survey, on the other hand, experience a daily encounter with compromising reality in which the supposedly simple process of counting people to produce those numbers is seen for what it is: mushiness personified. Here they know well the softness of people, people on the move even more so. What's more, they sample but a tiny fraction of the total.

  The migration number for sea crossings begins when, in matching blue blazers on a dismal grey day, survey teams flit across the Channel weaving between passengers on the ferries, from the duty-free counter to the truckers' showers, trying in all modesty to map the immense complexities of how and why people cross borders. The problem is that to know whether passengers are coming or going for good, on a holiday or a booze cruise, or a gap year starting in Calais and aiming for Rio, there's little alternative to asking them directly. And so the tides of people swilling about the world, seeking new lives and fleeing old, heading for work, marriage or retirement in the sun, whether 'swamping' Britain or 'brain-draining' from it, however you interpret the figures, are captured for the record if they travel by sea when skulking by slot machines, half-way through a croissant, or off to the ladies' loo.

  'Oi! You in the chinos! Yes, by the lifeboat! Where are you going?' And so they discover the shifting sands of human migration and ambition.

  Or maybe not, not least because it is a lot harder than that. To begin with, the members of the International Passenger Survey teams are powerless – no one has to answer their questions – and, of course, they are impeccably polite; no job this for the rude or impatient. Next, they cannot count and question everyone, there is not enough time, so they take a sample which, to avoid the risk of picking twenty people drinking their way through the same stag weekend and concluding the whole shipload is doing the same, has to be as random as possible.

  So, shortly before departure, they stand at the top of the eight flights of stairs and various lifts onto the passenger deck as the passengers come aboard, scribbling a description of every tenth in the file; of the rucksacked, the refugees, the suited or the carefree, hoping to pick them out later for a gentle interrogation: tall-bloke, beard, 'surfers do it standing up' T-shirt. That one is easy enough, not much likelihood of a change of clothes either, which sometimes puts a spanner in the works.
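  Noting every tenth passenger in the boarding file is what statisticians call systematic sampling. A toy sketch, with an invented passenger list, shows the procedure:

```python
# Systematic sampling: take every tenth passenger from the boarding order.
def every_tenth(passengers, step=10):
    # Slice from the (step-1)th index onwards in strides of `step`.
    return passengers[step - 1::step]

# An invented boarding queue of 100 passengers.
boarding_queue = [f"passenger-{i}" for i in range(1, 101)]
sample = every_tenth(boarding_queue)
print(len(sample))   # 10 passengers picked out for interview
print(sample[0])     # passenger-10
```

  Unlike picking whoever happens to be nearest the duty-free counter, a fixed-interval pick through the whole queue makes it harder for one stag party to dominate the sample, though it still assumes the boarding order itself is not patterned in a way that matches the interval.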

  When several hundred Boy Scouts came aboard en route to the World Scouting Jamboree, the survey team assumed, with relief, that the way to tell them apart would be by the colour of their woggles, until it turned out that the Jamboree had issued everyone with new, identical ones. The fact that they were all going to the same place and all coming back again did not absolve the survey teams of the obligation to ask. The risks of woggle fatigue are an occupational hazard of all kinds of counting.

  The ferry heaves into its journey and, equipped with their passenger vignettes, the survey team members also set off, like Attenboroughs in the undergrowth, to track down their prey, and hope they all speak English.

  'I'm looking for a large lady with a red paisley skirt and blue scarf. I think that's her in the bar ordering a gin and lime.'

  She is spotted – and with dignified haste the quarry is cornered: 'Excuse me, I'm conducting the International Passenger Survey. Would you be willing to answer a few questions?'

  'Of course.' Or perhaps, with less courtesy: 'Nah! Too busy, luv.' And with that the emigration of an eminent city financier is missed, possibly. About 7 per cent refuse to answer. Some report their honest intention to come or go for good, then change their minds, and flee the weather, or the food, after three months.

  The International Passenger Survey teams interview around 300,000 people a year, on boats and at airports. In 2005 about six or seven hundred of these were migrants, a tiny proportion from which to estimate a total flow in both directions of hundreds of thousands of people (though recently they began supplementing the routine survey data with additional surveys at Heathrow and Gatwick designed particularly to identify migrants). The system has been described by Mervyn King, Governor of the Bank of England – who is in charge of setting interest rates and has good reason for wanting to know the size of the workforce – as hopelessly inadequate.

  In airports too, the sample has been almost comically awry. In November 2006, Mervyn King's evidence to the Treasury Select Committee was that even as Eastern European migration reached its peak, estimates had been surprisingly low: 'We do not have any particularly accurate method of measuring migration, neither gross or net,' he said.

  'In 2003, I think there were 516,000 passenger journeys between the UK and Poland. That is both in and out. Almost all of them (505,000) – were to Gatwick, Heathrow or Manchester. Over the next two years the number of passenger journeys between the UK and Poland went from 516,000 to about 1.8 million. Almost all of that increase was to airports other than Heathrow, Gatwick, and Manchester. Why does this matter? Because most of the people handing out the questionnaires for the IPS were at Heathrow, Gatwick and Manchester.' Mr King added that outside Heathrow, Gatwick and Manchester, the number of airline passengers in 2005 'who actually said in the International Passenger Survey, “Yes, I am a migrant coming into the UK”, was 79'.

  It's an irresistible image: the International Passenger Survey experiencing a protracted Monsieur Hulot moment as enumerators look one way while a million people tip-toe behind them. Whether they should have foreseen this change in the pattern of travel is still arguable, but it illustrates perfectly how new trends can leave old samples looking unfortunate.

  This is counting in the real world. It is not a science, it is not precise, in some ways it is almost, with respect to those who do it, absurd. Get down to the grit of the way data is gathered, and you often find something slightly disturbing: human mess and muddle, luck and judgement, and always a margin of error in gathering only a small slice of the true total, from which hopeful sample we simply extrapolate. No matter how conscientious the counters, the counted have a habit of being downright inconvenient, in almost every important area of life. The assumption that the things we have counted are correctly counted is rarely true, cannot, in fact, often be true, and as a result our grip on the world through numbers is far feebler than we like to think.

 
