The Tiger That Isn't


by Michael Blastland and Andrew Dilnot

That does not make it a waste of time. Flawed as the data is, it is usually better than no data at all; inadequate it may be, but improvement is not always easy without huge expense. The key point is to realise these numbers' implicit uncertainty and treat them with the care – not cynicism – they deserve. Few people realise how much of our public data comes from samples. A glance at National Statistics, the home page for data about anything to do with the economy, population or society, reveals a mass of information, nearly all based on samples, and only one of its featured statistics based on a full count:

  This year's official babies' names figures show that it's all change for the girls with Olivia and Grace moving up to join Jessica in the top three. Jack, Thomas and Joshua continue to be the three most popular boys' names, following the trend of previous years.

  Names are fully recorded and counted; so too is the approximate figure for government spending and receipts. But on a day taken at random, the day of writing, everything else listed there – and there are a dozen of the most basic economic and social statistics – is based on a sample. The size of that sample for many routine figures – inflation, or the effect of changes in tax or benefits, for example – is about 7,500 households, or about 1/3,000th of the actual number of households in the UK.
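
  To see why such a tiny fraction can still be useful, here is a minimal sketch (our illustration, not National Statistics' method; the household total is an assumed round figure implied by the 1/3,000th): for a simple random sample, the precision of an estimate depends mainly on the sample's absolute size, hardly at all on the fraction of the population it covers.

```python
import math

# Assumed round figure, implied by the text's '1/3,000th'; illustration only.
uk_households = 22_500_000
sample_size = 7_500

print(f"Sampling fraction: about 1/{uk_households // sample_size:,}")

# Standard error of an estimated proportion p from n observations:
# SE = sqrt(p * (1 - p) / n); the worst case is p = 0.5.
p = 0.5
se = math.sqrt(p * (1 - p) / sample_size)
print(f"Worst-case 95% margin of error: +/-{1.96 * se:.1%}")
# Roughly +/-1.1%, despite sampling only 1/3,000th of all households.
```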

  This is inevitable. It could take a lifetime to number nothing more than one life's events, so much is there to count, so little that is practicably countable. Expensive, inconvenient, overwhelming, the effort to quantify even the important facts would be futile, were it not for sampling. But sampling has implicit dangers, and we need to know both how widespread it is as a source of everyday numbers, and how it goes wrong.

  The National Hedgehog Survey began in 2001. Less cooperative even than the Census-shy public, hedgehogs keep their heads down. In the wild there is no reliable way to count them short of rounding them up, dead or alive, or ripping through habitats, ringing legs – which would somewhat miss the point.

  It is said hedgehogs are in decline. How do we know? We could ask gamekeepers how many they have seen lately – up or down on last year – and we could put the same questions in general opinion surveys. But the answers would be impressionistic, anecdotal, and we want more objective data than that. What do we do?

  In 2002 the Hedgehog Survey was broadened to become the Mammals on Roads Survey. Its name suggests, as young hedgehog-lovers might say, an icky solution. The survey is in June, July and August each year, when mammals are on the move and when, to take the measure of their living population, we count how many are silhouetted on the tarmac. The more hedgehogs there are in the wild, so the logic goes, the more will snuffle to their doom on the bypass. In a small hog population this will be rare, in a large one more common.

  But spot the flaw in the method: does it count hedgehogs or traffic density? Even if the hedgehog population were stable, more cars would produce more squashes. Or, as the more ingenious listeners to More or Less suggested, does the decline in road-kill chronicle the evolution of a smarter, traffic-savvy hedgehog which, instead of rolling into a ball at sight or sound of hazard, now legs it, and lives to snuffle another day beyond the ken of our survey teams? Or maybe climate change has altered the hedgehog life cycle through the year, reducing the chance of being on the roads in the three monitored months.
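
  A toy simulation makes the confounding concrete (the numbers are invented, and this is our sketch, not the survey's model): if road-kill scales with traffic as well as with hedgehogs, counts can hold steady or even rise while the population falls.

```python
def expected_roadkill(population, traffic):
    # Crude assumption: kills scale with population x traffic.
    return 0.00001 * population * traffic

population = 1_000_000   # hypothetical starting population
traffic = 100            # hypothetical traffic index

for year in range(2002, 2008):
    kills = expected_roadkill(population, traffic)
    print(f"{year}: population {population:>9,}, expected road-kill {kills:,.0f}")
    population = int(population * 0.90)   # population falls 10% a year
    traffic = int(traffic * 1.12)         # traffic rises 12% a year
```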

  The most recent Mammals on Roads Survey tells us that average counts in both England and Wales have fallen each year of the survey, and the numbers in Scotland last year are markedly lower than those at the start of the survey. In England, the decline is biggest in the Eastern and East Midlands regions and in the South West; no one knows why.

  The lesson of the survey is that we often go to bizarre but still unsatisfactory lengths to gather enough information for a reasonable answer to our questions, but that it will remain vulnerable to misinterpretation, as well as to being founded on a poor sample. Yet, for all the potential flaws, this really is the best we can do in the circumstances. Most of the time we don't stop to think, 'How did they produce that number?' We have been spoiled by the ready availability of data into thinking it is easily gathered. It is anything but. Do not assume there is an obvious method bringing us a sure answer. We rarely know the whole answer, so we look for a way of knowing part, and then trust to deduction and guesswork. We take a small sample of a much larger whole, gushing with potential data, and hope: we try to drink from a fire hose.

  In fact, drinking from a fire hose is relatively easy compared with statistical sampling. The problem is how you can be sure that the little you are able to swallow is like the rest. If it's all water, fine, but what if every drop is in some way different?

  Statistical sampling struggles heroically to swallow an accurate representation of the whole, and often fails. What if the sample of flattened hedgehogs comes entirely from a doomed minority, out-evolved and now massively outnumbered by runaway hedgehogs? If so, we would have a biased count – of rollers (in decline) instead of runners (thriving). Unlikely, no doubt, and completely hypothetical, but how would we know for sure?
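
  The worry can be put in code. In this hypothetical sketch (all numbers invented), the population is 20 per cent rollers and 80 per cent runners, but runners almost never appear on the road, so the sample says something true about road-kill and something badly wrong about hedgehogs.

```python
import random

random.seed(42)

# Imaginary population: 20% 'rollers', 80% 'runners'.
population = ["roller"] * 200_000 + ["runner"] * 800_000

# Road-kill sampling: every roller encountered is counted, but a runner
# only ends up on the tarmac 2% of the time.
encountered = random.sample(population, 5_000)
roadkill = [h for h in encountered
            if h == "roller" or random.random() < 0.02]

roller_share = roadkill.count("roller") / len(roadkill)
print(f"Rollers in road-kill sample: {roller_share:.0%}")   # around 90%
print("Rollers in the actual population: 20%")
```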

  But such systematic bias in sampling is not only theoretical. It applies, for example, to a number that is capable from time to time of dominating the headlines: the growth of the UK economy.

  The Bank of England, the Treasury, politicians, the whole business community and the leagues of economic analysts and commentators generally accept the authority of the figures for UK economic growth, compiled in good faith with rigorous and scrupulous determination by National Statistics. It is a figure that has governments trembling and is the bedrock of every economic forecast, the measure of our success as an economy, the record of rising prosperity or recession.

  It is also based on a sample. What's more, there is good evidence the sample is systematically biased in the UK against those very parts of the economy likely to grow fastest. One consequence has been that for the last ten years we have believed ourselves under-performing compared with the United States, when in fact we might have been doing better, not worse.

  In the UK, it is hard to spot the growth of new start-up businesses until we have seen their tax returns. That is long after the event: it can be two years after the first estimates of GDP growth are published before the growth of new businesses is incorporated in the official figures. So the initial GDP figures fail to count the one area of the economy that is plausibly growing fastest – the new firms with the new ideas, creating new markets. On past form this has led frequently to initial under-reporting of economic growth in the UK by about half a percentage point. When growth moves along at about 2.5 per cent a year, that is a big error.
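
  The scale of that error is worth a moment's arithmetic (ours, with illustrative figures): half a percentage point on growth of about 2.5 per cent means the first estimate misses roughly a fifth of all the growth there is.

```python
# Illustrative figures only: an initial estimate half a point below the
# eventual, revised rate.
initial_estimate = 2.0   # % growth, first published
revised_estimate = 2.5   # % growth, after new-business data arrives

shortfall = (revised_estimate - initial_estimate) / revised_estimate
print(f"Share of measured growth missed at first: {shortfall:.0%}")   # 20%
```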

  But it is neither perverse, nor incompetent, and arguably unavoidable. One alternative, in the short-term absence of hard data, is to guess. How fast do we think new parts of the economy might be growing? And having watched it happen for a while, we might reckon that half a per cent of GDP would be a decent stab at the right number, but it would essentially be a guess that past performance would continue unchanged, a guess with risks, as any investor knows. The view of Britain's National Statistician is that it is right to count what we can reasonably know, not what we might speculate about, but this does mean an undercount is likely (though not inevitable).

  Guessing is what they do in the United States – and then tend to have to revise their estimates at a later date when real data is available, usually down, often significantly. By the time a more accurate figure comes out, of course, it is too late to make much impression on public perception because everyone has already gone away thinking the US a hare and the UK a sloth. Not on this data, it's not. Our opinions have been shaped by our sampling, and our sampling is consistently biased in a way that is hard to remedy accurately until later, by which time no one is interested any more in the figures for events up to two years old. In 2004 we thought UK growth was 2.2 per cent (not bad, but not great) compared to US growth of 3.2 per cent (impressive). After later revisions, UK growth turned out to be 2.7 per cent, and US growth also 2.7 per cent. Figures for the pace at which the US economy grew in the last three months of 2006 were revised down from an initial estimate of 3.5 per cent to 2.2 per cent by March 2007, a massive correction.

  But what is the alternative? One would be to measure the whole lot, count every bit of business activity, at the moment of each transaction. Well, we could, just about, if we were willing to pay for and endure so much statistical bureaucracy. We already measure some of it this way, but in practice some dainty picking and choosing is inevitable.

  Life as a fire hose, and samplers with tea cups and crooked fingers, is an unequal statistical fight. In truth, it is amazing we capture as much as we do, as accurately as we do. But we do best of all if we also recognise the limitations, and learn to spot occasions when our sample is most likely to miss something.

  HIV/Aids cases are a global emergency, fundamentally impossible to count fully. The number of cases worldwide was estimated by UNAids (the international agency with responsibility for the figures – and for coordination of the effort to tackle them) at 40 million people in 2001. The number is believed to have risen ever since and now stands (according to UNAids' 2006 report) at 33 million. That's right, there is no mistake, the 'increase' is from 40 to 33 million: the rising trend has gone down, though there are wide margins of uncertainty around these figures.

  The explanation for this paradox is in the samples. Researchers conceded that their sampling (much of it in urban maternity clinics) had been biased. Pregnant women turn out to be a poor reflection of the prevalence of a sexually transmitted disease in the rest of the population because – guess what? – pregnant women have all had unprotected sex. A second problem is that urban areas might well have had higher rates of infection than rural areas.

  At least this sample was easy to collect. When the job is hard and a full count impossible, convenience matters. But it's a strangely warped mirror to the world that implies everyone is having sex, and unprotected sex at that. The new, improved methodology incorporates data from population-wide surveys, where available.

  UNAids believes its earlier estimates were too high, and has revised them in the light of other surveys. There never were, it now thinks, as many cases as there are now. So can we trust its new sampling methodology? The figures are contested, in both directions, some believing them too low, others still too high, and arguments will always rage. All we can do is keep exercising our imagination for ways in which the sample might be a warped mirror of reality.

  It is worth saying in passing that UNAids thinks the problem has peaked in most places, adding that one growing contribution to the numbers is a higher survival rate. Not all increases in the numbers suffering from an illness are unwelcome: sometimes they represent people who in the past would have died but are now alive, and that is why there are more of them.

  Still, doubts about accuracy or what the numbers represent do not discredit the conclusion that we face a humanitarian disaster: 2 million people are believed by UNAids to have died of HIV/Aids last year, and it is thought there were 4 million new cases worldwide. In some places, earlier progress in reducing the figures seems to be in reverse. These figures, though certainly still wrong, one way or another, would have to be very wrong indeed to be anything less than horrifying.

  Reports of the figures tend to pick out a single number. But sampling is often so tricky that we put large margins of uncertainty around its results. The latest UN figures do not state that 38 million is the number; they say that the right number is probably somewhere between about 30 million and about 48 million, with the most likely number somewhere near the middle. One of these figures, note, is 60 per cent higher than the other. This is a range of uncertainty that news reporting usually judges unimportant: one figure will do, it tends to suggest. Where sampling gives rise to such uncertainty, one figure will seldom do.
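
  A two-line check (ours) shows how much that range conceals, and why a single headline figure misleads:

```python
low, mid, high = 30_000_000, 38_000_000, 48_000_000   # UN range and midpoint

print(f"Upper bound exceeds lower bound by {high / low - 1:.0%}")   # 60%
print(f"Headline figure of {mid:,} hides a spread of {high - low:,} people")
```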

  Currently more controversial even than the figures for HIV/Aids is the Iraq war. Iraq has a population less than half the size of Britain's. In October 2006, research by a team from Johns Hopkins University, published in the Lancet, estimated that nearly twice as many people had died as a result of the Anglo-American invasion/liberation of Iraq as Britain had lost in the Second World War: around 650,000 in Iraq to about 350,000 British war dead, civilians and combatants combined; that is, nearly twice as many dead in a country with a population (about 27 million) not much more than half of Britain's in 1940 (48 million).

  Of those 650,000 deaths, about 600,000 were thought to be directly due to violence, a figure which was the central estimate in a range from about 400,000 to about 800,000. The comparison with the Second World War shows that these are all extremely big numbers. The political impact of the estimate matched the numbers for size, and its accuracy was fiercely contested.

  It was, of course, based on a sample. Two survey teams visited fifty randomly selected sites of about forty households each, a total of 1,849 households with an average of about seven members (nearly 13,000 people). One person in each household was asked about deaths in the fourteen months prior to the invasion and in the period after. In about 90 per cent of cases of reported deaths, the teams asked to see death certificates, and were usually obliged.

  The numbers were much higher than those of the Iraq Body Count (which had recorded about 60,000 at that time), an organisation that requires two separate media reports of a war death before adding it to its tally (this is a genuine count, not a sample), and is scrupulous in trying to put names to numbers. But because it is a passive count, it is highly likely to be (and is, by the admission of those who do it) an undercount.

  But was it likely to have been such a severe undercount that it captured only about 10 per cent of the true figure? Attention turned to the sampling methodology of the bigger number. Because it was a sample, each death recorded had to be multiplied by roughly 2,200 to derive a figure for the whole of Iraq.

  So if the sample was biased in any way – if, to take the extreme example, there had been a bloody feud in one tiny, isolated and violent neighbourhood in Iraq, causing 300 deaths, and if all these deaths had been in one of the areas surveyed, but there had been not a single death anywhere else in Iraq, the survey would still produce a final figure of 650,000, when the true figure was 300.
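
  The mechanics of that extreme hypothetical are easy to reproduce (a simplified sketch of cluster extrapolation, our construction rather than the study's actual estimator): every death in the sample is scaled up by the same national factor, so one unrepresentative cluster is scaled up with all the rest.

```python
iraq_population = 27_000_000
people_surveyed = 13_000    # ~1,849 households x ~7 members each

scale_factor = iraq_population / people_surveyed
print(f"Each sampled death counts as ~{scale_factor:,.0f} deaths nationally")

# The book's deliberately extreme hypothetical: 300 deaths from one
# feuding neighbourhood land in the sample; nowhere else has any.
deaths_in_sample = 300
print(f"National estimate: {deaths_in_sample * scale_factor:,.0f}")
# ~620,000-660,000 depending on rounding, from a true total of 300.
```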

  Of course, the sample was nothing like as lopsided in this way, to this degree, but was it lopsided in some other? Did it, as critics suggested, err in sampling too many houses close to the main streets where the bombs and killings were more common, not enough in quieter, rural areas? Was there any manner in which the survey managed the equivalent of counting Dresden but missing rural Bavaria in a sample of Germany's war dead?

  In our view, if the Iraq survey produced a misleading number (it is bound to be 'wrong' in the sense of not being precisely right; the greater fault would be to be 'misleading'), it is more likely because of the kind of problem discussed in the next chapter – to do with data quality – than because of a bad sample. What statisticians call the 'design' of the sample was in no obvious way stupid or badly flawed. But it is perfectly proper to probe that design, or indeed any sample, for weakness or bias.

  To do this well, what's needed above all is imagination; that and enough patience to look at the detail. What kind of bias might have crept in? What are the peculiarities of people that mean the few we happen to count will be unrepresentative of the rest?

  Even data that seems to describe us personally is prone to bias. You know your own baby, surely. But do you know if the little one is growing properly? The simple but annoying answer is that it depends what 'properly' means. In the UK, this is defined by a booklet showing charts that plot the baby's height, weight and head circumference, plus a health visitor sometimes overly insistent about the proper place on the chart for your child to be. The fact that there is variation in all children – some tall, some short, some heavy, some light – is small consolation to the fretful parent of many a child below the central line on the graph, the 50th percentile. In itself, such a worry is usually groundless, even if it is encouraged. All the 50th percentile means is that half of babies will grow faster and half slower. It is not a target, or worse, an exam that children pass or fail.

  But there is a further problem. Who says this is how fast babies grow? On what evidence? The evidence, naturally, of a sample. Who is in that sample? A variety of babies past that were supposed to represent the full range of human experience. Is there anything wrong with that?

  Yes, as it happens. According to the World Health Organisation, not every kind of baby should have a place in the sample. The WHO wants babies breast-fed and says the bottle-fed should be excluded. This matters because bottle-fed babies tend to grow a little more quickly than breast-fed babies, to the extent that after two years the breast-fed baby born on the 50th percentile line will have fallen nearly to the bottom 25 per cent of the current charts.
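
  The effect of changing the reference sample is easy to see in a sketch (the weights below are invented for illustration; the 50th-percentile line is simply the median of whichever babies are in the sample):

```python
import statistics

breast_fed = [11.0, 11.4, 11.8, 12.0, 12.3]   # hypothetical weights, kg
bottle_fed = [12.2, 12.6, 12.9, 13.1, 13.5]   # a little heavier on average

mixed_median = statistics.median(breast_fed + bottle_fed)
who_style_median = statistics.median(breast_fed)   # bottle-fed excluded

print(f"50th percentile, mixed sample:    {mixed_median} kg")
print(f"50th percentile, breast-fed only: {who_style_median} kg")
# The same baby moves relative to the line because the chart changed,
# not the baby.
```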

  If the charts were revised, as the WHO wants, those formerly average-weight, bottle-fed babies would now move above the 50th percentile line, and begin to look a little on the heavy side, and the breast-fed babies would take their place in the middle.

  So the WHO thinks the charts should set a norm – asserting the value of one feeding routine over another – and has picked a sample accordingly. This is a justifiable bias, it says, against bad practice, but a bias nevertheless, taking the chart away from description and towards prescription simply by changing the sample.

  Depending what is in a sample, all manner of things can change what it appears to tell us. Did that survey somehow pick up more people who were older, younger, married, unemployed, taller, richer, fatter; were they more or less likely to be smokers, car drivers, female, parents, left-wing, religious, sporty, paranoid … than the population as a whole, or any other of the million and one characteristics that distinguish us, and might just make a difference?

 
