Public policy worldwide has a truly shocking history of ignorance about whether the benefits it claims in a precious few oft-quoted examples have occurred entirely as a result of chance or other fortuitous association, or if the policy genuinely makes the differences claimed. At the Home Office, for example, a report in early 2006 on the evidence for the effectiveness of policy to tackle reoffending found that not one policy reached the desired standard of proof of efficacy, because so many had failed to rule out the possible effects of chance when counting the rise or fall in offences by people on various schemes. That doesn't mean nothing works; it means that we don't know what works, because we haven't distinguished the man from the dog. A senior adviser at the Home Office who knows what a sneaky devil chance can be, and how easily the numbers can mislead, says to ministers who ask what actually works to prevent reoffending: 'I've no idea.'
Can this really be how government proceeds? Without much by way of statistical rigour and cursed with a blind spot for the effects of luck and chance? All too often, it is.
In the United Kingdom, that tendency to ignore the need for statistical verification is only now beginning to change, with the slow and often grudging acceptance that we need more than a plausible anecdote (a single wave) before instituting a new policy for reoffenders, for teaching methods, for healthcare, or any other state function. Politicians are among the most recalcitrant, sometimes pleading that the genuine pressures of time, expense and public expectation make impossible the randomised controlled trials that would ideally be able to tell real stripes from fake, sometimes apparently not much caring or understanding. One way or another, they often rest their policies on little more than luck and a good story, becoming as a result the willing or unwilling suckers of chance.
One more example, which we will look at in more detail in a later chapter. School exam results go up and down from year to year. They move so much, in fact, that the league table is shuffled substantially each year. But is it the school's teaching standards that are thrashing up and down? Or is the difference due to the ups and downs in pupils' ability as measured by exams from one year to the next? It seems mostly the latter, and that is as you might expect: what principally seems to determine a school's exam results is the nature of its intake. In fact, the results for a school in any one year are so subject to the luck of the intake that for between two thirds and three quarters of all schools, the noise of chance is a roar, and we are unable to hear the whisper above it of a real influence or a special cause; we are unable to say with any confidence whether there is any difference whatsoever that is made by the performance of the schools themselves. Chance so complicates the measurement that for the majority of schools the measurement is, some say, worthless. We would not go nearly that far. But we would agree that data is regularly published that misleadingly implies these waves are a reflection of the educational quality of the school itself, and that reports year-to-year changes in performance as if they were clear indicators of progress.
A rising tide or just a wave? The man or his dog? Stripes or a real tiger? We can be vigilant, we need to be vigilant, but we will be fooled again. The least we can do is determine not to make it easy for chance to outwit us. That task is begun by knowing what chance is capable of, a task made easier if we slow the instinctive rush to judgement and beware the tiger that isn't.
5
Averages:
The White Rainbow
Averages play two tricks: first, they put life's lumps and bumps through the blender. It might be bedlam out there but, once averaged, the world turns smooth. The average wage, average house prices, average life expectancy, the average crime rate, as well as less obvious averages like the rate of inflation … there are ups and downs mixed into them all. Averages take the whole mess of human experience, some up, some down, some here, some there, some almost off the graph, and grind the data into a single number. They flatten hills and raise hollows to tell you the height of the land – as if it were flat.
But it is not flat. Forget the variety behind every average and you risk trouble, like the man who drowned in a river that rose, he heard, on average only to his knees. So, trick one brings a problem: it stifles imagination about an awkward truth – that the world is a hotchpotch of uneven variety.
Trick two is that averages pass for typical when they may be odd. They look like everyman but can easily be no one. They stand for what's ordinary, but can be warped by what's exceptional. They sound like they're in the middle, but may be nowhere near. The way to see through an average is to try to picture the variety it blends together. Two images might help make that thought vivid: the sludgy black/brown that children make when petulant with the paint pots is a kind of average – of the colours of nature – and no one needs telling it is a deceptive summary of the view. 'White, on average,' is what we'd see by combining the light from a rainbow, then sharing it equally. But this bleeds from the original all that matters – the magical assortment of colours. Whenever you see an average, think: 'white rainbow', and imagine the vibrancy it conceals.
In his final State of the Union address, in January 2008, President George W. Bush argued for the continuation of tax cuts introduced in 2001, on behalf of a great American institution: the average taxpayer.
'Unless the Congress acts,' he said, 'most of the tax relief we have delivered over the past seven years will be taken away.' He added that 116 million American taxpayers would see their taxes rise by an average of $1,800.
Which was true, approximately. The original tax-cut legislation had an expiry date. Unless renewed, the cut would end and taxes would go up. The Tax Policy Center, an independent think tank, calculated that the average tax increase would be $1,713, close enough to the President's figure of $1,800. And so the typical American citizen could be forgiven for thinking: the President means me.
Actually, no he didn't. He might like Americans to think he did, but he probably didn't. About 80 per cent of taxpayers would lose less than $1,800, most of them a lot less. That is, more than 90 million of the 116 million taxpayers would not see their taxes rise by this much. To many, that feels intuitively impossible. How can so many people, so many more than half, be below average? Isn't what's average the same as what's typical?
That confusion served the President well. Being opposed to tax rises, he wanted this one to appear big to as many people as possible, so he gave the impression that the typical experience would be an $1,800 tax hike when, in truth, only one in five would pay that much.
How did he do it? He used the blender. Into the mix, he poured all taxpayers, from street sweepers to the richest yacht-collecting hedge-fund manager. Nothing wrong with that, you might say. But the richest are so rich that you can dilute them with millions of middle and low incomes and the resulting blend is still, well, rich.
Even though everyone is in it, this average is not typical. Think of the joke about four men in a bar when Bill Gates walks in. They cheer.
'Why the fuss?' asks Bill, until one of the four calms himself and answers, 'Don't you know what you've just done to our average income?'
Or think about the fact that almost everyone has more than the average number of feet. This is because a few people have just one foot, or no feet at all, and so the tiny influence of a tiny minority is nevertheless powerful enough to shift the whole average to something a bit less than two. To have the full complement of two feet is therefore to be above average. Never neglect what goes into an average, and why the influence of a single factor, one part of the mix, might move the whole average somewhere surprising and potentially misleading.
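Both the bar joke and the feet can be checked with a few lines of arithmetic. The sketch below is only an illustration: the incomes of the four men, the newcomer's income and the make-up of the 1,000-person population are invented figures, chosen to show the effect rather than to describe anyone real.

```python
from statistics import mean, median

# Hypothetical annual incomes for the four men in the bar (invented figures).
drinkers = [30_000, 35_000, 40_000, 45_000]
# Hypothetical income for the newcomer; any very large number makes the point.
newcomer = 1_000_000_000

with_newcomer = drinkers + [newcomer]

mean_before, median_before = mean(drinkers), median(drinkers)
mean_after, median_after = mean(with_newcomer), median(with_newcomer)

print("mean before/after:  ", mean_before, mean_after)
print("median before/after:", median_before, median_after)

# Hypothetical population of 1,000 people: 998 with two feet,
# one with one foot and one with none (invented proportions).
feet = [2] * 998 + [1, 0]
average_feet = mean(feet)

# Share of people whose number of feet is above the average.
share_above = sum(f > average_feet for f in feet) / len(feet)

print("average feet:", average_feet)
print("share above average:", share_above)
```

With these invented figures the mean income leaps from 37,500 to 200,030,000 when the newcomer arrives, while the median, the income of the man in the middle, moves only from 37,500 to 40,000. The average number of feet comes out at 1.997, and 99.8 per cent of this imaginary population is above it.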
The chart opposite divides taxpayers into groups, 20 per cent in each, and shows the increased tax liability for each group. The tax rise for the middle 20 per cent would average not $1,800, but about $800. The bottom fifth would pay an extra $41 on average. We can see that four out of the five groups, accounting for about 80 per cent of taxpayers, would pay less than the average. We can also see what lifts the average – the tall column representing the tax rise for the richest 20 per cent.
Figure 6 Who pays what?
If we looked at this last group in more detail, we would find that the top 0.1 per cent would pay an extra $323,621 each. These are income centipedes, millipedes even, who are so rich, paying so much tax, that they move the average far, far more than other individuals.
This is not an argument about the merit of tax cuts in general or about how they were implemented by President Bush. It is about what happens when they are presented using an average. The President did the same when the tax cut was introduced, selling it on the basis of the average benefit for all taxpaying Americans, when the benefit was massively skewed to those who paid most tax. Once again, you might agree with that emphasis, but it is a bias that the average conceals.
Averages are like that: in trying to tell us something about an entire group, they can obscure what matters about its parts. This applies not only to economics, where averages are often cited, but to almost any description of typical experience. To take one surprising example: pregnancy.
Victoria Lacey was pregnant, and overdue, in early September 2005. Her due date had been 26 August. Two weeks late and pregnancy now maddening, she began each day with hope this would be it, and then, as the long hours passed, resigned herself to another. Was something wrong? 'Why can't your body produce a baby on the date it's supposed to?' she asked herself.
But which date is that? Doctors give expectant mothers an estimated date since, naturally, they can never be certain, and that estimate is based on the average length of a pregnancy. But how long is the average pregnancy? The answer, unhelpfully, is that the official average pregnancy is shorter than it probably ought to be.
There were 645,835 live births recorded in the UK in 2005; is it possible that every due date was misleading? Of course, some will have been right by chance simply because pregnancies vary in duration, but will they have been right as often as they could have been? The impression of an imprecise science is confirmed when we learn that the practice in France is to give a latest date, not a due date, some ten days later than in Britain. Victoria gave birth to baby Sasha, safely and without inducement, about two weeks overdue, on 10 September 2005.
Due dates in the UK are initially calculated by counting 280 days from the first day of the last menstrual period. British doctors settled on this number in part because it seemed about right, but also under the influence of a Dutch professor of medicine named Herman Boerhaave ('So loudly celebrated, and so universally lamented through the whole learned world' – Samuel Johnson).
Boerhaave wrote nearly 300 years ago that the duration of pregnancy was well known from a number of studies. Those studies have not survived, though their conclusion has. It remains well known up to the present day, consolidated by influential teachers and achieving consistency in medical textbooks by about the middle of the twentieth century. Some are also familiar with Naegele's Rule, based on the writings of Franz Naegele in 1812, who said that pregnancy lasted ten lunar months from the last menstrual period, also giving us 280 days. Nearly everyone in the UK still agrees that 280 days is correct: it is the average.
But averages can deceive. A drunk sways down the street like a pendulum from one pavement to the other, positioned on average in the centre of the road between the two white lines as the traffic whistles safely past, just. On average, he stays alive. In fact, he walks under a bus.
Averages put variation out of mind, but somewhere in the average depth of the river there might be a fatally deep variation; somewhere in the distribution of the drunk's positions on the road there was a point of collision that the average obscures. The enormous value of the average in contracting an unwieldy bulk of information to make it manageable is the very reason why it can be so misleading.
To use another metaphor, the world is a soup of sometimes wildly varying ingredients. The average is like the taste and tells us how the ingredients combine. That is important, but never forget that some ingredients have more flavour than others and that these may disguise what else went into the pot.
If we are to avoid coming away with the idea that the average English vegetable tastes of garlic, we also need to know the whole recipe. Averages stir traces of the richer, weirder world into one vast pot with everything else, mix the flavours and turn the whole into something which may or may not be true of most people, and may be true of none.
If you find that thought hard to apply to numbers, simply remember that any average may contain strong flavours too: distorting numbers, atypical numbers. Think again of the influence of the small number of the one-footed on the average number of feet for all.
So what is going on in the rich variety of experiences of pregnancy? In particular, what happens at the edges? Two facts about pregnancy suggest that the simple average will be misleading. First, some mothers give birth prematurely. Second, almost no one is allowed to go more than two weeks beyond the due date before being induced. Premature births pull the average down; late ones would push it up, but we physically intervene to stop babies being more than two weeks late. The effect of this imbalance – we count very early births but prevent the very late ones – is to produce a lower average than if nature were left to its own devices. That is not a plea to allow pregnancies to continue indefinitely, just to offer a glimpse inside the calculation.
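The effect of that imbalance can be seen in a small sketch. The durations below are invented, not real data: ten pregnancies spread symmetrically around 282 days. As a simplifying assumption, any pregnancy that would have run past the limit of two weeks after the due date is counted as ending on the limit day itself.

```python
from statistics import mean

# Hypothetical pregnancy durations in days if nature were left alone
# (invented figures, symmetric around 282).
natural = [240, 262, 275, 280, 282, 282, 284, 289, 302, 324]

# Due date of 280 days plus two weeks, as described in the text.
INDUCTION_LIMIT = 280 + 14

# Assumption: induction ends every late pregnancy on the limit day.
# Early births are left exactly as they were.
observed = [min(days, INDUCTION_LIMIT) for days in natural]

natural_mean = mean(natural)
observed_mean = mean(observed)

print("mean if left alone:  ", natural_mean)
print("mean with induction: ", observed_mean)
```

In this made-up sample the average falls from 282 days to 278.2, simply because the very early births are still counted while the very late ones are no longer allowed to happen.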
We might argue in any case that very premature births ought not to be any part of the calculation of what is most likely. Most births are not significantly premature, so if a doctor said to a woman: 'The date I give you has been nudged down a few days to take account of the fact that some births – though probably not yours – will be premature,' she might rightly answer, 'I don't want my date adjusted for something that probably won't happen to me, I want the most likely date for what probably will'. The average is created in part by current medical practice – the numbers we've got arise partly through medical intervention – and the argument in favour of 280 days becomes circular: this is what we do because it is based on the average, and this becomes the average partly because of what we do.
Mixing in the results from the edges produces a duration that is less likely to be accurate than it could be. In the largest recent study, of more than 400,000 women in Sweden, most had not yet given birth by 280 days – the majority of pregnancies lasted longer. By about 282 days, half of babies had been born (the median), but the single most common delivery date (the mode), and thus arguably the most likely for any individual in this distribution, was 283 days.
If most women have not had their baby until they are at least two days overdue, and women are more likely to be three days overdue than anything else, it invites an obvious question: are they really overdue?
You are forgiven for finding it muddled. Yet above all this stands the unarguable fact, confirmed by all recent studies, that the duration at which more women have their baby than any other is 283 days. This type of average – the most common or popular result – is known as the mode, and it's not clear why in this case it isn't preferred. Indeed, that latest study from Sweden found that even the simple arithmetical average (the mean) was not 280 days, but 281.
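To see how three different 'averages' can come out of one set of numbers, here is a minimal sketch. The eleven durations are invented, not taken from the Swedish study; they were chosen only so that the mean, median and mode land on the 281, 282 and 283 days reported above.

```python
from statistics import mean, median, mode

# Hypothetical pregnancy durations in days (invented to reproduce the
# summary figures quoted in the text, not real observations).
durations = [267, 274, 280, 281, 282, 282, 283, 283, 283, 286, 290]

mean_days = mean(durations)      # arithmetical average
median_days = median(durations)  # the middle value when sorted
mode_days = mode(durations)      # the single most common value

print("mean:  ", mean_days)
print("median:", median_days)
print("mode:  ", mode_days)
```

A couple of early births are enough to drag the mean below both the median and the mode, which is why the three need not agree.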
None of this would matter much (one, two or three days is sometimes frustrating, but in a normal pregnancy most likely to be medically neither here nor there at this stage), were it not that these numbers form part of the calculation of when to induce birth artificially. Induction is often offered to women in this country, sometimes with encouragement, seven days after the due date. It raises significantly the likelihood of Caesarean section – which has risks of its own – and can be, for some, a deeply disappointing end to pregnancy.
When obstetricians answer that induced birth has better outcomes for women than leaving them a little longer, they fail to tell you that one of the ways they measure that 'better outcome' is by asking women if they feel better, having sorted out the problem of being overdue. If you tell someone there's a problem, they may well thank you for solving it. If they knew the problem was based on a miscalculation, they might feel otherwise.
Averages bundle everything together. That is what makes them useful, and sometimes deceptive. Knowing this, it is simple to avoid the worst pitfalls. All you need do is remember to ask: 'What interesting flavours might be lost in that blend? What else could be in the pot? And if they tell me the rainbow is white on average: what colours are blanked out?'
So we need to tread warily when offered an average as a summary of disparate things. It would help if journalists, politicians and others took care to avoid using them as a way of saying 'just about in the middle' (unless this is an average called the median), avoided using them to stand for what is 'ordinary', 'normal' or 'reasonable', avoided using them even to mean 'most people', unless they are sure that is what they meant. They may be none of those things.
The 'middle' is a slippery place altogether. Middle America, like Middle England, is a phrase beloved of politicians and journalists. Both are bundled up with social and moral values and the economic plight said to be typical of decent citizens, who are neither rich, nor poor, but hard-working, probably families and, well, sort of, customarily, vaguely, in the middle. It is a target constituency for candidates of all political parties, given extra appeal by being blended with an idea of the middle class, a group-membership now claimed by the vast majority in the UK. A survey by the American Tax Foundation also found that four out of five Americans label themselves 'middle class'. Just two per cent call themselves 'upper class'.
In sum, the 'middle' has become impossibly crowded. Politicians particularly like it that way, so that any proposal said to benefit even some small part of the middle-class/middle-England/average-citizen can leave as near as possible the entire population feeling warm and beloved, at the heart of a candidate's concern. By this standard, we are all in the middle now.