Why did this happen? In essence for one reason: over-confidence about how easy it is to count. There is much in life that is only sort-of true, but numbers don't easily acknowledge sort-ofs. They are fixed and uncompromising, or at least are used that way. Never lose sight of the coarse compromise we make with life when trying to count it.
Perilous as comparison is within a single country, it pales beside international comparison. With definitions across frontiers, we really enter the swamp. Not that we would know it, from the way such comparisons are reported.
For a glimpse of what goes wrong, let us say we're comparing levels of sporting prowess; that's one thing, isn't it? And let us agree that scoring a century in county cricket shows sporting prowess. And so let us conclude that Zinedine Zidane, three times voted world footballer of the year, having failed ever to score a century in county cricket, is rubbish at sport. The form of the argument is absurd, and also routine in international comparison.
Whose health system is better, whose education? Who has the best governance, the fewest prison escapes? Each time things are measured and compared on the same scale, it is insisted that in an important way they are the same thing; they have a health system, we have a health system, theirs is worse. They teach maths, we teach maths, but look how much better their results are. They have prisons, we have prisons, and on it goes.
Visiting Finland, Christopher Pollitt of Erasmus University in the Netherlands was surprised to discover that official records showed a category of prisons where no one ever escaped, year after year. Was this the most exceptional and effective standard of prison security? 'How on earth do you manage to have zero escapes, every year?' he asked a Finnish civil servant.
'Simple,' said the official, 'these are open prisons.'
Britain experienced a moral panic in early 2006 at the rate at which inmates were found to be strolling out of open prisons as if for a weekend ramble. By comparison, this seemed a truly astonishing performance. What was the Finnish secret?
'Open prisons? You never have anyone escape from an open prison?'
'Oh not at all! But because they are open prisons, we don't call it escape, we classify it as absent without leave.'
It is Christopher Pollitt's favourite international comparison, he says. When you reach down into the detail, he argues, there are hundreds of such glitches. Finland did not, as it happens, boast the most effective prison security in the world, nor, as some might have wanted to conclude from a comparison of the numbers of 'escapes' alone, had it won, through heart-warming trust and a humane system, sublimely co-operative inmates.
At least, we don't think so, although to be honest, we are not at all sure. Explanations are no more robust than the data they are built on. It is ludicrous, the time we devote to this, but we seem urgently to seek explanations for differences between nations – why we are good and they're bad or vice versa – when, if we cared to look, we would find reason to doubt whether the differences even existed in the terms described.
The problem begins in the simplest geographical sense: never forget the 'there' in 'how many are there?' All counting takes place somewhere, and part of the definitional headache is the need to say where. Just as when we ask how many sheep there are in a field, the field we have in mind is better if well fenced.
When it is not, we find this delicious example, in work by the Organisation for Economic Cooperation and Development – the OECD – a highly respected association of the world's developed nations with a well-regarded and capable team of researchers and economists. The OECD wanted to know – how trivially straightforward is this? – how many nurses there were in the UK, per person, compared with other countries.
'Nurse', it seems, has a settled meaning in the OECD countries; thus far, thus well defined. So a researcher contacted the Department of Health in London and asked something along the lines of: 'How many nurses do you have there?' The Department of Health answered. The OECD divided the number of nurses by the population of the UK to find the number of nurses per head.
Too bad for the OECD, health is now a devolved function in Scotland, the responsibility of the Scottish Executive in Edinburgh, not the Westminster Parliament. So the Department of Health in London defined 'there' as England and Wales and Northern Ireland, the areas for which it still had full responsibility. The OECD used a population figure for the whole UK. How easy it is to stray from the straightest path.
It was no surprise that staffing in our health service looked anaemic. The number of nurses in England, Wales and Northern Ireland, divided by a population which also included Scotland, compared wretchedly with other developed countries.
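The arithmetic of the slip can be sketched in a few lines of Python. The figures are invented round numbers, chosen so that every part of the UK has exactly the same staffing rate; they are not the OECD's or the Department of Health's data, and the function name is ours.

```python
# Invented round figures for illustration only (not real statistics).
# Every region is given the same true rate: 10 nurses per 1,000 people.
nurses = {
    "England": 500_000,
    "Wales": 30_000,
    "Northern Ireland": 20_000,
    "Scotland": 50_000,
}
population = {
    "England": 50_000_000,
    "Wales": 3_000_000,
    "Northern Ireland": 2_000_000,
    "Scotland": 5_000_000,
}


def nurses_per_thousand(counted_regions, population_regions):
    """Nurses in counted_regions divided by people in population_regions, per 1,000."""
    total_nurses = sum(nurses[r] for r in counted_regions)
    total_people = sum(population[r] for r in population_regions)
    return 1000 * total_nurses / total_people


whole_uk = list(population)
without_scotland = [r for r in whole_uk if r != "Scotland"]

# Numerator and denominator cover the same 'there'.
matched = nurses_per_thousand(whole_uk, whole_uk)
# Numerator leaves Scotland out; denominator keeps it in.
mismatched = nurses_per_thousand(without_scotland, whole_uk)

print(f"Same 'there' on both sides: {matched:.2f} per 1,000")
print(f"Nurses without Scotland, population with: {mismatched:.2f} per 1,000")
```

With these made-up numbers the mismatched figure comes out lower than the matched one, although no region has fewer nurses per head than any other. The shortfall is created entirely by the two halves of the fraction disagreeing about where 'there' is.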
International rankings are proliferating. We can now read how the UK compares with other countries on quality of governance, business climate, health, education, transport and innovation, to name a few, as well as more frivolous surveys like an international happiness index – 'the world grump league', as one tabloid reported it. 'Welcome', says Christopher Hood from Oxford University, who leads a research project into international comparisons, 'to Ranking World'. The number of international governance rankings, he says, has roughly doubled every decade since the 1960s.
Of course, you want to know how well Britain scores in these rankings; making such comparisons is irresistible, and even a well-informed sceptic like Christopher Hood enjoys reading them. We'll tell you the answer at the end of the chapter. First, some self-defence against the beguiling simplicity of Ranking World.
'With a header in the 27th minute followed by a second in first-half injury time, playmaker Zinedine Zidane sent shock waves through his Brazilian opponents from which they would never recover … The French fortress not only withstood a final pounding from Brazil but even slotted in another goal in the last minute.'
The words are by Fifa, the world governing body of football, describing France's victory in the final of the World Cup in 1998, as only football fans could. Two years on, the Gallic maestros, as Fifa would probably put it, stunned the world again, taking top spot in the league of best healthcare systems compiled by the World Health Organisation.
Britain finished a lowly 18th – in the WHO rankings that is, not the World Cup – a poor showing for a rich country. The United States, richest of all, was ranked 37th; humiliating, if you believed the WHO league was to be taken seriously. And though the WHO is a highly respected international organisation whose league tables are widely reported, many, particularly in the United States, did not.
The great advantage a football league has over healthcare is that in football there is broad agreement on how to compile it. Winning gets points, losing doesn't, little more need be said (give or take the odd bar-room post-match inquest about goals wrongly disallowed, and other nightmare interventions by the ref). Being that easy, and with the results on television on Saturday afternoons, it is tempting to think this is what league tables are like: on your head, Zidane, ball in the back of the net, result, no problem.
But for rankings of national teams, even Fifa acknowledges the need for some judgement. For international games, each result is weighted according to eight factors: points are adjusted for teams that win against stronger rather than weaker opposition, for away games compared with home games, for the importance of the match (the World Cup counting most), for the number of goals scored and conceded. Gone is the simplicity of the domestic league. The world rankings are the result of a points system which takes all these factors and more into account, and when the tables are published, as they are quarterly, not everyone agrees that they're right. It is an example of the complexity of comparison – how good is one team, measured against another – in a case where the measurement is ostensibly easy.
Observing France's double triumph, in football and health, Andrew Street of York University and John Appleby of the King's Fund health think tank set out, tongue in cheek, to discover if there was a relationship between rankings of the best healthcare systems and Fifa rankings of the best football teams.
And they found one. The better a country is at football, the better its healthcare. Did this mean the England manager was responsible for the nation's health, or that the Secretary of State for Health should encourage GPs to prescribe more football? Not exactly: the comparison was a piece of calculated mischief designed to show up the weaknesses of the WHO rankings, and the correlation was entirely spurious.
They made it work, they freely admitted, by ignoring anything that didn't help, playing around with adjustments for population or geography until they got the result they wanted. Their point was that any ranking system, but especially one concerned with something as complicated as healthcare, includes a range of factors that can easily be juggled to get a different answer.
Some of the factors taken into account in the compilation of the WHO league are: life expectancy, infant mortality, years lived with disability, how well the system 'fosters personal respect' by preserving dignity, confidentiality, and patient involvement in healthcare choices, whether the system is 'client orientated', how equally the burden of ill health falls on people's finances, and the efficiency of healthcare spending (which involves an estimate of the best a system could do compared with what it actually achieves). Most people would say that most of these are important. But which is most important? And are there others, neglected here, that are more important?
This monstrous complexity, where each factor could be given different weight in the overall score, where much is estimated, and where it is easy to imagine the use of different factors altogether, means that we could if we wanted produce quite different rankings. So Street and Appleby decided to test the effect on the rankings of a change in assumptions. The WHO had claimed that its rankings were fairly stable under different assumptions. Street and Appleby found quite the contrary. Taking one of the trickier measures of a good health system, efficiency, they went back to the 1997 data used to calculate this, changed some specifications for what constituted efficiency and, depending which model they used, found that a variety of countries could finish top. They managed, for example, to move Malta from first to last of 191 countries. Oman ranged from 1st to 169th. France on this measure finished from 2nd to 160th, Japan from 1st to 103rd. The countries towards the bottom, however, tended to stay more or less where they were whatever the specification for efficiency.
They concluded, 'The selection of the WHO dimensions of performance and the relative weights accorded to the dimensions are highly subjective, with WHO surveying various “key informants” for their opinions. The data underpinning each dimension are of variable quality and it is particularly difficult to assess the objectivity with which the inequality measures were derived.'
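How much a composite league can depend on the weights is easy to see in a toy case. The sketch below is not Street and Appleby's method (they re-specified the model of efficiency); it is a simpler illustration of the same weakness. The three countries, the three dimensions, the scores and the weightings are all invented.

```python
# Invented scores (0 to 1) for three imaginary countries on three dimensions.
scores = {
    "Country A": {"outcomes": 0.90, "fairness": 0.40, "efficiency": 0.50},
    "Country B": {"outcomes": 0.60, "fairness": 0.90, "efficiency": 0.50},
    "Country C": {"outcomes": 0.50, "fairness": 0.60, "efficiency": 0.95},
}

# Three equally defensible-looking weightings (each sums to 1); also invented.
weightings = {
    "outcomes matter most": {"outcomes": 0.6, "fairness": 0.2, "efficiency": 0.2},
    "fairness matters most": {"outcomes": 0.2, "fairness": 0.6, "efficiency": 0.2},
    "efficiency matters most": {"outcomes": 0.2, "fairness": 0.2, "efficiency": 0.6},
}


def composite(country_scores, weights):
    """Weighted sum of a country's scores across all dimensions."""
    return sum(weights[dim] * country_scores[dim] for dim in weights)


def league_table(weights):
    """Countries ordered from best to worst composite score."""
    return sorted(scores, key=lambda c: composite(scores[c], weights), reverse=True)


for label, weights in weightings.items():
    print(f"{label}: {' > '.join(league_table(weights))}")
```

Nothing about the imaginary countries changes between one line of output and the next. Only the judgement about what matters most changes, and with it the name at the top of the league.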
In short, what constitutes a good healthcare system is in important ways a political judgement, not strictly a quantitative one. The United States does not run large parts of its health system privately in a spirit of perversity, knowing it to be a bad system compared with other countries. It does it this way because, by and large, it thinks it best. We might disagree; but to insist the US be ranked lower because of that country's choices is to sit in judgement not of its healthcare system but its political values.
It is tempting, once more, to give up on all comparisons as doomed by the infinite variety of local circumstances. But we can overdo the pessimism. The number of children per family, or the number of years in formal education, or even, at a pinch, household income, for example, are important measures of human development and we can record them just about accurately enough across most countries so that comparisons are easy and often informative. The virtue of these measures is that they are simple, counting one thing and only one thing, with little argument about definitions. Such comparisons, by and large, can be trusted to be reasonably informative, even if not absolutely accurate.
The more serious problems arise with what are known as composite indicators, such as the quality of a health system, which depend on bundling together a large number of different measures of what a health system does – how well your doctor treats you in the surgery, how long you wait, how good the treatment is in hospitals, how comfortable, accessible, expensive and so on, and where some of what we call 'good' will really mean what satisfies our political objectives. If one population wants abundant choice of treatment for patients, and another is not bothered about choice, thinks in fact that it is wasteful, which priority should be used to determine the better system?
What is important, for example, for children to learn in maths? In one 2006 ranking, Germany was ahead of the UK, in another the UK was ahead of Germany. You would expect maths scores, of all things, to be easily counted. Why the difference?
It arose simply because each test was of different kinds of mathematical ability. That tendency noted at the outset to assume that things compared must be the same implies, in this case, that the single heading 'maths' covers one indivisible subject. In fact, the British maths student turns out to be quite good at practical applications of mathematical skill – for example to decide the ticket price for an event that will cover costs and have a reasonable chance of making a profit – while German students are better at traditional maths such as fractions. Set two different tests with emphasis on one or the other but not both, and guess what happens? The reaction in Germany to their one bad performance (forget the good one) bordered on panic. There was a period of soul searching at the national failure and then a revision of the whole maths curriculum.
Though the need to find like-for-like data makes comparison treacherous, there are many comparisons we make that seem to lack data altogether. The performance of the American and French economies is one example. To parody only slightly, the perception in parts of the UK seems to be that France is a country of titanic lunch breaks, an over-mighty, work-shy public sector, farmers with one cow each and the tendency to riot when anyone dares mention competition.
America by contrast, the land of turbocharged capitalism, roars ahead without holidays or sleep. And if you measure the American economic growth rate, it is, averaged over recent years, about 1 per cent higher than in France – a big difference.
Look a little more closely, though, and it turns out that the American population has also been growing about 1 per cent faster than that of France. So it is not that the Americans work with more dynamism, just in ever greater numbers. When we look at the output of each worker per hour, it turns out that the French produce more than the Americans and have done for many years – their lead here has been maintained. Even the French stock market has outperformed the American, where $1 invested thirty years ago is now worth about $36 while in France it is worth $72 (October 2006).
None of these numbers is conclusive. All could be qualified further, by noting French unemployment for example. Summary comparisons of complicated things are not possible with single numbers. When comparing such monstrously complex animals as entire economies, remember once again how hard it is to see the whole elephant.
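The step from total growth to growth per head is a small piece of arithmetic, sketched here in Python. The rates are round numbers picked only to mirror the 'about 1 per cent' gaps in growth and in population; they are not measured figures for either country, and the labels X and Y are placeholders.

```python
def growth_per_head(output_growth, population_growth):
    """Annual growth in output per person, given annual growth in total
    output and in population (all expressed as fractions, e.g. 0.03 = 3%)."""
    return (1 + output_growth) / (1 + population_growth) - 1


# Invented round figures: economy X grows one point faster than economy Y,
# but its population also grows one point faster.
economy_x = growth_per_head(output_growth=0.03, population_growth=0.01)
economy_y = growth_per_head(output_growth=0.02, population_growth=0.00)

print(f"Economy X, growth per head: {economy_x:.2%}")
print(f"Economy Y, growth per head: {economy_y:.2%}")
```

On these assumptions the gap in headline growth all but vanishes once each economy's output is shared among the people who produced it.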
Meaningful comparison is seldom found in single figures. Exceptions are when the figures apply to a single indicator, not a composite, when there's little dispute about the definitions, and where the data will be reasonably reliable. One such is child mortality. There is no debate about what a death is and we can define a child consistently. There will, in some countries, be difficulties collecting the data so that the figures will be approximate, as usual. But we can nevertheless effectively compare child mortality across the world, noting, for example, a rate in Singapore and Iceland of 3 children per 1,000 aged under five, and in Sierra Leone of 283 children per 1,000 (The State of the World's Children, UNICEF, 2006) and we can be justifiably horrified.
More complicated comparisons require more care. But if care is taken, they can be done. In Aylesbury prison in 1998 one group of prisoners was given a combination of nutritional supplements; another was given a placebo. Otherwise they continued eating as normal. The group receiving the genuine supplements showed a marked improvement in behaviour. The researchers concluded that improved nutrition had probably made the difference. Years before Jamie Oliver, the results had significant implications for criminal justice and behaviour in general, but seem to have been effectively binned by the Home Office, which refused us an explanation for its unwillingness to support a follow-up trial, and finally acquiesced to a new study only this year.
Yet this comparison had merit. Care was taken to make sure the two groups were as alike as possible so that the risk of there being some lurking difference, what is sometimes known as a confounding variable, was minimised. The selected prisoners were assigned to the two groups at random, without either researchers or subjects knowing who was receiving the real supplement and who was receiving the placebo until afterwards, in order that any expectations they might have had for the experiment were not allowed to interfere with the result. This is what is known as a double-blind randomised placebo-controlled trial. Since the experiment took place in prison, the conditions could be carefully controlled.
A clear definition of how misbehaviour was to be measured was determined at the outset, and it was also measured at different degrees of severity. A reasonable number of people took part, some 400 in all, so that fluke changes in one or two prisoners were unlikely to bias the overall result. And the final difference between the two groups was large, certainly large enough to say with some confidence that it was unlikely to be caused by chance.
This is statistics in all its sophistication, where numbers are treated with respect. The paradox is that the experiment had to be complicated in order to ensure that what was measured was simple. They had to find a way of ruling out, as far as possible, anything else that might account for a change in behaviour. With painstaking care for what numbers can and cannot do, a clear sense of how the ordinary ups and downs of life can twist what the results seem to show, and a narrow, well-defined question, the researchers might have hit on something remarkable.