Humble Pi


by Matt Parker

I agree with the arguer in this case: ‘Millions’ can be used as part of a unit. And when you add and subtract numbers with the same units, the units always remain unchanged. But if you start multiplying and dividing, then the units can change. Our passionate friend here mentally removed the millions, did a subtraction-style comparison to show that 360 is bigger than 317 and then completely failed to notice they were also doing an implied division of 360 ÷ 317 = 1.1356 to show that everyone gets just over ‘one’ each.

Just over one each of what? Well, they put the units of ‘millions of dollars’ back on and concluded that everyone gets just over one of millions of dollars. But if you divide two numbers, you also have to divide their units. So the millions cancel out and everyone actually gets $1.14 each. So, for the most part, the logic is not without some justification; it just falls apart on the final unit hurdle.
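The unit cancellation can be checked with a few lines of arithmetic. This is a minimal sketch that assumes, as the passage implies, 360 million dollars being shared among 317 million people; the variable names are illustrative only.

```python
# Assumed reading of the meme: $360 million shared among 317 million people.
MILLION = 1_000_000

total_dollars = 360 * MILLION
people = 317 * MILLION

# Dividing the full quantities: the factor of a million sits in both the
# numerator and the denominator, so it cancels out.
dollars_per_person = total_dollars / people

# The meme's mistake: divide the bare numbers, then put 'millions' back on.
mistaken_dollars_per_person = (360 / 317) * MILLION

print(f"correct:  ${dollars_per_person:.2f} each")
print(f"mistaken: ${mistaken_dollars_per_person:,.2f} each")
```

The two results differ by exactly the factor of a million that was removed once but put back without being cancelled.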

This is possibly the greatest source of everyday maths errors. People get used to doing a calculation in a given situation, then use the same method in another situation, where it no longer works. I suspect everyone who passed on this meme in earnest looked at it and their brain did something along the same lines of seeing millions as a unit they could exclude from their calculations and put back on at the end.

Thankfully, this was way back in 2015, and in the years since then people have become much better at spotting fake news online.

Against the grain

Here’s one final story involving the pound, but in this case we’re looking at a smaller fraction of the pound: the grain. In the Apothecaries system of weight units, a pound can be split into 12 ounces, which each consist of 8 drams. A dram is then 3 scruples, each made from 20 grains. I hope that made sense. A grain is one 5,760th of a pound. But not a normal pound: this is a Troy pound. Which is different to a normal pound. And people wonder why the metric system was invented …

Let me try again. A kilogram is made up of 1,000 grams, which can then be split into 1,000 milligrams each. A grain is an archaic unit equal to about 64.8 milligrams. Phew. That was easier.

The problem is that, in the US, the Apothecaries system of units is still used as one of the systems for measuring medications. On the long list of places where you don’t want to be on the receiving end of the errors which result from having conflicting systems of units, medicine has to be right up there. To make matters worse, the shorthand for grain is ‘gr’, and this can easily be mistaken for a gram.

And, sure enough, it happens. A patient taking Phenobarbital (an anti-epileptic drug) was prescribed 0.5gr per day (32.4 milligrams), and this was mistaken for 0.5 grams per day (500 milligrams). After three days on over fifteen times their normal dose, the patient started to have respiratory problems. Thankfully, when they were taken off the dose, they made a full recovery. This was a case of no grain, no pain.
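The numbers in this story can be reproduced from the conversion factors given above. The sketch below uses the approximate figure of 64.8 milligrams per grain quoted in the text; the constant names are made up for illustration.

```python
# Apothecaries subdivisions as described in the text.
GRAINS_PER_SCRUPLE = 20
SCRUPLES_PER_DRAM = 3
DRAMS_PER_OUNCE = 8
OUNCES_PER_POUND = 12

# Approximate metric equivalents (64.8 mg per grain is the text's figure).
GRAIN_IN_MG = 64.8
GRAM_IN_MG = 1000.0

grains_per_pound = (
    GRAINS_PER_SCRUPLE * SCRUPLES_PER_DRAM * DRAMS_PER_OUNCE * OUNCES_PER_POUND
)

prescribed_mg = 0.5 * GRAIN_IN_MG   # 0.5 gr read correctly as grains
misread_mg = 0.5 * GRAM_IN_MG       # 0.5 gr misread as grams
overdose_factor = misread_mg / prescribed_mg

print(f"grains in a Troy pound: {grains_per_pound}")
print(f"prescribed dose: {prescribed_mg:.1f} mg, misread dose: {misread_mg:.0f} mg")
print(f"overdose factor: {overdose_factor:.1f}x")
```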

ELEVEN

Stats The Way I Like It

Even though I was born in Perth, Western Australia, I have lived in the UK for so long my accent is now 60 per cent to 80 per cent British. While I enjoy sports, I’m not a super-fan of any of them, and it has been a very long time since I’ve applied a prawn to a barbecue. I’m not a typical Australian. But then again, no one is.

After the 2011 census the Australian Bureau of Statistics published who the average Australian was: a thirty-seven-year-old woman who, among other things, ‘lives with her husband and two children (a boy and a girl aged nine and six) in a house with three bedrooms and two cars in a suburb of one of Australia’s capital cities’. And then they discovered that she does not exist. They scoured all the records and no one person matched all the criteria to be truly average. As they rightly pointed out:

While the description of the average Australian may sound quite typical, the fact that no one meets all these criteria shows that the notion of the ‘average’ masks considerable (and growing) diversity in Australia.

– Australian Bureau of Statistics

When it comes to measuring populations, a census is a bit of an extreme situation. When an organization wants to know something about a population, it usually checks a small sample and assumes it is representative of everyone else. But a government has the ability to throw scale to the wind and to just survey absolutely everyone. This does end up producing an overwhelming amount of data – which, ironically, is then reduced down to representative statistics.

The US constitution requires a nationwide census every ten years. But by 1880, due to the increase in population and in the census questions, it was taking eight years to process all the data. To fix the problem, electromechanical tabulating machines were invented that could automatically total up data which had been stored on punch cards. Tabulating machines were used in the 1890 census and were able to complete the data analysis in only two years.

Before long, tabulating machines were doing more and more complicated processing of data: sorting it by different criteria and even doing basic maths instead of simply keeping tallies. Arguably, the need to crunch census data led to our modern computing industry. This first census punch-card tabulating machine was invented by Herman Hollerith, who founded the Tabulating Machine Company, which eventually merged with another tabulation company and evolved into IBM. There may be direct ancestry from the computer you use at work today to punch-card sorting machines over a century ago.

This is why I found the 2016 census in Australia particularly pleasing. I happened to be in the country for what was the first Australian census to be run almost entirely online and the Australian Bureau of Statistics had given the contract to host the census to none other than IBM. It turned out that IBM botched the process and the census site went offline for forty hours, but, if we ignore that, it was nice to see IBM still in the cutting-edge census-technology business. Though, given how their site handled the traffic, they might still have been using a punch-card tabulating machine back end.

Would this new survey produce an average Australian who actually existed? When I was back in Australia in 2017 and flicking through the West Australian newspaper I unexpectedly saw a story about results from the previous year’s census. The paper was outlining who the average ‘West Australian’ would be: a thirty-seven-year-old male with two kids, one of his parents was born overseas … and so on. I skimmed ahead to where the journalist writing the article was unable to find someone who actually was that average.

Instead, I found Tom Fisher’s face smiling back at me. Mr Average himself.

They had done it. They had found someone who supposedly matched all of the most average criteria. Tom himself did not seem to be that excited about the title of ‘Mr Average’, pointing out that he works as a musician (he’s quite a vital part of WA band Tom Fisher and The Layabouts). But according to the newspaper, he deserved it because he was:

a thirty-seven-year-old man

born in Australia with at least one parent from overseas

speaks English at home

is married with two children

does between five and fourteen hours of unpaid domestic work a week

has a mortgage on a four-bedroom house with two cars in the garage

That is a shorter list than the previous census’s average Australian, but it was still impressive that someone who matched all the criteria had been found. I tracked Tom down and emailed him to ask about his averageness. Perth is not that big, and it did not take much internet stalking and asking around to locate him. He seemed to have grown into the role of Mr Average and happily offered his averageness to help me however he could. I explained how surprised I was that he existed and that he matched all the criteria:

‘Yeah, mate, can confirm the averageness. All except both my parents were born in Oz.’

I knew it! The newspaper had been deliberately vague, and Tom did not actually match all the criteria. It is with great hesitation that I expose this. I thought that maybe people would get more from the idea he represented than from the Mr Average he actually was. But, on balance, it is interesting that, even on a few measures, the West Australian newspaper could not find a Mr Average.

Having unmasked one Mr Average, I was prepared to make amends and find a replacement. I contacted the Australian Bureau of Statistics (ABS) to see if it was possible to find someone with the reduced criteria the newspaper used, instead of the full average Australian range of statistics. The fine people at ABS found my request interesting enough to dig through the data for me. Expanding the population considered from West Australia to the whole country subtly changed the averages: Mr Average is now a woman with one fewer bedrooms in her house. They estimated that, for the loosest definition of average (using only a few main statistics), there would only be ‘roughly four hundred’ matching people out of Australia’s then population of 23,401,892.

So there you have it: 99.9983 per cent of the Australian population is not average. I’m in pretty good company after all.
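The percentage follows directly from the two figures quoted by the ABS. A quick check, treating ‘roughly four hundred’ as exactly 400 (an assumption made only for the arithmetic):

```python
population = 23_401_892   # Australia's population as quoted in the text
matching = 400            # 'roughly four hundred', taken here as exactly 400

share_average = matching / population
percent_not_average = 100 * (1 - share_average)

print(f"about 1 in {population // matching:,} people is 'average'")
print(f"{percent_not_average:.4f} per cent of the population is not")
```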

If the data fits

In the 1950s the US Air Force found out the hard way that no one is average. Pilots in the Second World War had worn quite baggy uniforms and the cockpits were big enough to allow for a wide range of body types. But the new generation of fighter jets allowed for much less give all round, from compact cockpits to skin-tight garments (for the record, ‘skin-tight garment’ is the US Air Force’s description). They needed to know exactly how big their flying personnel were so they could make jets and clothes to fit.

The air force sent a crack team of measurers to 14 different air force bases and measured a total of 4,063 personnel. Each person had 132 different measurements taken, including such classics as Nipple Height, Nose Length, Head Circumference, Elbow Circumference (Flexed) and Buttock–Knee Length. The measurement squad was able to do this in as little as two and a half minutes per human, measuring up to 170 people a day. Those on the receiving end of the measuring described it as ‘the fastest and most thorough going-over they’d ever had’.

For each of the 132 measurements, the team then had to compute the mean, the standard deviation, the standard deviation as a percentage of the mean, the range and twenty-five different percentile values. So of course they turned to the super-computers of the day: punch-card tabulation machines from IBM. The data was entered on punch cards which could then be sorted and tabulated by the electromechanical machines. The statistical calculations were done on mechanical desktop calculators. This may sound onerous now, but at the time it must have seemed like magic to have data sorted by a large noisy machine and arithmetic performed by merely hand-cranking a machine on your desk. Like how, in half a century, people will not believe that, in the early years of the twenty-first century, we had to drive our own cars, physically type text messages and manually masticate.

Because the new-fangled technology was doing the sorting of the recording sheets, the report sheets used to record the data did not need to be arranged to make the later data-processing easy. Instead, they were arranged to minimize human error and even reduce how often people had to put different instruments down and pick them up. Tape measurements are all in one column and caliper measurements in another. It was an early case of reducing error by user-experience design.

How average are you?

Anthropometry of Air Force Flying Personnel (1950)

How does your nipple height compare to this guy from 1950? Do you look more or less excited than he does to have it measured?

A lot of effort was put into reducing all sources of error in the survey. Outliers were removed, with borderline cases dealt with on a ‘no harm, no foul’ basis: if it was uncertain if a particular value was an error or just an extreme value, they checked if removing it made any difference to the overall stats. If it didn’t, then problem avoided! And all statistical calculations were calculated twice in two different ways (if possible). Some statistical measures have more than one formula to produce them, so they would do both to make sure they received the same answer both ways.

As well as the statistical findings, a report called ‘The “Average Man”?’ was also produced, questioning the very existence of such a mythical beast. The sizes of uniforms were used as a perfect example. The survey of people could be used to make a new standard uniform to fit the middle 30 per cent of all the measurements, described as ‘approximately average’. But how many of the 4,063 people in the survey could wear such an approximately average uniform? The answer was zero. No member of the entire 4,063-people survey was in the middle 30 per cent for all ten possible uniform measurements.

The tendency to think in terms of the ‘average man’ is a pitfall into which many persons blunder when attempting to apply human-body-size data to design problems. Actually, it is virtually impossible to find an ‘average man’ in the air force population. This is not because of any unique traits of this group of men, but because of the great variability of bodily dimensions which is characteristic of all men.

– ‘The “Average Man”?’, Gilbert S. Daniels
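A back-of-the-envelope calculation suggests why nobody fitted the ‘approximately average’ uniform. The sketch below makes one loud assumption that is not in the report: that the ten measurements are statistically independent. Real body dimensions are correlated, so the true chance of a match is higher than this estimate, but the calculation shows how quickly ‘middle 30 per cent on everything’ shrinks.

```python
people = 4_063         # personnel measured in the survey
measurements = 10      # uniform measurements considered
middle_share = 0.30    # 'approximately average' band for each measurement

# ASSUMPTION (illustrative only): the measurements are independent, so the
# chance of being in the middle band on all of them is a simple product.
p_one_person = middle_share ** measurements
expected_matches = people * p_one_person

print(f"chance one person is average on all ten: {p_one_person:.7f}")
print(f"expected matches among {people:,} people: {expected_matches:.3f}")
```

Under that assumption the expected number of matches is far below one person, which is consistent with the survey finding none.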

Gilbert Daniels had been part of the team conducting the air-force survey. He had studied physical anthropology and had discovered during his studies, when measuring the hands of the admittedly very homogeneous male Harvard student population, that there was a wide variety of measurements and that no one student’s hand was close to being average. I have no idea how he got those measurements. But I love the picture of Daniels running around a university campus trying to convince his fellow students to hand over their private data, like some kind of hand-size-obsessed Zuckerberg.

Daniels’ report led to the air force not trying to find an average person but instead engineering things to accommodate variation. They are commonplace now and seem blatantly obvious, but things like adjustable car seats and helmet straps which can be lengthened and shortened came out of the air force embracing variance. The survey ended up being useful not in showing what the average service person was like but by indicating just how much variation there was among them.

Some averages are more equal than others

In 2011 the website OKCupid had a problem common among dating sites: their attractive users were being swamped with messages, and that kind of signal-to-noise could push them away from the site. Users could rate each other’s looks on a scale of 1 to 5, and those who averaged at the high end of the attractiveness spectrum were receiving twenty-five times as many messages as those at the other end. But the folks who founded OKCupid happened to be mathematicians, and the site is almost as much about data as dates. So they dug into the stats and, along the way, they found something interesting.

People towards the top of the attractiveness scores but not at the extreme end, with average ratings of around 3.5, were receiving a huge range of numbers of messages. One user with an average rating of 3.3 was getting 2.3 times the normal amount of messages, but someone at the 3.4 level of attractiveness was getting only 0.8 of the normal amount of messages. There was something other than their average attractiveness rating influencing how much attention they were getting from other users.

If a user had an attractiveness rating of 3.5, there are multiple ways other users could have rated them between 1 and 5 to give that result. What OKCupid founder Christian Rudder discovered was that people who achieve a rating of around 3.5 because a lot of people scored them as 3 or 4 did not get nearly as many messages as users who achieved their 3.5 via a lot of 1s and 5s. The predictor for messages was not the average value of the attractiveness scores but rather how spread out those scores were. Rudder concluded that users were hesitant to message people they thought everyone would find attractive and would focus their attention on people they found attractive but thought other people might not.

Both of these sets of twenty ratings give an average score of 3.5, but which graph do you find more attractive?

The spread of data can be measured with the standard deviation (or variance, which is the standard deviation squared). OKCupid users with the same average attractiveness score could have very different standard deviations in their ratings and that would be a better predictor of how many messages they would receive. This is how it worked in this case, but it is possible for different sets of data to not only have the same average but to have the same standard deviation.
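To make the idea concrete, here are two made-up sets of twenty ratings that both average 3.5. They are hypothetical and are not OKCupid's data; the sketch uses Python's standard `statistics` module and the population standard deviation.

```python
from statistics import mean, pstdev

# Hypothetical ratings, invented for illustration (not OKCupid data).
consensus = [3] * 10 + [4] * 10       # everyone rates 3 or 4
divisive = [1] * 7 + [3] + [5] * 12   # mostly 1s and 5s

for name, ratings in (("consensus", consensus), ("divisive", divisive)):
    print(
        f"{name}: n={len(ratings)}, "
        f"mean={mean(ratings):.2f}, "
        f"standard deviation={pstdev(ratings):.2f}"
    )
```

Both lists have the same mean, but the second has a much larger standard deviation, which is the kind of difference the average alone hides.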

In 2017 two researchers in Canada produced twelve sets of data which all had the same averages and standard deviations as a picture of a dinosaur. The ‘Datasaurus’ was a collection of 142 pairs of coordinates which, when plotted, looked like a dinosaur. The Datasaurus Dozen were twelve additional sets of 142 pieces of data, which, to two decimal places, had the same averages in both vertical and horizontal directions, and the same standard deviations in both directions, as the Datasaurus. Without being plotted, all these data sets look the same as numbers on paper; it’s a valuable lesson in the importance of data visualization. And to not trust headline stats.

For all plots: vertical average = 47.83; vertical standard deviation = 26.93; horizontal average = 54.26; horizontal standard deviation = 16.79.

Some of the Datasaurus Dozen. I personally would have gone with Triceraplots.
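The same effect can be seen on a much smaller scale. The two tiny data sets below are invented for illustration and have nothing to do with the actual Datasaurus data; they simply share a mean and a (population) standard deviation while being distributed differently.

```python
from collections import Counter
from statistics import mean, pstdev

# Toy data sets, made up for this example (not the Datasaurus data).
lopsided = [1, 1, 4, 4, 4, 4]
symmetric = [1, 2, 2, 4, 4, 5]

for name, data in (("lopsided", lopsided), ("symmetric", symmetric)):
    counts = dict(sorted(Counter(data).items()))
    print(
        f"{name}: mean={mean(data):.2f}, "
        f"standard deviation={pstdev(data):.4f}, counts={counts}"
    )
```

The summary statistics agree, yet the counts of each value show two different shapes, which is why plotting the data matters.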

This should bias some time

First, you get the stats; then, you analyse the stats. How data is collected is as important as how it is analysed. There are all sorts of biases that can be introduced during data collection which can influence the conclusions drawn. Near where I live in the UK there is a bridge over a river which is believed to have been built by monks in the 1200s. Given this is a bridge which has now survived for around eight hundred years, those monks must have really known what they were doing. A sign at the bridge points out that the supports of the bridge are shaped in such a way that the turbulence in the water as it flows by is diminished, reducing the erosion of the bridge. Smart monks.
