Humble Pi

by Matt Parker


  But they are there for a reason. If a program breaks, a good error message detailing what led up to the disaster can give the person fixing it a rolling start. But many computer error messages are just a code which needs to be looked up. Some of these error codes become ubiquitous enough that the general public understand them. If something goes wrong browsing the web, many people know that ‘error 404’ means the site could not be found. Actually, any website error like this starting with a 4 means the fault was at the user’s end (like 403: trying to access a forbidden page), and codes starting with a 5 are the fault of the server. Error code 503 means the server was unavailable; 507 means its storage is too full.
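
  As a rough sketch of that convention, a few lines of Python can classify a status code by its leading digit (the function name and the wording of the messages here are purely illustrative):

    # Classify an HTTP status code by its leading digit, per the convention above.
    def describe_status(code: int) -> str:
        if 400 <= code <= 499:
            return f"{code}: problem at the user's end (e.g. 404 not found, 403 forbidden)"
        if 500 <= code <= 599:
            return f"{code}: problem at the server's end (e.g. 503 unavailable, 507 storage full)"
        return f"{code}: not a 4xx or 5xx error"

    print(describe_status(404))
    print(describe_status(503))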

  Always hilarious, internet engineers have designated error code 418 as ‘I’m a teapot.’ It is returned by any internet-enabled teapot which is sent a request to make coffee. It was introduced as part of the 1998 release of the Hyper Text Coffee Pot Control Protocol (HTCPCP) specification. Although HTCPCP was originally an April Fools’ joke, connected teapots have of course since been built and run according to it. An attempt to remove this error in 2017 was defeated by the Save 418 Movement, which preserved it as ‘a reminder that the underlying processes of computers are still made by humans’.

  Because they are only intended to be used by tech people, many computer error messages are very utilitarian and definitely not user-friendly. But some serious problems can result when non-technical users are faced with an overly technical error message. This was one of the problems with the Therac-25 radiation machine and its roll-over issues. The machine would produce around forty error messages a day, with unhelpful names, and, because many of them were not important, the operators got into the habit of quick fixes which allowed them to continue with the treatments. Some of the overdose cases could have been prevented if the operators had not dismissed error messages and continued.

  In one case in March 1986 the machine stopped functioning and the error message ‘Malfunction 54’ appeared on the screen. Many of the errors were just the word ‘Malfunction’, followed by a number. When malfunction number 54 was looked up, the explanation was that it was a ‘dose input 2 error’. In the subsequent inquiry it was discovered that a dose input 2 error meant the dose was either too high or too low.

  All these impenetrable codes and descriptions would be comical if not for the fact that the patient in the ‘Malfunction 54’ case died from the resulting radiation overexposure. When it comes to medical equipment, bad error messages can cost lives. One of the recommended modifications before the Therac-25 machines could go back into service was ‘Cryptic malfunction messages will be replaced with meaningful messages.’

  In 2009 a collection of UK universities and hospitals banded together to form the CHI+MED project: Computer–Human Interaction for Medical Devices. They thought that more could be done to limit the potentially dangerous effects of maths and technology mistakes in medicine and, much like the Swiss cheese model, they believed that, instead of finding individuals to blame, the system as a whole should be geared to avoid errors.

  In the medical field there is the general impression that good people don’t make mistakes. Instinctively, we feel that the person who ignored the Malfunction 54 message and hit P on the keyboard to proceed with the dose is to blame for the death of that patient. But it’s more complicated than that. As Harold Thimbleby from CHI+MED points out, it’s not a good system simply to remove everyone who admits to making a mistake.

  People who do admit making errors are at best suspended or moved on, thus leaving behind a team who ‘do not make errors’ and have no experience of error management.

  – H. Thimbleby, ‘Errors + Bugs Needn’t Mean Death’, Public Service Review: UK Science & Technology, 2, pp. 18–19, 2011

  He points out that, in pharmacy, it is illegal to give a patient the wrong drug. This does not promote an environment of admitting and addressing mistakes. Those who do make a slip-up and admit it might lose their job. This survivor bias means that the next generation of pharmacy students are taught by pharmacists who have ‘never made mistakes’. It perpetuates an impression that mistakes are infrequent events. But we all make mistakes.

  In August 2006 a cancer patient in Canada was put on the chemotherapy drug Fluorouracil, to be delivered by an infusion pump which would gradually release the drug into their system over four days. Very sadly, due to an error in the way the pump was set up, all the drug was released in four hours and the patient died from the overdose. The simple way to process this is to blame the nurse who set up the pump, and maybe the nurse who double-checked their work. But, as always, it is a bit more complicated than that.

  5-Fluorouracil 5,250mg (at 4,000mg/m²) intravenous once continuous over 4 days … Continuous infusion via ambulatory infusion pump (Baseline regimen dose = 1,000mg/m²/day = 4,000 mg/m²/4 days).

  – Electronic order for Fluorouracil

  The original order for Fluorouracil was hard enough to follow, but it was then passed on to a pharmacist who made up 130 millilitres of a 45.57mg/ml fluorouracil solution. When this arrived at the hospital, a nurse had to calculate at what rate of release to set the pump. After doing some working out with a calculator, they came to the number of 28.8 millilitres. They looked at the pharmacy label and, sure enough, in the dose section it listed 28.8 millilitres.

  But during the calculation the nurse had forgotten to divide by the twenty-four hours in a day. They had worked out 28.8 millilitres per day and assumed it was 28.8 millilitres per hour. The pharmacy label actually listed the 28.8 millilitres per day amount first and, after that, in brackets, was the hourly amount (1.2ml/h). A second nurse checked their work and, now with no calculator within reach, they did the calculation on a scrap of paper and made exactly the same mistake. Because it matched a number on the packet, they didn’t question it. The patient was sent home and was surprised that the pump, which should have lasted four days, was empty and beeping after only four hours.
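
  Laid out as a calculation (a reconstruction in Python from the figures quoted above, not the hospital’s actual worksheet), the missing step is easy to spot:

    # Figures quoted above: 5,250 mg of fluorouracil in a 45.57 mg/ml solution,
    # to be infused continuously over four days.
    total_dose_mg = 5250
    concentration_mg_per_ml = 45.57
    days = 4

    total_volume_ml = total_dose_mg / concentration_mg_per_ml  # about 115.2 ml
    ml_per_day = total_volume_ml / days                        # about 28.8 ml per day
    ml_per_hour = ml_per_day / 24                              # about 1.2 ml per hour -- the skipped step

    print(round(ml_per_day, 1), "ml/day")    # 28.8: the first number on the pharmacy label
    print(round(ml_per_hour, 1), "ml/hour")  # 1.2: the rate the pump should have been set to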

  There is a lot that can be learned from this in terms of how drug-dose orders are described and how pharmaceutical products are labelled. There are even lessons in terms of the wide range of complex tasks given to nurses and the support and double-checking that is available. But the CHI+MED folks were even more interested in the technology which had facilitated these maths errors.

  The interface with the pump was complicated, and not intuitive. Beyond that, the pump had no built-in checks and happily followed instructions to empty itself at an abnormally fast rate for this drug. For a life-critical pump, it would make sense for it to know what drug is being administered and do a final check on the rate it has been programmed at (and then display an understandable error message).
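
  A check like that does not need to be sophisticated. Here is a minimal sketch of the idea in Python; the drug name, the rate ceiling and the wording of the message are all made-up illustrations, not values from any real pump:

    # Hypothetical per-drug ceiling on infusion rate, with a plain-language message.
    MAX_RATE_ML_PER_HOUR = {"fluorouracil": 5.0}  # illustrative limit only

    def check_rate(drug: str, rate_ml_per_hour: float) -> str:
        limit = MAX_RATE_ML_PER_HOUR.get(drug.lower())
        if limit is None:
            return f"No rate limit on file for {drug}: please confirm the setting manually."
        if rate_ml_per_hour > limit:
            return (f"Programmed rate {rate_ml_per_hour} ml/h is above the usual maximum of "
                    f"{limit} ml/h for {drug}. Check the calculation before starting.")
        return "Programmed rate is within the expected range."

    print(check_rate("fluorouracil", 28.8))  # would have queried the fatal setting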

  Even more interesting to me is CHI+MED’s observation that the nurse used a ‘general-purpose calculator that had no idea what calculation was being done’. I’d never really thought about how all calculators are general purpose and blindly spit out whatever answer matches the buttons you happen to mash. On reflection, most calculators have no error checks built in at all and should not be used in a life-or-death situation. I mean, I love my Casio fx-39, but I wouldn’t trust my life to it.

  CHI+MED has since developed a calculator app which is aware of what calculation is being performed on it and blocks over thirty common medical calculation errors. This includes some common errors I think all calculators should be able to catch, like misplaced decimal points. If you want to type 23.14 but accidentally hit 2.3.14, it’s a toss-up how your calculator will take that. Mine shows that I have entered 2.314 and carries on like nothing happened. A good medical calculator will flag up if the numbers entered were at all ambiguous; otherwise, it’s a factor-of-ten accident waiting to happen.
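
  The double-decimal-point case, at least, is simple to catch. This is not CHI+MED’s app, just a minimal Python sketch of refusing an ambiguous entry rather than silently reinterpreting it:

    import re

    def parse_entry(keys: str) -> float:
        # Refuse anything with more than one decimal point instead of guessing.
        if keys.count(".") > 1:
            raise ValueError(f"Ambiguous entry '{keys}': more than one decimal point.")
        if not re.fullmatch(r"\d*\.?\d+", keys):
            raise ValueError(f"Cannot read '{keys}' as a number.")
        return float(keys)

    print(parse_entry("23.14"))      # 23.14
    try:
        print(parse_entry("2.3.14"))
    except ValueError as err:
        print(err)                   # flags the ambiguity instead of returning 2.314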

  Programming has inarguably been a huge benefit to humankind, but it is still early days. Complex code will always react in ways its developers did not see coming. But there is the hope that well-programmed devices can add a few extra slices of cheese into our modern systems.

  So, What Have We Learned from Our Mistakes?

  While I was writing this book, on one of our many travels my wife and I took a break from work and spent a day sightseeing around a generic foreign city. Quite a large and famous city. We did some pretty standard touristic stuff, but then I realized that we were in the same city as an engineering thing a friend of mine had worked on.

  This friend of mine had been involved in the design and construction of an engineering project (think something like a building or bridge) in the last few decades. They had told me one time over some beers about a mistake they had made in the design process, a mathematical error which had, thankfully, made no impact on the safety of this thing at all. But it had changed it slightly, in a near-trivial aesthetic way. Something did not line up in quite the way it was originally planned. And yes, this story is deliberately vague.

  You see, my (eternally supportive) wife helped me hunt down the visual evidence of my friend’s mathematical mistake so I could take a photo of myself with it. I have no idea what any passers-by thought of me posing with seemingly nothing. But I was so excited. This was going to be a great contemporary example to include in this book. There are plenty of historical engineering mistakes, but my friend is still alive and I could get a personal account of how the mistake was made. It was also nothing dangerous, so I could candidly explain the process behind how it happened.

  I’m afraid you can’t use that.

  I could almost hear the regret in their voice at ever having told me about the mistake in the first place. Showing them my holiday photos of me with the manifestation of their miscalculation did nothing to persuade them. They explained that, while this sort of thing will be discussed and analysed within a company, it is never released or made public at all – even something as inconsequential as this. The contract paperwork and non-disclosure agreements legally restrict engineers from disclosing almost anything about projects for decades after they are completed.

  So there you are. I can’t tell you anything about it at all, other than that I can’t tell you anything about it. And it’s not even just engineers who are being restricted from speaking publicly. A different mathematical friend of mine does consulting work about the mathematics of a very public-facing area of safety. They will be hired by one company to do some research and uncover industry-wide mistakes. But then, when working for a different company or even advising the government on safety guidelines, they will not be able to disclose what they previously discovered on someone else’s dime. It’s all a bit silly.

  Humans don’t seem to be good at learning from mistakes. And I don’t have any great solutions: I can totally appreciate that companies don’t want their flaws, or the research they had to fund, to be released freely. And, for my friend’s aesthetic engineering mistake, it’s maybe fine that no one else ever finds out. But I wish there was a mechanism in place to ensure that important, potentially useful lessons could be shared with the people who would benefit from knowing. In this book I’ve done a lot of research from accident-investigation reports which are publicly released, but that generally only happens when there is a very obvious disaster. Many more quiet mathematical mistakes are probably swept under the rug.

  Because we all make mistakes. Relentlessly. And that is nothing to be feared. Many people I speak to say that, when they were at school, they were put off mathematics because they simply didn’t get it. But half the challenge of learning maths is accepting that you may not be naturally good at it but, if you put the effort in, you can learn it. As far as I’m aware, the only quote from me that has been made into a poster by teachers and put up in their classrooms is: ‘Mathematicians aren’t people who find maths easy; they’re people who enjoy how hard it is.’

  In 2016 I accidentally became the poster-child for when your mathematical best is just not good enough. We were filming a YouTube video for the Numberphile channel and I was talking about magic squares. These are grids of numbers which always give the same total if you add the rows, columns or diagonals. I’m a big fan of magic squares and thought it was interesting that no one had ever found a three-by-three magic square made entirely out of square numbers. Nor had anyone managed to prove that no such square existed. It was not the most important open question in mathematics, but I thought it was interesting that it was still unsolved.

  So I gave it a go. As a programming challenge to myself, I wrote some code to see how close I could get to finding a magic square of squares. And I found this:

  It gives the same total along every row and column but in only one of the two diagonals. I was one total short of it working. Also, I was using the same numbers more than once, and in a true magic square all the numbers should be different. So my attempt at a solution had come up short. This did not surprise me: it had already been proven that any successful three-by-three magic square of squares would contain all numbers bigger than a hundred trillion. My numbers ranged from 1² = 1 to 47² = 2,209. I just wanted to give it a go and see how far I could get.
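
  For anyone who wants to check the near-miss for themselves, here is a short Python verification of the grid from the video (the entries are my transcription of the published Parker Square, so treat them as quoted rather than guaranteed):

    # The near-magic square of squares: entries are the squares of these numbers.
    grid = [[29, 1, 47],
            [41, 37, 1],
            [23, 41, 29]]
    squares = [[n * n for n in row] for row in grid]

    rows = [sum(row) for row in squares]
    cols = [sum(col) for col in zip(*squares)]
    diag_main = sum(squares[i][i] for i in range(3))
    diag_anti = sum(squares[i][2 - i] for i in range(3))

    print(rows, cols)            # all six sums come out as 3051
    print(diag_main, diag_anti)  # 3051 and 4107 -- one diagonal misses the total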

  The video was filmed by Brady Haran, and he was less forgiving, essentially pointing out that my solution was not very good at all. When he asked me what it was called, I knew immediately that, if I called it a ‘Parker Square’, then it would become a mascot for getting things wrong. Not that I had a choice. Brady called the video The Parker Square, and the rest is history. It became an internet meme in its own right and, instead of ‘not making a big deal about it’, Brady released a range of T-shirts and mugs. People take great delight in wearing the T-shirts when they come to see my shows.

  I’ve tried to wangle the Parker Square back to being a mascot of the importance of giving something a go, even when you’re likely to fail. The experience people seem to have at school is that getting something wrong in maths is terrible and to be avoided at all costs. But you’re not going to be able to stretch yourself and try new challenges without occasionally going wrong. So, as some kind of compromise, the Parker Square has ended up being ‘a mascot for people who give it a go but ultimately fall short’.

  All of that said, as this book has made clear, there are situations where the mathematics needs to be done correctly. Sure, people playing around with and investigating new maths can make all sorts of mistakes but, once we are using that mathematics in life-critical situations, we had better be able to consistently get it right. And given that, often, we’re stretching beyond what humankind is naturally capable of, there are always going to be some mistakes waiting to happen.

  This is my life now.

  The Space Shuttle Main Engine is a very remarkable machine. It has a greater ratio of thrust to weight than any previous engine. It is built at the edge of, or outside of, previous engineering experience. Therefore, as expected, many different kinds of flaws and difficulties have turned up.

  – Appendix F: Personal observations on the reliability of the Shuttle by R. P. Feynman, from Report to the President by the Presidential Commission on the Space Shuttle Challenger Accident, 6 June 1986

  I believe it is worth being pragmatic when it comes to avoiding disasters. Mistakes are going to happen, and systems need to be able to deal with that and stop them from becoming disasters. The CHI+MED team, who are researching computer–human interactions with medical devices, actually came up with a new version of the Swiss Cheese model which I’m quite partial to: the Hot Cheese model of accident causation.

  This turns the Swiss cheese on its side: imagine the slices of cheese are horizontal and mistakes are raining down from the top. Only mistakes which fall down through holes in every layer make it out the bottom to become accidents. The new element is that the cheese slices themselves are hot and parts of them are liable to drip down, causing new problems. Working with medical devices made the CHI+MED folks realize that there was a cause of accidents not represented by the Swiss Cheese model: layers and steps within a system could themselves cause mistakes to happen. Adding a new layer does not automatically reduce how many accidents are happening. Systems are more complicated and dynamic than that.

  No one wants extra drips in the fondue pot of disaster.

  They use the example of the Barcode Medication Administration systems which were introduced to reduce pharmacy dispensing mistakes. These systems definitely helped reduce errors where the wrong medication was given, but they also opened up all-new ways for things to go wrong. In the interest of saving time, some staff would not bother scanning the barcode on a patient’s wristband; instead, they would wear spare copies of patient barcodes on their belt or stick copies up in supply closets. They would also scan the same medication twice instead of scanning two different containers if they believed them to be identical. Having barcodes now caused situations where the patients and drugs were less thoroughly checked than they had been before. If a new system is implemented, humans can be very resourceful at finding new ways to make mistakes.

  It can be very dangerous when humans get complacent and think they know better than the maths. In 1907 a combination road-and-railway steel bridge was being built across a section of the St Lawrence River in Canada, which was over half a kilometre wide. Construction had been going on for some time but on 29 August one of the workers noticed that a rivet he had put in place about an hour earlier had mysteriously snapped in half. Then, suddenly, the whole south section of the bridge collapsed, with a noise that was heard up to 10 kilometres away. Of the eighty-six people working on the bridge at the time, seventy-five died.

  There had been a miscalculation of how heavy the bridge would be, partly because the forces had not been recalculated when the bridge’s main span was increased from 1,600 feet to 1,800 feet, so the lower support beams buckled and eventually failed completely. For some time the workers had been voicing concerns that the beams were deforming as the bridge was being constructed, and some of them quit work because they were so worried. But the engineers did not listen to their concerns. Even when the load miscalculation was discovered, the chief engineer decided to proceed with the construction anyway, having concluded, without doing adequate testing, that it would still be fine.

 
