by Hannah Fry
Association: finding links
Association is all about finding and marking relationships between things. Dating algorithms, such as the one behind OKCupid, have association at their core, looking for connections between members and suggesting matches based on the findings. Amazon’s recommendation engine uses a similar idea, connecting your interests to those of past customers. It’s what led to the intriguing shopping suggestion that confronted Reddit user Kerbobotat after buying a baseball bat on Amazon: ‘Perhaps you’ll be interested in this balaclava?’11
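To give a flavour of how simple the core idea can be, here is a toy sketch in Python of item-to-item association: count which products tend to turn up in the same shopping baskets, then suggest the most frequent companions. The baskets and product names are invented for illustration, and Amazon’s real engine is considerably more elaborate.

```python
from collections import Counter, defaultdict

# A toy purchase history: each basket lists items bought together.
# (The items are invented for illustration.)
baskets = [
    {"baseball bat", "balaclava"},
    {"baseball bat", "baseball glove"},
    {"baseball bat", "balaclava", "gloves"},
    {"novel", "bookmark"},
]

# Count how often each pair of items appears in the same basket.
co_bought = defaultdict(Counter)
for basket in baskets:
    for item in basket:
        for other in basket:
            if other != item:
                co_bought[item][other] += 1

def recommend(item, n=3):
    """Suggest the items most often bought alongside `item`."""
    return [other for other, _ in co_bought[item].most_common(n)]

print(recommend("baseball bat"))  # e.g. ['balaclava', 'baseball glove', 'gloves']
```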
Filtering: isolating what’s important
Algorithms often need to remove some information to focus on what’s important, to separate the signal from the noise. Sometimes they do this literally: speech recognition algorithms, like those running inside Siri, Alexa and Cortana, first need to filter out your voice from the background noise before they can get to work on deciphering what you’re saying. Sometimes they do it figuratively: Facebook and Twitter filter stories that relate to your known interests to design your own personalized feed.
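Here is a deliberately crude sketch of the figurative kind of filtering: keep only the stories that match a reader’s known interests. The stories and interest keywords are invented, and real feeds rank on hundreds of signals rather than a simple keyword match.

```python
# Invented interests and stories, purely for illustration.
interests = {"football", "baking"}

stories = [
    "Local football team wins derby",
    "Stock markets dip ahead of earnings",
    "Ten baking tips for perfect sourdough",
]

def filter_feed(stories, interests):
    """Return only the stories mentioning at least one interest keyword."""
    kept = []
    for story in stories:
        words = set(story.lower().split())
        if words & interests:  # any overlap between story words and interests
            kept.append(story)
    return kept

print(filter_feed(stories, interests))
# ['Local football team wins derby', 'Ten baking tips for perfect sourdough']
```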
The vast majority of algorithms will be built to perform a combination of the above. Take UberPool, for instance, which matches prospective passengers with others heading in the same direction. Given your start point and end point, it has to filter through the possible routes that could get you home, look for connections with other users headed in the same direction, and pick one group to assign you to – all while prioritizing routes with the fewest turns for the driver, to make the ride as efficient as possible.12
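To see how the pieces might slot together, here is a toy matcher loosely in the spirit of UberPool: filter out riders heading the wrong way, then pick the companion whose trip adds the least detour. The riders, coordinates and scoring rule are all invented for illustration; the real system juggles far more than this.

```python
from math import dist

# Invented trips: each has a start and end coordinate.
my_trip = {"start": (0, 0), "end": (10, 0)}

others = [
    {"name": "A", "start": (1, 1), "end": (9, 1)},    # roughly the same direction
    {"name": "B", "start": (2, 0), "end": (-8, 0)},   # opposite direction
    {"name": "C", "start": (0, 2), "end": (10, 2)},   # roughly the same direction
]

def heading(trip):
    """Direction vector of a trip."""
    return (trip["end"][0] - trip["start"][0], trip["end"][1] - trip["start"][1])

def same_direction(a, b):
    """Filter step: keep riders whose heading points the same way (positive dot product)."""
    ax, ay = heading(a)
    bx, by = heading(b)
    return ax * bx + ay * by > 0

def detour(my, other):
    """Extra distance if we pick `other` up and drop them off along the way."""
    direct = dist(my["start"], my["end"])
    shared = (dist(my["start"], other["start"])
              + dist(other["start"], other["end"])
              + dist(other["end"], my["end"]))
    return shared - direct

# Filter first, then pick the match that adds the least detour.
candidates = [o for o in others if same_direction(my_trip, o)]
best = min(candidates, key=lambda o: detour(my_trip, o))
print(best["name"])  # 'A' – the rider whose trip adds the least extra distance
```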
So, that’s what algorithms can do. Now, how do they manage to do it? Well, again, while the possibilities are practically endless, there is a way to distil things. You can think of the approaches taken by algorithms as broadly fitting into two key paradigms, both of which we’ll meet in this book.
Rule-based algorithms
The first type are rule-based. Their instructions are constructed by a human and are direct and unambiguous. You can imagine these algorithms as following the logic of a cake recipe. Step one: do this. Step two: if this, then that. That’s not to imply that these algorithms are simple – there’s plenty of room to build powerful programs within this paradigm.
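A minimal, invented example of that recipe-like logic, with every step an explicit, human-written instruction:

```python
# A tiny rule-based algorithm (invented for illustration): a thermostat.
def thermostat(temperature_c, heating_on):
    # Step one: read the temperature.
    # Step two: if it's cold, turn the heating on.
    if temperature_c < 18:
        return True
    # Step three: if it's warm enough, turn the heating off.
    if temperature_c > 21:
        return False
    # Otherwise: leave things as they are.
    return heating_on

print(thermostat(16, heating_on=False))  # True – too cold, switch on
print(thermostat(23, heating_on=True))   # False – warm enough, switch off
```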
Machine-learning algorithms
The second type are inspired by how living creatures learn. To give you an analogy, think about how you might teach a dog to give you a high five. You don’t need to produce a precise list of instructions and communicate them to the dog. As a trainer, all you need is a clear objective in your mind of what you want the dog to do and some way of rewarding her when she does the right thing. It’s simply about reinforcing good behaviour, ignoring bad, and giving her enough practice to work out what to do for herself. The algorithmic equivalent is known as a machine-learning algorithm, which comes under the broader umbrella of artificial intelligence or AI. You give the machine data, a goal and feedback when it’s on the right track – and leave it to work out the best way of achieving the end.
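Here is a toy version of that training loop in code. The learner is never told what a high five is; it is only rewarded when it happens to do one, and the ‘right’ action gradually wins out. All the details – the actions, the reward, the learning rate – are invented for illustration, and real machine-learning systems are far richer than this.

```python
import random

# A toy 'dog learns a high five' loop, invented for illustration.
actions = ["sit", "roll over", "high five", "bark"]
target = "high five"                # the goal exists only in the trainer's head
value = {a: 0.0 for a in actions}   # the learner's estimate of each action's worth

random.seed(0)
for trial in range(200):
    # Mostly repeat the action that has earned the most reward so far,
    # but occasionally try something new.
    if random.random() < 0.2:
        action = random.choice(actions)
    else:
        action = max(value, key=value.get)
    reward = 1.0 if action == target else 0.0        # feedback from the trainer
    value[action] += 0.1 * (reward - value[action])  # nudge the estimate

print(max(value, key=value.get))  # 'high five' – learned purely from feedback
```

The point of the sketch is that nowhere do we write down *how* to do a high five; we only say when the learner has got it right.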
Both types have their pros and cons. Because rule-based algorithms have instructions written by humans, they’re easy to comprehend. In theory, anyone can open them up and follow the logic of what’s happening inside.13 But their blessing is also their curse. Rule-based algorithms will only work for the problems for which humans know how to write instructions.
Machine-learning algorithms, by contrast, have recently proved to be remarkably good at tackling problems where writing a list of instructions won’t work. They can recognize objects in pictures, understand words as we speak them and translate from one language to another – something rule-based algorithms have always struggled with. The downside is that if you let a machine figure out the solution for itself, the route it takes to get there often won’t make a lot of sense to a human observer. The insides can be a mystery, even to the smartest of living programmers.
Take, for instance, the job of image recognition. A group of Japanese researchers recently demonstrated how strange an algorithm’s way of looking at the world can seem to a human. You might have come across the optical illusion where you can’t quite tell if you’re looking at a picture of a vase or of two faces (if not, there’s an example in the notes at the back of the book).14 Here’s the computer equivalent. The team showed that changing a single pixel on the front wheel of the image overleaf was enough to cause a machine-learning algorithm to change its mind from thinking this is a photo of a car to thinking it is a photo of a dog.15
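For the curious, here is a sketch of the idea behind such an attack. The actual researchers searched for their pixel with a much cleverer method (differential evolution) against a genuine neural network; this brute-force version, with a stand-in classifier, only illustrates how a single pixel can tip a decision.

```python
import numpy as np

def classify(image):
    """Stand-in for a trained image classifier (invented for illustration):
    it labels the image by a crude brightness rule."""
    return "car" if image.mean() > 0.5 else "dog"

def one_pixel_attack(image, classifier):
    """Brute-force sketch: overwrite each pixel in turn with an extreme value
    and report the first change that flips the classifier's label."""
    original_label = classifier(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            for value in (0.0, 1.0):
                candidate = image.copy()
                candidate[i, j] = value
                if classifier(candidate) != original_label:
                    return (i, j), value
    return None

# A tiny 3x3 'image' sitting just above the classifier's decision boundary,
# so a single pixel is enough to tip it over.
image = np.full((3, 3), 0.56)
print(classify(image))                    # 'car'
print(one_pixel_attack(image, classify))  # ((0, 0), 0.0) – one pixel flips it to 'dog'
```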
For some, the idea of an algorithm working without explicit instructions is a recipe for disaster. How can we control something we don’t understand? What if the capabilities of sentient, super-intelligent machines transcend those of their makers? How will we ensure that an AI we don’t understand and can’t control isn’t working against us?
These are all interesting hypothetical questions, and there is no shortage of books dedicated to the impending threat of an AI apocalypse. Apologies if that was what you were hoping for, but this book isn’t one of them. Although AI has come on in leaps and bounds of late, it is still only ‘intelligent’ in the narrowest sense of the word. It would probably be more useful to think of what we’ve been through as a revolution in computational statistics than a revolution in intelligence. I know that makes it sound a lot less sexy (unless you’re really into statistics), but it’s a far more accurate description of how things currently stand.
For the time being, worrying about evil AI is a bit like worrying about overcrowding on Mars.fn1 Maybe one day we’ll get to the point where computer intelligence surpasses human intelligence, but we’re nowhere near it yet. Frankly, we’re still quite a long way away from creating hedgehog-level intelligence. So far, no one’s even managed to get past worm.fn2
Besides, all the hype over AI is a distraction from much more pressing concerns and – I think – much more interesting stories. Forget about omnipotent artificially intelligent machines for a moment and turn your thoughts from the far distant future to the here and now – because there are already algorithms with free rein to act as autonomous decision-makers. To decide prison terms, treatments for cancer patients and what to do in a car crash. They’re already making life-changing choices on our behalf at every turn.
The question is, if we’re handing over all that power – are they deserving of our trust?
Blind faith
Sunday, 22 March 2009 wasn’t a good day for Robert Jones. He had just visited some friends and was driving back through the pretty town of Todmorden in West Yorkshire when he noticed the fuel light on his BMW. He had just 7 miles to find a petrol station before he ran out, which was cutting things rather fine. Thankfully his GPS seemed to have found him a short cut – sending him on a narrow winding path up the side of the valley.
Robert followed the machine’s instructions, but as he drove, the road got steeper and narrower. After a couple of miles, it turned into a dirt track that barely seemed designed to accommodate horses, let alone cars. But Robert wasn’t fazed. He drove five thousand miles a week for a living and knew how to handle himself behind the wheel. Plus, he thought, he had ‘no reason not to trust the TomTom sat-nav’.16
Just a short while later, anyone who happened to be looking up from the valley below would have seen the nose of Robert’s BMW appearing over the brink of the cliff above, saved from the hundred-foot drop only by the flimsy wooden fence at the edge he’d just crashed into.
It would eventually take a tractor and three quad bikes to recover Robert’s car from where he abandoned it. Later that year, when he appeared in court on charges of reckless driving, he admitted that he didn’t think to over-rule the machine’s instructions. ‘It kept insisting the path was a road,’ he told a newspaper after the incident. ‘So I just trusted it. You don’t expect to be taken nearly over a cliff.’17
No, Robert. I guess you don’t.
There’s a moral somewhere in this story. Although he probably felt a little foolish at the time, in ignoring the information in front of his eyes (like seeing a sheer drop out of the car window) and attributing greater intelligence to an algorithm than it deserved, Jones was in good company. After all, Kasparov had fallen into the same trap some twelve years earlier. And, in much quieter but no less profound ways, it’s a mistake almost all of us are guilty of making, perhaps without even realizing.
Back in 2015 scientists set out to examine how search engines like Google have the power to alter our view of the world.18 They wanted to find out if we have healthy limits in the faith we place in their results, or if we would happily follow them over the edge of a metaphorical cliff.
The experiment focused on an upcoming election in India. The researchers, led by psychologist Robert Epstein, recruited 2,150 undecided voters from around the country and gave them access to a specially made search engine, called ‘Kadoodle’, to help them learn more about the candidates before deciding who they would vote for.
Kadoodle was rigged. Unbeknown to the participants, they had been split into groups, each of which was shown a slightly different version of the search engine results, biased towards one candidate or another. When members of one group visited the website, all the links at the top of the page would favour one candidate in particular, meaning they’d have to scroll right down through link after link before finally finding a single page that was favourable to anyone else. Different groups were nudged towards different candidates.
It will come as no surprise that the participants spent most of their time reading the websites flagged up at the top of the first page – as that old internet joke says, the best place to hide a dead body is on the second page of Google search results. Hardly anyone in the experiment paid much attention to the links that appeared well down the list. But still, the degree to which the ordering influenced the volunteers’ opinions shocked even Epstein. After only a few minutes of looking at the search engine’s biased results, when asked who they would vote for, participants were a staggering 12 per cent more likely to pick the candidate Kadoodle had favoured.
In an interview with Science in 2015,19 Epstein explained what was going on: ‘We expect the search engine to be making wise choices. What they’re saying is, “Well yes, I see the bias and that’s telling me … the search engine is doing its job.”’ Perhaps more ominous, given how much of our information we now get from algorithms like search engines, is how much agency people believed they had in their own opinions: ‘When people are unaware they are being manipulated, they tend to believe they have adopted their new thinking voluntarily,’ Epstein wrote in the original paper.20
Kadoodle, of course, is not the only algorithm to have been accused of subtly manipulating people’s political opinions. We’ll come on to that more in the ‘Data’ chapter, but for now it’s worth noting how the experiment suggests we feel about algorithms that are right most of the time. We end up believing that they always have superior judgement.21 After a while, we’re no longer even aware of our own bias towards them.
All around us, algorithms provide a kind of convenient source of authority. An easy way to delegate responsibility; a short cut that we take without thinking. Who is really going to click through to the second page of Google every time and think critically about every result? Or go to every airline to check if Skyscanner is listing the cheapest deals? Or get out a ruler and a road map to confirm that their GPS is offering the shortest route? Not me, that’s for sure.
But there’s a distinction that needs making here. Because trusting a usually reliable algorithm is one thing. Trusting one without any firm understanding of its quality is quite another.
Artificial intelligence meets natural stupidity
In 2012, a number of disabled people in Idaho were informed that their Medicaid assistance was being cut.22 Although they all qualified for benefits, the state was slashing their financial support – without warning – by as much as 30 per cent,23 leaving them struggling to pay for their care. This wasn’t a political decision; it was the result of a new ‘budget tool’ that had been adopted by the Idaho Department of Health and Welfare – a piece of software that automatically calculated the level of support that each person should receive.24
The problem was, the budget tool’s decisions didn’t seem to make much sense. As far as anyone could tell from the outside, the numbers it came up with were essentially arbitrary. Some people were given more money than in previous years, while others found their budgets reduced by tens of thousands of dollars, putting them at risk of having to leave their homes to be cared for in an institution.25
Unable to understand why their benefits had been reduced, or to effectively challenge the reduction, the residents turned to the American Civil Liberties Union (ACLU) for help. Their case was taken on by Richard Eppink, legal director of the Idaho division,26 who had this to say in a blog post in 2017: ‘I thought the case would be a simple matter of saying to the state: Okay, tell us why these dollar figures dropped by so much?’27 In fact, it would take four years, four thousand plaintiffs and a class action lawsuit to get to the bottom of what had happened.28
Eppink and his team began by asking for details on how the algorithm worked, but the Medicaid team refused to explain their calculations. They argued that the software that assessed the cases was a ‘trade secret’ and couldn’t be shared.29 Fortunately, the judge presiding over the case disagreed. The budget tool that wielded so much power over the residents was then handed over, and revealed to be – not some sophisticated AI, not some beautifully crafted mathematical model, but an Excel spreadsheet.30
Within the spreadsheet, the calculations were supposedly based on historical cases, but the data was so badly riddled with bugs and errors that it was, for the most part, entirely useless.31 Worse, once the ACLU team managed to unpick the equations, they discovered ‘fundamental statistical flaws in the way that the formula itself was structured’. The budget tool had effectively been producing random results for a huge number of people. The algorithm – if you can call it that – was of such poor quality that the court would eventually rule it unconstitutional.32
There are two parallel threads of human error here. First, someone wrote this garbage spreadsheet; second, others naïvely trusted it. The ‘algorithm’ was in fact just shoddy human work wrapped up in code. So why were the people who worked for the state so eager to defend something so terrible?
Here are Eppink’s thoughts on the matter:
It’s just this bias we all have for computerized results – we don’t question them. When a computer generates something – when you have a statistician, who looks at some data, and comes up with a formula – we just trust that formula, without asking ‘hey wait a second, how is this actually working?’33
Now, I know that picking mathematical formulae apart to see how they work isn’t everyone’s favourite pastime (even if it is mine). But Eppink none the less raises an incredibly important point about our human willingness to take algorithms at face value without wondering what’s going on behind the scenes.
In my years working as a mathematician with data and algorithms, I’ve come to believe that the only way to objectively judge whether an algorithm is trustworthy is by getting to the bottom of how it works. In my experience, algorithms are a lot like magical illusions. At first they appear to be nothing short of actual wizardry, but as soon as you know how the trick is done, the mystery evaporates. Often there’s something laughably simple (or worryingly reckless) hiding behind the façade. So, in the chapters that follow, and the algorithms we’ll explore, I’ll try to give you a flavour of what’s going on behind the scenes where I can. Enough to see how the tricks are done – even if not quite enough to perform them yourself.
But even for the most diehard maths fans, there are still going to be occasions where algorithms demand you take a blind leap of faith. Perhaps because, as with Skyscanner or Google’s search results, double-checking their working isn’t feasible. Or maybe, like the Idaho budget tool and others we’ll meet, the algorithm is considered a ‘trade secret’. Or perhaps, as in some machine-learning techniques, following the logical process inside the algorithm just isn’t possible.
There will be times when we have to hand over control to the unknown, even while knowing that the algorithm is capable of making mistakes. Times when we are forced to weigh up our own judgement against that of the machine. When, if we decide to trust our instincts instead of its calculations, we’re going to need rather a lot of courage in our convictions.
When to over-rule
Stanislav Petrov was a Russian military officer in charge of monitoring the nuclear early warning system protecting Soviet airspace. His job was to alert his superiors immediately if the computer indicated any sign of an American attack.34
Petrov was on duty on 26 September 1983 when, shortly after midnight, the sirens began to howl. This was the alert that everyone dreaded. Soviet satellites had detected an enemy missile headed for Russian territory. This was the depths of the Cold War, so a strike was certainly plausible, but something gave Petrov pause. He wasn’t sure he trusted the algorithm. It had only detected five missiles, which seemed like an illogically small opening salvo for an American attack.35
Petrov froze in his chair. It was down to him: report the alert, and send the world into almost certain nuclear war; or wait, ignoring protocol, knowing that with every second that passed his country’s leaders had less time to launch a counter-strike.
Fortunately for all of us, Petrov chose the latter. He had no way of knowing for sure that the alarm had sounded in error, but after 23 minutes – which must have felt like an eternity at the time – when it was clear that no nuclear missiles had landed on Russian soil, he finally knew that he had been correct. The algorithm had made a mistake.