The Rules of Contagion
According to ecologist Lucy Aplin, both vertical and horizontal transmission of culture can occur in the animal world. ‘It really depends on the species, and also on the behaviour being learned.’ She points out that the type of transmission can affect how widely new information spreads. ‘You might imagine in, say, dolphins, where most of the learning happens vertically, you end up with family-specific behaviours and it’s quite hard for behaviours to spread more widely through the population.’ In contrast, horizontal transmission can result in much faster adoption of innovations. Such transmission is common in species of birds like great tits. ‘Much of their social learning occurs horizontally,’ Aplin said, ‘with information gained by observing unrelated individuals in the winter-flocking period, rather than transmitted from parent to offspring.’[33]
For some animals, the difference between transmission types could prove crucial to survival. As humans alter the natural environment more and more, species that can efficiently transmit innovations will be better placed to adjust to the changes. ‘Evidence is increasingly showing that some species can show a high degree of behavioural flexibility in the face of changing environments,’ Aplin said. ‘As a result, they appear to be successful at coping with human-modified habitats and human-induced change.’
Efficient transmission is also helping organisms resist human change at the microscopic level. Several types of bacteria have picked up mutations that make them resistant to antibiotics. As well as spreading vertically when bacteria reproduce, these genetic mutations often pass horizontally within the same generation. Just as software developers might copy and paste code between files, bacteria can pick up snippets of genetic material from each other. In recent years, researchers have discovered that this horizontal transmission is contributing to the emergence of superbugs such as MRSA, as well as drug-resistant STIs.[34] As bacteria evolve, many common infections may eventually become untreatable. In 2018, for example, a man in the UK was diagnosed with so-called ‘super-gonorrhea’, which was resistant to all standard antibiotics. He’d picked up the infection in Asia, but the following year two more cases appeared in the UK, this time with links to Europe.[35] If researchers are to successfully track and prevent such infections, they will need all the data they can get.
Thanks to the availability of new information sources like genetic sequences, we are increasingly able to unravel how different diseases and traits spread through populations. Indeed, one of the biggest changes to human healthcare in the twenty-first century will be the ability to rapidly and cheaply sequence and analyse genomes. As well as uncovering outbreaks, researchers will be able to study how human genes influence conditions ranging from Alzheimer’s to cancer.[36] Genetics has social applications too. Because our genomes can reveal characteristics like ancestry, genetic testing kits have become popular gifts for people interested in their family history.
Yet the availability of such data can have unintended effects on privacy. Because we share so many genetic characteristics with our relatives, it’s possible to learn things about people who haven’t been tested. In 2013, for example, The Times reported that Prince William had Indian ancestry, after testing two distant cousins on his mother’s side. Genetics researchers soon criticised the story, because it had revealed personal information about the prince without his consent.[37] In some cases ancestry revelations can have devastating consequences: there have been several reports of families thrown into disarray after discovering hidden adoptions or infidelity in a Christmas ancestry test.[38]
We’ve already seen how data about our online behaviour is gathered and shared so that companies can target adverts. Marketers don’t just measure how many people clicked on an ad; they know what kind of people those users are, where they came from, and what they did next. By combining these datasets, they can piece together how one thing influences another. The same approach is common when analysing human genetic data. Rather than look at genetic sequences in isolation, scientists will compare them with information like ethnic background or medical history. The aim is to uncover the patterns that link the different datasets. If researchers know what these patterns look like, they can predict things like ethnicity or disease risk from the underlying genetic code. This is why genetic testing companies like 23andMe have attracted so many investors. They aren’t just collecting customers’ genetic data; they are gathering information about who these people are, which makes it possible to gain much deeper health insights.[39]
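To make that pattern-linking idea concrete, here is a minimal sketch in Python: a toy statistical model learns the relationship between a handful of genetic markers and a known trait in a reference group, then estimates the probability of that trait for a new genome. The markers, labels and numbers are all invented for illustration.

```python
# A toy version of the pattern-linking idea: fit a model on people
# whose traits are known, then predict for a new genome.
# All markers, labels and values below are invented.
from sklearn.linear_model import LogisticRegression

# Each row: presence (1) or absence (0) of three genetic variants.
genotypes = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
    [0, 1, 0],
]
has_trait = [1, 1, 0, 0]   # known outcomes for the reference group

model = LogisticRegression().fit(genotypes, has_trait)

# Estimate the risk for someone who has only supplied their genome.
new_genome = [[1, 0, 0]]
print(model.predict_proba(new_genome)[0][1])   # estimated probability
```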
It’s not just for-profit companies that are building such datasets. Between 2006 and 2010, half a million people volunteered for the UK Biobank project, which aims to study patterns in genetics and health over the coming decades. As the dataset grows and expands, it will be accessible to teams around the globe, creating a valuable scientific resource. Since 2017, thousands of researchers have signed up to access the data, with projects investigating diseases, injuries, nutrition, fitness, and mental health.[40]
There are huge benefits to sharing health information with researchers. But if datasets are going to be accessible to multiple groups, we need to think about how to protect people’s privacy. One way to reduce this risk is to remove information that could be used to identify participants. For example, when researchers get access to medical datasets, personal information like name and address will often have been removed. Even without such data, though, it may still be possible to identify people. When Latanya Sweeney was a graduate student at MIT in the mid-1990s, she suspected that if you knew a US citizen’s age, gender, and ZIP code, in many cases you could narrow it down to a single person. At the time, several medical databases included these three pieces of information. Combine them with an electoral register and Sweeney reckoned you could probably work out whose medical records you were looking at.[41]
So that’s what she did. ‘To test my hypothesis, I needed to look up someone in the data,’ she later recalled.[42] The state of Massachusetts had recently made ‘anonymised’ hospital records freely available to researchers. Although Governor William Weld had claimed the records still protected patients’ privacy, Sweeney’s analysis suggested otherwise. She paid $20 to access voter records for Cambridge, where Weld lived, then cross-referenced his age, gender, and ZIP code against the hospital dataset. She soon found his medical records, then mailed him a copy. The experiment – and the publicity it generated – would eventually lead to major changes in how health information is stored and shared in the US.[43]
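Sweeney’s approach amounts to a database join. The sketch below, with invented tables, shows how little code the attack needs: link the ‘anonymised’ records to a public voter roll on the three quasi-identifiers, and any combination unique to one person is re-identified.

```python
# A minimal sketch of a linkage attack in the spirit of Sweeney's
# demonstration; the tables, names and diagnoses are all invented.
import pandas as pd

# 'Anonymised' medical records: names removed, but the three
# quasi-identifiers (age, gender, ZIP code) left in place.
medical = pd.DataFrame({
    "age":       [54, 54, 67],
    "gender":    ["M", "F", "M"],
    "zip":       ["02138", "02138", "02139"],
    "diagnosis": ["hypertension", "asthma", "diabetes"],
})

# Public voter roll: names attached to the same three fields.
voters = pd.DataFrame({
    "name":   ["W. Weld", "J. Doe", "A. Smith"],
    "age":    [54, 54, 67],
    "gender": ["M", "F", "M"],
    "zip":    ["02138", "02138", "02139"],
})

# Join the two tables on the quasi-identifiers. Wherever an
# (age, gender, zip) combination is unique, the join pins a named
# voter to a medical record.
linked = medical.merge(voters, on=["age", "gender", "zip"])
linked["matches"] = linked.groupby(["age", "gender", "zip"])["name"].transform("size")
print(linked.loc[linked["matches"] == 1, ["name", "diagnosis"]])
```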
As data spread from one computer to another, so do the resulting insights into people’s lives. It’s not just medical or genetic information we need to be careful with; even seemingly innocuous datasets can hold surprisingly personal details. In March 2014, a self-described ‘data junkie’ named Chris Whong used the Freedom of Information Act to request details of every yellow taxi ride in New York City during the previous year. When the New York City Taxi and Limousine Commission released the dataset, it included the time and location of each pick-up and drop-off, the fare, and how much each passenger tipped.[44] There were over 173 million trips in total. The dataset did not give real licence plates; instead, each taxi was identified by a string of apparently random digits. But it turned out the journeys were anything but anonymous. Three months after the dataset was released, computer scientist Vijay Pandurangan showed how to decipher the taxi codes, converting the scrambled digits back into the original licence plates. Then graduate student Anthony Tockar published a blog post explaining what else could be discovered. He’d found that with a few simple tricks, it was possible to extract a lot of sensitive information from the files.[45]
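Pandurangan’s insight was that the ‘random’ digits were reportedly just an unsalted hash of each plate, and valid plates follow a few short, known formats, so every possibility can be enumerated in advance. A sketch of that dictionary attack, assuming MD5 hashing and using one of the real medallion patterns; treat the details as illustrative.

```python
# Sketch of a dictionary attack on unsalted hashes, in the spirit of
# Pandurangan's de-anonymisation of the taxi IDs. The plate format
# enumerated here (digit, letter, digit, digit, e.g. '5X55') is one
# of the short medallion patterns; MD5 is assumed.
import hashlib
from itertools import product
from string import ascii_uppercase, digits

def md5_hex(plate: str) -> str:
    return hashlib.md5(plate.encode()).hexdigest().upper()

# Enumerate every plausible plate and precompute its hash:
# 10 * 26 * 10 * 10 = 26,000 candidates for this format alone.
lookup = {
    md5_hex(f"{a}{b}{c}{d}"): f"{a}{b}{c}{d}"
    for a, b, c, d in product(digits, ascii_uppercase, digits, digits)
}

# Any 'anonymised' ID from the released files can now be inverted.
leaked_id = md5_hex("5X55")   # stand-in for a value from the dataset
print(lookup[leaked_id])      # -> '5X55'
```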
First, he showed how a person might stalk celebrities. After hours spent trawling through image search results for ‘celebrities in taxis in Manhattan in 2013’, Tockar found several pictures with a licence plate in view. Cross-referencing these with celebrity blogs and magazines, he worked out what the start point or destination was, and matched this against the supposedly anonymous taxi dataset. He could also see how much celebrities had – or hadn’t – tipped. ‘Now while this information is relatively benign, particularly a year down the line,’ Tockar wrote, ‘I have revealed information that was not previously in the public domain.’
Tockar acknowledged that most people might not be too worried about such analysis, so he decided to dig a little further. He turned his attention to a strip club in the Hell’s Kitchen neighbourhood, searching for taxi pick-ups in the early hours. He soon identified a frequent customer and tracked the person’s journey back to their home address. It didn’t take long to find them online and – after a quick search on social media – Tockar knew what the man looked like, how much his house was worth, and what his relationship status was. Tockar chose not to publish any of this information, but it wouldn’t have taken much effort for someone else to come to the same conclusions. ‘The potential consequences of this analysis cannot be overstated,’ Tockar noted.
With high-resolution GPS data, it can be extremely easy to identify people.[46] Our GPS tracks can reveal where we live, what route we take to work, what appointments we have, and who we meet. As with the New York taxi data, it doesn’t take much to spot how such information could be a potential treasure trove for stalkers, burglars, or blackmailers. In a 2014 survey, 85 per cent of US domestic violence shelters said they were protecting people from abusers who’d stalked them via GPS.[47] Consumer GPS data can even put military operations at risk. During 2017, army staff wearing commercial fitness trackers inadvertently leaked the exact layout of bases when they uploaded their running and cycling routes.[48]
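It takes remarkably little analysis to extract this kind of detail. Below is a minimal sketch, with invented coordinates, of one common heuristic: take whichever position a phone occupies most often at night as ‘home’. The time window and the rounding (roughly 100 metres) are arbitrary choices.

```python
# A crude 'home inference' heuristic on a hypothetical GPS trace:
# the most frequent night-time position, rounded to ~100 m.
from collections import Counter
from datetime import datetime

# (timestamp, latitude, longitude) points from one phone
points = [
    (datetime(2019, 3, 1, 2, 14),  40.7431, -73.9923),
    (datetime(2019, 3, 1, 9, 5),   40.7527, -73.9772),  # daytime: ignored
    (datetime(2019, 3, 2, 1, 40),  40.7432, -73.9924),
    (datetime(2019, 3, 2, 23, 55), 40.7431, -73.9922),
]

def infer_home(points, night_start=22, night_end=6, precision=3):
    """Most common rounded location between 10pm and 6am."""
    night = [
        (round(lat, precision), round(lon, precision))
        for t, lat, lon in points
        if t.hour >= night_start or t.hour < night_end
    ]
    return Counter(night).most_common(1)[0][0]

print(infer_home(points))  # -> (40.743, -73.992)
```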
Despite these risks, the availability of movement data is also bringing valuable scientific insights, whether it’s allowing researchers to estimate where viruses might spread next, helping emergency teams support displaced populations after natural disasters, or showing planners how to improve city transport networks.[49] With high-resolution GPS data, it’s even becoming possible to analyse interactions between specific groups of people. For example, studies have used mobile phone data to track social segregation, political groupings and inequality in countries ranging from the United States to China.[50]
If that last sentence made you feel slightly uncomfortable, you wouldn’t be alone. As the availability of digital data increases, concerns about privacy are growing too. Issues like inequality are a major social challenge – and undoubtedly worthy of research – but there is intense debate about how far such research should delve into the details of our incomes, politics or social lives. When it comes to understanding human behaviour, we often have a decision to make: what is an acceptable price for knowledge?
Whenever my collaborators and I have worked on projects involving movement data, privacy has been hugely important to us. On the one hand, we want to collect the most useful data we possibly can, especially if it could help to protect communities against outbreaks. On the other, we need to protect the private lives of the individuals in those communities, even if this means limiting the information we collect or publish. For diseases like flu or measles, we face a particular challenge, because children – who are at high risk of infection – are also a vulnerable age group to be putting under surveillance.[51] There are plenty of studies that could tell us useful, interesting things about social behaviour, but would be difficult to justify given the potential infringement on privacy.
In the rare instances where we do go out and collect high-resolution GPS data, our study participants will have given consent and know that only our team will have access to their exact location. But not everyone has the same attitude to privacy. Imagine if your phone had been leaking GPS data continuously, without your knowledge, to companies you’ve never heard of. This is more likely than you might think. In recent years, a little-known network of GPS data brokers has emerged. These companies have been buying movement data from hundreds of apps to which people have granted GPS access, then selling it on to marketers, researchers and other groups.[52] Many users may have long forgotten they installed these apps – be it for fitness, weather forecasts or gaming – let alone that they agreed to constant tracking. In 2019, US journalist Joseph Cox reported that he’d paid a bounty hunter to track a phone using second-hand location data.[53] It had cost $300.
As location data becomes easier to access, it is also inspiring new types of crime. Scammers have long used ‘phishing’ messages to trick customers into handing over sensitive information. Now they are developing ‘spear phishing’ attacks, which incorporate user-specific data. In 2016, several residents of Pennsylvania, USA, received e-mails asking them to pay a fine for a recent speeding offence. The e-mails correctly listed the speed and location of the person’s car. But they weren’t real. Police suspected that scammers had obtained leaked GPS data from an app, then used this to identify people who’d been travelling too fast on local roads.[54]
Although movement datasets are proving remarkably powerful, they still have some limitations. Even with very detailed movement information, there is one type of interaction that is near impossible to measure. It’s an event that is brief, often invisible, and particularly elusive in the early stages of an outbreak. It’s also one that has sparked some of the most notorious incidents in medical history.
The doctor checked into room 911 of Hong Kong’s Metropole Hotel at the end of a tiring week. Despite feeling unwell, he’d made the three-hour bus trip across from Southern China for his nephew’s wedding that weekend. He’d come down with a flu-like illness a few days earlier and hadn’t managed to shake it off. However, it was about to get much worse. Twenty-four hours later, he’d be in an intensive care unit. Within ten days, he would be dead.[55]
It was 21 February 2003, and the doctor was the first case of SARS in Hong Kong. Eventually, there would be sixteen other SARS cases linked to the Metropole: people who’d stayed in rooms opposite the doctor, beside him, or along the corridor. As the disease spread, there was an urgent need to understand the new virus causing it. Scientists didn’t even know basic information like the delay from infection to the appearance of symptoms (i.e. the incubation period). With cases appearing across Southeast Asia, statistician Christl Donnelly and her colleagues at Imperial College London and in Hong Kong set out to estimate this crucial quantity.[56]
The problem with working out an incubation period is that we rarely see the actual moment of infection; we just see people showing up with symptoms later on. If we want to estimate the average incubation period, we therefore need to find people who could only have been infected during a specific window of time. For example, a businessman staying at the Metropole had overlapped with the Chinese doctor for a single day. He fell ill with SARS six days later, so this delay must have been the incubation period for his infection. Donnelly and her colleagues tried to gather other examples like this, but there weren’t many. Of the 1,400 SARS cases that had been reported in Hong Kong by the end of April, only 57 people had a clearly defined exposure to the virus. Put together, these examples suggested that SARS had an average incubation period of about 6.4 days. The same method has since been used to estimate the incubation period for other new infections, including pandemic flu in 2009 and Ebola in 2014.[57]
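In code, the core of this calculation is short. The sketch below, with invented dates, reproduces the exposure-window logic: keep only cases whose sole possible exposure fell on a known day, then summarise the delay to symptom onset.

```python
# Sketch of the exposure-window logic: restrict to cases with a single
# known day of exposure, then average the delay to symptom onset.
# The dates below are invented.
from datetime import date
from statistics import mean

# (single known day of exposure, date symptoms appeared)
cases = [
    (date(2003, 2, 21), date(2003, 2, 27)),  # six-day delay, like the businessman
    (date(2003, 3, 2),  date(2003, 3, 9)),
    (date(2003, 3, 5),  date(2003, 3, 11)),
]

delays = [(onset - exposure).days for exposure, onset in cases]
print(f"Mean incubation period: {mean(delays):.1f} days")  # -> 6.3 days
```

The published analysis fitted a probability distribution to such delays, and handled longer exposure windows, rather than taking a raw mean, but the underlying idea is the same.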
Of course, there is another way to work out an incubation period: deliberately give someone the infection and see what happens. One of the most infamous examples of this approach occurred in New York City during the 1950s and 1960s. The Willowbrook State School, located on Staten Island, was home to over 6,000 children with intellectual disabilities. Overcrowded and filthy, the school had frequent outbreaks of hepatitis, which led paediatrician Saul Krugman to set up a project to study the infection.[58] Working with collaborators Robert McCollum and Joan Giles, Krugman deliberately infected children with hepatitis to understand how the disease developed and spread. As well as measuring the incubation period, the team discovered they were actually dealing with two different types of hepatitis virus. One type, which we now call hepatitis A, spread from person to person, whereas hepatitis B was blood-borne.
The research brought controversy as well as discoveries. In the early 1970s, criticism of the work grew, and the experiments were eventually halted. The study team argued that the project had been ethically sound: it had approval from several medical ethics boards, they’d obtained consent from the children’s parents, and the poor conditions in the school meant that many of the children would have caught the disease at some point anyway. Critics responded that, among other things, the consent forms had brushed over the details of what was involved, and that Krugman had overstated the chances the children would get infected naturally. ‘They were the most unethical medical experiments ever performed on children in the United States,’ claimed vaccine pioneer Maurice Hilleman.[59]
This raises the question of what to do with such knowledge once it’s been obtained. Research papers from the Willowbrook study have been cited hundreds of times, but not everyone agreed they should be acknowledged in this way. ‘Every new reference to the work of Krugman and Giles adds to its apparent ethical respectability, and in my view such references should stop, or at least be heavily qualified,’ wrote physician Stephen Goldby in a letter to The Lancet in 1971.[60]
There are many other examples of medical knowledge that has uncomfortable origins. In early nineteenth-century Britain, the growing number of medical schools created a massive demand for cadavers for use in anatomy classes. Faced with a limited legal supply, the criminal market stepped in; bodies were increasingly snatched from graveyards and sold to lecturers.[61] Yet it is experiments on the living that have proved the most shocking. During the Second World War, Nazi doctors deliberately infected patients at Auschwitz with diseases including typhus and cholera, to measure things like the incubation period.[62] After the war, the medical community created the Nuremberg Code, outlining a set of principles for ethical studies. Even so, the controversies would continue. Much of our understanding of typhoid comes from studies involving US prisoners in the 1950s and 1960s.[63] Then, of course, there was Willowbrook, which transformed our knowledge of hepatitis.