
The Rules of Contagion


by Adam Kucharski


  Despite the sometimes horrific history of human experiments, studies involving deliberate infections are on the rise.[64] Around the world, volunteers are signing up for research involving malaria, influenza, dengue fever, and other infections. In 2019, there were dozens of such studies underway. Although some pathogens are simply too dangerous – Ebola is clearly out of the question – there are situations in which the social and scientific benefits of an infection experiment can outweigh a small risk to participants. Modern infection experiments have much stricter ethical guidelines, particularly when giving participants information and asking for their consent, but they must still strike this balance between benefit and risk. It’s a balancing act that is becoming increasingly prominent in other areas of life as well.

  8

  A spot of trouble

  Grenville Clark had just about settled into his position as conference chair when someone handed him a folded note.[1] A lawyer by training, Clark had organised the conference to discuss the future of the newly formed United Nations and what it would mean for world peace. Sixty delegates had already arrived at the Princeton University venue, but there was one more person who wanted to join. The note in Clark’s hands came from Albert Einstein, who was based at the adjacent Institute for Advanced Study.

  It was January 1946, and many in the physics community were haunted by their role in the recent atomic bombings of Hiroshima and Nagasaki.[2] Although Einstein was a long-time pacifist – and had opposed the bombings – his letter to President Roosevelt in 1939, warning of the potential for a Nazi atom bomb, had triggered the US nuclear programme.[3] During the Princeton conference, one attendee asked Einstein about humanity’s inability to manage new technology.[4] ‘Why is it that when the mind of man has stretched so far as to discover the structure of the atom we have been unable to devise the political means to keep the atom from destroying us?’ ‘That is simple, my friend,’ replied Einstein. ‘It is because politics is more difficult than physics.’

  Nuclear physics is one of the most prominent examples of a ‘dual-use technology’.[5] The research has brought huge scientific and social benefits, but it has also found extremely harmful uses. In the preceding chapters, we’ve met several other examples of technology that can have both positive and negative uses. Social media can connect us to old friends and useful new ideas. Yet it can also enable the spread of misinformation and other harmful content. Analysis of crime outbreaks can identify people who may be at risk, making it possible to interrupt transmission; it can also feed into biased policing algorithms that may over-target minority groups. Large-scale GPS data is revealing how to respond effectively to catastrophes, how to improve transport systems, and how new diseases might spread.[6] But it also risks leaking personal information without our knowledge, endangering our privacy and even our safety.

  In March 2018, the Observer newspaper reported that Cambridge Analytica had secretly gathered data from tens of millions of Facebook users, with the aim of building psychological profiles of US and British voters.[7] Although the effectiveness of such profiling has been disputed by statisticians,[8] the scandal eroded public trust in technology firms. According to software engineer – and ex-physicist – Yonatan Zunger, the story was a modern retelling of the ethical debates that had already occurred in fields like nuclear physics or medicine.[9] ‘The field of computer science, unlike other sciences, has not yet faced serious negative consequences for the work its practitioners do,’ he wrote at the time. As new technology appears, we mustn’t forget the lessons that researchers in other fields have already learned the hard way.

  When ‘big data’ became a popular buzzword in the early twenty-first century, the potential for multiple uses was a source of optimism. The hope was that data collected for one purpose could help tackle questions in other areas of life. A flagship example of this was Google Flu Trends (GFT).[10] By analysing the search patterns of millions of users, researchers suggested it would be possible to measure flu activity in real time, rather than waiting a week or two for official US disease tallies to be published.[11] The initial version of GFT was announced in early 2009, with promising results. However, it didn’t take long for criticisms to emerge.

  The GFT project had three main limitations. First, the predictions didn’t always work that well. GFT had reproduced the seasonal winter flu peaks in the US between 2003 and 2008, but when the H1N1 pandemic took off unexpectedly in spring 2009, GFT massively underestimated its size.[12] ‘The initial version of GFT was part flu detector, part winter detector,’ as one group of academics put it.[13]

  The second problem was that it wasn’t clear how the predictions were actually made. GFT was essentially an opaque machine; search data went in one end and predictions came out the other. Google didn’t make the raw data or methods available to the wider research community, so it wasn’t possible for others to pick apart the analysis and work out why the algorithm performed well in some situations but badly in others.

  Then there’s the final – and perhaps biggest – issue with GFT: it didn’t seem that ambitious. We get flu epidemics each winter because the virus evolves, making current vaccines less effective. Similarly, the main reason governments are so worried about a future pandemic flu virus is that we won’t have an effective vaccine against the new strain. In the event of a pandemic, it would take six months to develop one,[14] by which time the virus would have spread widely. To predict the shape of flu outbreaks, we need a better understanding of how viruses evolve, how people interact, and how populations build immunity.[15] Faced with this hugely challenging situation, GFT merely aimed to report flu activity a week or so earlier than it would otherwise have been reported. It was an interesting idea in terms of data analysis, but not a revolutionary one when it comes to tackling outbreaks.

  This is a common pitfall when researchers or companies talk about applying large datasets to wider aspects of life. The tendency is to assume that, because there is so much data, there must be other important questions it can answer. In effect, it becomes a solution in search of a problem.

  In late 2016, epidemiologist Caroline Buckee attended a tech fundraising event, pitching her work to Silicon Valley insiders. Buckee has a lot of experience of using technology to study outbreaks. In recent years, she has worked on several studies using GPS data to investigate malaria transmission. But she is also aware that such technology has its limitations. During the fundraising event, she became frustrated by the prevailing attitude that with enough money and coders, companies could solve the world’s health problems. ‘In a world where technology moguls are becoming major funders of research, we must not fall for the seductive idea that young, tech-savvy college grads can single-handedly fix public health on their computers,’ she wrote afterwards.[16]

  Many tech approaches are neither feasible nor sustainable. Buckee has pointed to many failed attempts at tech pilot studies or apps that hoped to ‘disrupt’ traditional methods. Then there’s the need to evaluate how well health measures actually work, rather than just assuming good ideas will emerge naturally, like successful start-ups. ‘Pandemic preparedness requires a long-term engagement with politically complex, multidimensional problems – not disruption,’ as she put it.

  Technology can still play a major role in modern outbreak analysis. Researchers routinely use mathematical models to help design control measures, smartphones to collect patient data, and pathogen sequences to track the spread of infection.[17] However, the biggest challenges are often practical rather than computational. Being able to gather and analyse data is one thing; spotting an outbreak and having the resources to do something about it is quite another. When Ebola caused its first major epidemic in 2014, transmission was centred on Sierra Leone, Liberia and Guinea, three countries that ranked among the world’s poorest. A second major epidemic would begin in 2018, when Ebola hit a conflict zone in the northeastern part of the Democratic Republic of the Congo; by July 2019, with 2,500 cases and rising, the WHO would declare it a Public Health Emergency of International Concern (PHEIC).[18] The global imbalance in health capacity even shows up in scientific terminology. The 2009 pandemic flu virus emerged in Mexico, but its official designation is ‘A/California/7/2009(H1N1)’, because that’s where a lab first identified the new virus.[19]

  These logistical challenges mean that research can struggle to keep up with new outbreaks. During 2015 and 2016, Zika spread widely, spurring researchers to plan large-scale clinical studies and vaccine trials.[20] But by the time many of these studies were ready to start, the cases had stopped. This is a common frustration in outbreak research; by the time the infections end, fundamental questions about contagion can remain unanswered. That’s why building long-term research capacity is essential. Although our research team managed to generate a lot of data on the Zika outbreak in Fiji, we were only able to do this because we already happened to be there investigating dengue. Similarly, some of the best data on Zika have come from a long-running Nicaraguan dengue study led by Eva Harris at the University of California, Berkeley.[21]

  Researchers have also lagged behind outbreaks in other fields. Many studies of misinformation during the 2016 US election weren’t published until 2018 or 2019. Other research projects looking at election interference have struggled to get off the ground at all, while some are now impossible because social media companies – whether inadvertently or deliberately – have deleted the necessary data.[22] At the same time, fragmented and unreliable data sources are hindering research into banking crises, gun violence and opioid use.[23]

  Getting data is only part of the problem, though. Even the best outbreak data will have quirks and caveats, which can hinder analysis. In her work tracking radiation and cancer, Alice Stewart noted that epidemiologists rarely have the luxury of a perfect dataset. ‘You’re not looking for a spot of trouble against a spotless backdrop,’ she said,[24] ‘you’re looking for a spot of trouble in a very messy situation.’ The same issue crops up in many fields, whether trying to estimate the spread of obesity in friendship data, uncover patterns of drug use in the opioid epidemic, or trace the effects of information across different social media platforms. Our lives are messy and complicated, and so are the datasets they produce.

  If we want a better grasp of contagion, we need to account for its dynamic nature. That means tailoring our studies to different outbreaks, moving quickly to ensure our results are as useful as possible, and finding new ways to thread strands of information together. For example, disease researchers are now combining data on cases, human behaviour, population immunity, and pathogen evolution to investigate elusive outbreaks. Taken individually, each dataset has its own flaws, but together they can reveal a more complete picture of contagion. Describing such approaches, Caroline Buckee has quoted Virginia Woolf, who once said that ‘truth is only to be had by laying together many varieties of error.’[25]

  As well as improving the methods we use, we should also focus on the questions that really matter. Take social contagion. Considering the amount of data now available, our understanding of how ideas spread is still remarkably limited. One reason is that the outcomes we care about aren’t necessarily the ones that technology companies prioritise. Ultimately, they want users to interact with their products in a way that brings in advertising revenue. This is reflected in the way we talk about online contagion. We tend to focus on the metrics designed by social media companies (‘How do I get more likes? How do I get this post to go viral?’) rather than outcomes that will actually make us healthier, happier, or more successful.

  With modern computational tools, there is potential to get unprecedented insights into social behaviour, if we target the right questions. The irony, of course, is that the questions we care about are also the ones that are likely to lead to controversy. Recall that study looking at the spread of emotions on Facebook, in which researchers altered people’s News Feeds to show happier or sadder posts. Despite criticism of how this research was designed and carried out, the team was asking an important question: how does the content we see on social media affect our emotional state?

  Emotions and personality are, by their very definition, emotive and personal topics. In 2013, psychologist Michal Kosinski and his colleagues published a study suggesting that it was possible to predict personality traits – such as extroversion and intelligence – from the Facebook pages that people liked.[26] Cambridge Analytica would later use a similar idea to profile voters, triggering widespread criticism.[27] When Kosinski and his team first published their method, they were aware that it could have uncomfortable alternative uses. In their original paper, they even anticipated a possible backlash against technology firms. The researchers speculated that as people became more aware of what could be extracted from their data, some might turn away from digital technology entirely.

  If users are uncomfortable with exactly how their data is being used, researchers and companies have two options. One is to simply avoid telling them. Faced with concerns about privacy, many tech companies have downplayed the extent of data collection and analysis, fearing negative press coverage and uproar from users. Meanwhile, data brokers (who most of us have never heard of) have been making money selling data (which we weren’t aware they had) to external researchers (who we didn’t know were analysing it). In these cases, the assumption seems to have been that if you tell people what you’re doing with their data, they won’t let you do it. Thanks to new privacy laws like Europe’s General Data Protection Regulation (GDPR) and California’s Consumer Privacy Act, some of these activities are becoming harder. But if research teams continue to brush over the ethics of their analysis, there will be further scandals and lapses in trust. Users will become more reluctant to share data, even for worthwhile studies, and researchers will shy away from the effort and controversy of analysing it.[28] As a result, our understanding of behaviour – and the social and health benefits that can come from such insights – will stagnate.

  The alternative option is to increase transparency. Instead of analysing people’s lives without their knowledge, let them weigh up the benefits and risks. Involve them in the debates; think in terms of permission rather than forgiveness. If social benefits are the aim, make the research a social effort. When the NHS announced its ‘Care.data’ scheme in 2013, the hope was that better data sharing could lead to better health research. Three years later, the scheme was cancelled after the public – and doctors – lost confidence in how the data were being used. In theory, Care.data could have been enormously beneficial, but patients didn’t seem to know about the scheme, or didn’t trust it.[29]

  Perhaps nobody would agree to data-intensive research if they knew what was really involved? In my experience, that’s not necessarily true. Over the past decade, my collaborators and I have run several ‘citizen science’ projects combining contagion research with wider discussions about outbreaks, data, and ethics. We’ve studied what networks of interactions look like, how social behaviour changes over time, and what this means for infection patterns.[30] Our most ambitious project was a massive data collection effort we ran in collaboration with the BBC during 2017/18.[31] We asked the public to download a smartphone app that tracked their movements to the nearest 1km over a day, and also asked them to tally up their social interactions. Once the study was completed, this dataset would help form a freely available resource for researchers. To our surprise, tens of thousands of people volunteered, despite the project having no immediate benefit to them. Although just one study, it shows that large-scale data analysis can still be carried out in a transparent and socially beneficial way.

  In March 2018, the BBC aired a programme called Contagion!, showcasing the initial dataset we’d gathered. It wasn’t the only story about large-scale data collection in the media that week; a few days earlier, the Cambridge Analytica scandal had broken. Whereas we had asked people to volunteer their data to help researchers understand disease outbreaks, Cambridge Analytica had allegedly been harvesting vast quantities of Facebook data – without users’ knowledge – to help politicians try to influence voters.[32] Here were two studies of behaviour, two massive datasets, and two very different outcomes. Several commentators picked up on the contrast, including journalist Hugo Rifkind in his TV review for The Times. ‘In a week when we’ve agreed that data and internet surveillance – boo, hiss – are ruining the world, Contagion was a welcome reminder that it can sort of save it a bit too.’[33]

  In the time it’s taken you to read this book, around three hundred people will have died of malaria. There will have been over five hundred deaths from HIV/AIDS, and about eighty from measles, most of them children. Melioidosis, a bacterial infection that you may well have never heard of, will have killed more than sixty people.[34]

  Infectious diseases still cause vast damage worldwide. As well as known threats, we face the ever-present risk of a new pandemic, and the rising emergence of drug-resistant infections. However, as our knowledge of contagion has improved, infectious diseases have on the whole declined. The global death rate for such diseases has halved in the past two decades.[35]

  As infectious diseases wane, attention is gradually shifting to other threats, many of which can also be contagious. In 1950, tuberculosis was the leading cause of death for a British man in his thirties. Since the 1980s, it has been suicide.[36] In recent years, young adults in Chicago have been most likely to die from homicide.[37] Then there are the wider social burdens of contagion. When I analysed Neknomination back in 2014, online transmission seemed like a tangential issue, almost a curiosity. Three years later, it was dominating front pages, with concerns about the spread of false information – and the role of social media – leading to multiple government investigations.[38]

 
