Soho cholera outbreak, 1854
By the time some of the largest Ebola treatment centres opened in late 2014, the outbreak was already slowing down, if not declining altogether.[68] Yet in some areas, control measures did coincide with a fall in cases. It’s therefore tricky to untangle the exact impact of these measures. Response teams often introduced several measures at once, from tracing infected contacts and encouraging changes in behaviour to opening treatment centres and conducting safe burials. What effect did international efforts actually have?
Using a mathematical model of Ebola transmission, our group estimated that the introduction of additional treatment beds – which isolated cases from the community and thereby reduced transmission – prevented around 60,000 Ebola cases in Sierra Leone between September 2014 and February 2015. In some districts, we found that the expansion of treatment centres could explain the entire outbreak decline; in other areas, there was evidence of an additional reduction in transmission in the community. This could have reflected other local and international control efforts, or perhaps changes in behaviour that were occurring anyway.[69]
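The mechanism described above is that treatment beds remove cases from the community so they cannot pass on the infection. A deliberately crude calculation can illustrate it. The sketch below is not the model behind the Sierra Leone estimate. It assumes a fixed number of beds per generation of infection, that isolated cases infect nobody, and that every case left in the community infects `r_community` others. The function name and all numbers are hypothetical.

```python
def project_cases(initial_cases, r_community, beds, generations):
    """Toy projection of cases over successive generations of infection.

    Assumptions (illustrative only): each generation, up to `beds` cases are
    isolated and cause no further infections; every remaining case infects
    `r_community` people on average. No randomness, and no depletion of
    susceptible people.
    """
    cases = [float(initial_cases)]
    for _ in range(generations):
        current = cases[-1]
        isolated = min(current, beds)
        in_community = current - isolated
        cases.append(in_community * r_community)
    return cases


# Hypothetical scenario: 100 initial cases, each community case infects 1.5 others.
without_beds = project_cases(100, 1.5, beds=0, generations=3)
with_beds = project_cases(100, 1.5, beds=50, generations=3)
prevented = sum(without_beds) - sum(with_beds)

print(without_beds)  # [100.0, 150.0, 225.0, 337.5]
print(with_beds)     # [100.0, 75.0, 37.5, 0.0]
print(prevented)     # 600.0
```

The "cases prevented" figure comes from comparing two model runs, one with beds and one without. It does not come from the observed data alone. In a real analysis, the transmission rate, bed numbers and reporting would all have to be estimated rather than assumed.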
Historical Ebola outbreaks have shown how important behaviour changes can be for outbreak control. When the first reported outbreak of Ebola started in the village of Yambuku, Zaire (now the Democratic Republic of the Congo) in 1976, the infection sparked in a small local hospital before spreading to the community. Based on archive data from the original outbreak investigation, my colleagues and I estimated that the transmission rate in the community declined sharply a few weeks into the outbreak.[70] Much of the decline came before the hospital closed and before the international teams arrived. ‘The communities where the outbreak continued to spread developed their own form of social distancing,’ recalled epidemiologist David Heymann, who was part of the investigation.[71] Without doubt, the international response to Ebola in late 2014 and early 2015 helped prevent cases in West Africa. But at the same time, foreign organisations should be cautious about claiming too much credit for the decline of such outbreaks.
Despite the challenges involved in producing forecasts, there is a large demand for them. Whether we’re looking at the spread of infectious diseases or crime, governments and other organisations need evidence to base their future policies on. So how can we improve outbreak forecasts?
Generally, we can trace problems with a forecast back to either the model itself or the data that goes into it. A good rule of thumb is that a mathematical model should be designed around the data available. If we don’t have data about the different transmission routes, for example, we should instead try to make simple but plausible assumptions about the overall spread. As well as making models easier to interpret, this approach also makes it easier to communicate what is unknown. Rather than grappling with a complex model full of hidden assumptions, people will be able to concentrate on the main processes, even if they’re not so familiar with modelling.
Outside my field, I’ve found that people generally respond to mathematical analysis in one of two ways. The first is with suspicion. This is understandable: if something is opaque and unfamiliar, our instinct can be to not trust it. As a result, the analysis will probably be ignored. The second kind of response is at the other extreme. Rather than ignore results, people may have too much faith in them. Opaque and difficult is seen as a good thing. I’ve often heard people suggest that a piece of maths is brilliant because nobody can understand it. In their view, complicated means clever. According to statistician George Box, it’s not just observers who can be seduced by mathematical analysis. ‘Statisticians, like artists, have the bad habit of falling in love with their models,’ he supposedly once said.[72]
We also need to think about the data we put into our analysis. Unlike scientific experiments, outbreaks are rarely designed: data can be messy and missing. In retrospect, we may be able to plot neat graphs with cases rising and falling, but in the middle of an outbreak we rarely have this sort of information. In December 2017, for example, our team worked with MSF to analyse an outbreak of diphtheria in refugee camps in Cox’s Bazar, Bangladesh. We received a new dataset each day. Because it took time for new cases to be reported, there were fewer recent cases in each of these datasets: if someone fell ill on a Monday, they generally wouldn’t show up in the data until Wednesday or Thursday. The epidemic was still going, but these delays made it look like it was almost over.[73]
Diphtheria outbreak in Cox’s Bazar, Bangladesh, 2017–18. Each line shows the number of new cases on a given day, as reported in the database as it appeared on 9 December, 19 December and 8 January.
Data: Finger et al., 2019
Although outbreak data can be unreliable, it doesn’t mean it’s unusable. Imperfect data isn’t necessarily a problem if we know how it’s imperfect, and can adjust accordingly. For example, suppose your watch is an hour slow. If you aren’t aware of this, it will probably cause you problems. But if you know about the delay, you can make a mental adjustment and still be on time. Likewise, if we know the delay in reporting during an outbreak, we can adjust how we interpret the outbreak curve. Such ‘nowcasting’, which aims to understand the situation as it currently stands, is often necessary before forecasts can be made.
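Here is a minimal nowcasting sketch. It assumes the reporting-delay distribution is already known. Suppose only a fraction of the cases that fell ill k days ago would normally have been reported by now. Dividing the observed count by that fraction then gives a rough estimate of the true count. The delay probabilities below are hypothetical. They put most reports two or three days after illness, loosely echoing the diphtheria example. Real nowcasting methods also account for uncertainty in the delay itself.

```python
from itertools import accumulate

# Hypothetical reporting-delay distribution: DELAY_PMF[d] is the probability
# that a case is reported d days after the person falls ill.
DELAY_PMF = [0.0, 0.1, 0.4, 0.4, 0.1]


def nowcast(observed, delay_pmf, min_reported=0.25):
    """Scale up recent counts by the fraction expected to be reported so far.

    observed[i] is the number of cases that fell ill on day i, as known on the
    last day of the series. A case that fell ill k days before the last day
    has had k days to be reported. Days where less than `min_reported` of
    cases would be known yet are returned as None, because scaling them up
    would mostly amplify noise.
    """
    cdf = list(accumulate(delay_pmf))  # probability of being reported within d days
    n = len(observed)
    adjusted = []
    for i, count in enumerate(observed):
        days_elapsed = n - 1 - i
        frac = cdf[days_elapsed] if days_elapsed < len(cdf) else 1.0
        adjusted.append(count / frac if frac >= min_reported else None)
    return adjusted


# Hypothetical counts; the true number is 20 cases per day throughout.
observed = [20, 20, 20, 18, 10, 2, 0]
print(nowcast(observed, DELAY_PMF))
```

In this example the apparent decline over the last few days disappears once the delay is accounted for. The two most recent days are left as None rather than guessed.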
Our ability to nowcast will depend on the length of the delay and the quality of data available. Many infectious disease outbreaks last weeks or months, but other outbreaks can occur on much longer timescales. Take the so-called opioid epidemic in the US, in which a rising number of people are addicted to prescription painkillers, as well as illegal drugs like heroin. Drug overdoses are now the leading cause of death for Americans under the age of 55. As a result of these additional deaths, average life expectancy in the US declined three years running between 2015 and 2018. The last time that happened was the Second World War. Despite some aspects of the crisis being specific to the US, it isn’t the only area at risk; opioid use has also been on the rise in places like the UK, Australia and Canada.[74]
Unfortunately, it’s hard to track drug overdoses because it takes especially long to certify deaths as drug-related. Preliminary estimates for US overdose deaths in 2018 weren’t released until July 2019.[75] Although some local-level data is available sooner, it can take a long time to build up a national picture of the crisis. ‘We’re always looking backwards,’ said Rosalie Liccardo Pacula, a senior economist at the RAND Corporation, which specialises in public policy research. ‘We aren’t very good at being able to see what’s happening immediately.’[76]
The US opioid crisis has received substantial attention in the twenty-first century, but Hawre Jalal and colleagues at the University of Pittsburgh suggest that the problem goes back much further. When they looked at data between 1979 and 2016, they found that the number of overdose deaths in the US grew exponentially during this period, with the death rate doubling every ten years.[77] Even when they looked at the state rather than national level, they found the same growth pattern in many areas. The consistency of the growth pattern was surprising given how much drug use has changed over the decades. ‘This historical pattern of predictable growth for at least 38 years suggests that the current opioid epidemic may be a more recent manifestation of an ongoing longer-term process,’ the researchers noted. ‘This process may continue along this path for several more years into the future.’[78]
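The growth pattern described above can be made concrete. If deaths grow exponentially, their logarithm rises in a straight line. One simple way to estimate a doubling time is therefore a least-squares fit of log(deaths) against year. The slope r is the growth rate, and the doubling time is ln 2 / r. This is a generic sketch, not necessarily the method Jalal and colleagues used. The data below are synthetic rather than real overdose figures.

```python
import math


def doubling_time(years, counts):
    """Estimate doubling time from a least-squares fit of log(counts) on year."""
    logs = [math.log(c) for c in counts]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    growth_rate = sxy / sxx  # continuous growth rate per year
    return math.log(2) / growth_rate


# Synthetic series that doubles exactly every ten years from 1979 to 2016.
years = list(range(1979, 2017))
counts = [1000 * 2 ** ((year - 1979) / 10) for year in years]
print(round(doubling_time(years, counts), 3))  # 10.0
```

A ten-year doubling time corresponds to growth of roughly 7 per cent a year (2^0.1 ≈ 1.072). Over the 37 years from 1979 to 2016, that adds up to about a 13-fold rise (2^3.7 ≈ 13).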
Yet drug overdose deaths only show part of the picture. They don’t tell us about the events that led up to this point; a person’s initial misuse of drugs may have started years earlier. This time lag happens in most types of outbreak. When people come into contact with an infection, there is usually a delay between being exposed and observing the effects of that exposure. For example, during that 1976 Ebola outbreak in Yambuku, people who were exposed to the virus often took a few days to become ill. For infections that were fatal, there was then another week or so between the illness appearing and death. Depending on whether we look at illnesses or deaths, we get two slightly different impressions of the outbreak. If we focus on newly ill Ebola cases, we’d say that the Yambuku outbreak peaked after six weeks; based on deaths, we’d put the peak a week later.
1976 Ebola outbreak in Yambuku
Data: Camacho et al., 2014
Both datasets are useful, but they’re not measuring quite the same thing. The tally of new Ebola cases tells us what is happening to susceptible people – specifically, how many are getting infected – whereas the number of deaths shows what is happening to people who already have the infection. After the first peak, the two curves go in opposite directions for a week or so: cases fall while deaths are still rising.
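One way to see why the death curve lags behind the case curve is to treat deaths as a delayed, scaled copy of cases. Each week's expected deaths are the fatal fraction of earlier cases, weighted by the probability of dying a given number of weeks after falling ill. In the sketch below, all the inputs are hypothetical, not the Yambuku data. That includes the weekly case counts, the fatality ratio and the delay probabilities: 20 per cent of fatal cases die in the week of onset, and 80 per cent the following week.

```python
def expected_deaths(cases, delay_pmf, fatality_ratio):
    """Convolve weekly case counts with an onset-to-death delay distribution.

    delay_pmf[k] is the probability that a fatal case dies k weeks after
    falling ill.
    """
    deaths = []
    for t in range(len(cases)):
        total = 0.0
        for k, p in enumerate(delay_pmf):
            if t - k >= 0:
                total += cases[t - k] * p
        deaths.append(fatality_ratio * total)
    return deaths


def peak_week(series):
    """Zero-based index of the largest value."""
    return max(range(len(series)), key=series.__getitem__)


cases = [1, 4, 10, 20, 35, 50, 40, 20, 8, 2]  # hypothetical weekly cases
deaths = expected_deaths(cases, delay_pmf=[0.2, 0.8], fatality_ratio=0.9)
print(peak_week(cases), peak_week(deaths))  # 5 6
```

Just after the case peak, cases fall from week index 5 to 6 while expected deaths are still rising. This is the same divergence seen in the two Yambuku curves.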
According to Pacula, drug epidemics can be divided into similar stages. In the early stage of an outbreak, the number of users increases, as new people are exposed to drugs. In the case of opioids, exposure often starts with a prescription. It might be tempting to simply blame patients for taking too much medication, or doctors for overprescribing. But we must also consider the pharmaceutical companies that market strong opioids directly to doctors, and the insurance companies that are often more likely to fund painkillers than alternatives like physiotherapy. Our modern lifestyles also play a role, with rising chronic pain associated with increases in obesity and office-based work.
One of the best ways to slow an epidemic in its early stages is to reduce the number of people who are susceptible. For drugs, this means improving education and awareness. ‘Education has been very important and very effective,’ said Pacula. Strategies that reduce the supply of drugs can also help early on. Given the multitude of drugs involved in the opioid epidemic, this means targeting all potential routes of exposure, rather than one specific medication.
Once the number of new users peaks, we enter the middle stage of a drug epidemic. At this point, there are still a lot of existing users, who may be progressing towards heavier drug use, and potentially moving on to illegal drugs as they lose their access to prescriptions. Providing treatment and preventing heavy use can be particularly effective at this stage. The aim here is to reduce the overall number of users, rather than just preventing new addictions.
In the final stage of a drug epidemic, the number of new and existing users is declining, but a group of heavy users remains. These are the people who are most at risk, having potentially switched from prescription opioids to cheaper drugs like heroin.[79] But it’s not as simple as cracking down on the illegal drug market in these later stages. The underlying problem of addiction is much deeper and wider than this. As Police Chief Paul Cell put it, ‘America can’t arrest its way out of the opioid epidemic’.[80] Nor is it just a matter of taking away access to prescription drugs. ‘There’s an addiction problem, and not just an opioid problem,’ Pacula said. ‘If you don’t provide treatment when you’re taking away the drug, you’re basically encouraging them to go to anything else.’ She pointed out that drug epidemics also come with a series of knock-on effects. ‘Even if we get the issue of misuse of opioids under control, we have some very concerning long term trends that we haven’t even started dealing with.’ One is the effect on drug users’ health. As people move from taking pills to injecting drugs, they face the risk of infections like hepatitis C and HIV. Then there is the wider social impact – on families, communities, and jobs – of having large numbers of people with drug addiction.
Because the success of different control strategies can vary between the three stages of a drug epidemic, it’s crucial to know what stage we’re currently in. In theory, it should be possible to work this out by estimating the annual numbers of new users, existing users, and heavy users. But the complexity of the opioid crisis – with its mix of prescription and illegal use – makes it very difficult to pick these things apart. There are some useful data sources – such as visits to emergency rooms and results of post-arrest drug tests – but this information has become harder to get hold of in recent years. We can’t draw a neat graph showing the different stages of drug use like we can for the Yambuku Ebola outbreak, because the data simply aren’t available. It’s a common problem in outbreak analysis: things that aren’t reported are by definition tough to analyse.
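In principle, Pacula's three stages translate into simple rules once annual counts exist. Rising new users mark the early stage. New users falling while existing users are not yet falling marks the middle stage. Both falling marks the final stage, with heavy users still to be dealt with. The function below is a hypothetical sketch of those rules. It compares only the last two years, so real, noisy data would need smoothing and some measure of uncertainty. As the text stresses, the harder problem is obtaining the counts at all.

```python
def epidemic_stage(new_users, existing_users):
    """Classify the most recent year of a drug epidemic (illustrative rules only).

    Both arguments are annual counts, oldest first, with at least two years.
    """
    if len(new_users) < 2 or len(existing_users) < 2:
        raise ValueError("need at least two years of data")
    new_rising = new_users[-1] > new_users[-2]
    existing_falling = existing_users[-1] < existing_users[-2]
    if new_rising:
        return "early"   # more people are being exposed each year
    if not existing_falling:
        return "middle"  # new users past their peak, but the user pool is not shrinking
    return "final"       # new and existing users both declining


print(epidemic_stage([10, 20], [50, 60]))  # early
print(epidemic_stage([20, 15], [60, 70]))  # middle
print(epidemic_stage([15, 10], [70, 65]))  # final
```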
In the early stages of a disease outbreak, there are generally two main aims: to understand transmission and to control it. These goals are closely linked. If we improve our understanding of how something is spreading, we can come up with more effective control measures. We may be able to target interventions at high-risk groups, or identify other weak links in the chain of transmission.
The relationship works the other way too: control measures can influence our understanding of transmission. For diseases, as with drug use and gun violence, health centres often act as our windows onto the outbreak. It means that if health systems are weakened or overburdened, it can affect the quality of data coming in. During the Ebola epidemic in Liberia in August 2014, one dataset we were working with suggested that the number of new cases was levelling off in the capital Monrovia. At first this seemed like good news, but then we realised what was actually happening. The dataset was coming from a treatment unit that had reached capacity. The case reports hadn’t peaked because the outbreak was slowing down; they’d stopped because the unit couldn’t admit any more patients.
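The Monrovia episode is an example of censoring: a dataset can only record what the system has room to observe. Below is a minimal sketch that assumes a fixed daily admission limit, a hypothetical parameter. It shows how true cases can keep rising while reported cases flatten. It also shows why a run of reports sitting exactly at the limit is a warning sign rather than evidence of a peak.

```python
def reported_cases(true_new_cases, admission_limit):
    """Cases only enter the dataset if the treatment unit can admit them."""
    return [min(cases, admission_limit) for cases in true_new_cases]


def suspicious_plateau(reports, admission_limit, window=3):
    """Flag when the last `window` reports all sit at the admission limit."""
    recent = reports[-window:]
    return len(recent) == window and all(r >= admission_limit for r in recent)


true_new_cases = [5, 8, 12, 18, 27, 40]  # hypothetical, still growing
reports = reported_cases(true_new_cases, admission_limit=15)
print(reports)                                          # [5, 8, 12, 15, 15, 15]
print(suspicious_plateau(reports, admission_limit=15))  # True
```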
The interaction between understanding and control is also important in the world of crime and violence. If authorities want to know where crime is occurring, they generally have to rely on what’s being reported. When it comes to using models to predict crime, this can create problems. In 2016, statistician Kristian Lum and political scientist William Isaac published an example of how reporting might influence predictions.[81] They’d focused on drug use in Oakland, California. First they’d gathered data on drug arrests in 2010, and then plugged these into the PredPol algorithm, a popular tool for predictive policing in the US. Such algorithms are essentially translation devices, taking information about an individual or location and converting it into an estimate of crime risk. According to the developers of PredPol, their algorithm uses only three pieces of data to make predictions: the type of historical crime, the place it happened and when it happened. It doesn’t explicitly include any personal information – like race or gender – that could directly bias results against certain groups.
Using the PredPol algorithm, Lum and Isaac predicted where drug crimes would have been expected to occur in 2011. They also calculated the actual distribution of drug crimes that year – including those that went unreported – using data from the National Survey on Drug Use and Health. If the algorithm’s predictions were accurate, they would have expected it to flag up the areas where the crimes actually happened. But instead, it seemed to point mostly to areas where arrests had previously occurred. The pair noted that this could produce a feedback loop between understanding and controlling crime. ‘Because these predictions are likely to over-represent areas that were already known to police, officers become increasingly likely to patrol these same areas and observe new criminal acts that confirm their prior beliefs regarding the distributions of criminal activity.’[82]
Some people criticised the analysis, arguing that police didn’t use PredPol to predict drug crimes. However, Lum said that this missed the wider point, because the aim of predictive policing methods is to make decisions more objective. ‘The implicit argument is that you want to remove human bias from the system.’ If predictions reflect existing police behaviour, however, these biases will persist, hidden behind the veil of a supposedly objective algorithm. ‘When you’re training it with data that’s generated by the same system in which minority people are more likely to be arrested for the same behaviour, you’re just going to perpetuate those same issues,’ she said. ‘You have the same problems, but now filtered through this high-tech tool.’
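The feedback loop Lum and Isaac describe can be reproduced in a few lines. This is not PredPol, whose internals are not described here. It is a hypothetical toy in which police patrol whichever area has the most recorded crime, and crime is only recorded where police are present. In this setup, the unpatrolled area has twice the true crime rate, yet its record never changes, so the prediction keeps confirming itself.

```python
def simulate_patrols(recorded, true_daily_crimes, days):
    """Toy predict-then-patrol loop (hypothetical, for illustration only).

    recorded: starting count of recorded crimes per area.
    true_daily_crimes: crimes that actually occur per area each day.
    Each day, police go to the area with the most recorded crime, and only
    crimes in the patrolled area are added to the record.
    """
    recorded = dict(recorded)  # avoid mutating the caller's data
    patrol_log = []
    for _ in range(days):
        target = max(recorded, key=recorded.get)  # the "predicted hotspot"
        patrol_log.append(target)
        recorded[target] += true_daily_crimes[target]
    return recorded, patrol_log


# Area A starts with slightly more recorded crime; area B actually has more crime.
final_record, patrols = simulate_patrols({"A": 3, "B": 2}, {"A": 1, "B": 2}, days=10)
print(final_record)        # {'A': 13, 'B': 2}
print(patrols.count("B"))  # 0
```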
Crime algorithms have more limitations than people might think. In 2013, researchers at the RAND Corporation outlined four common myths about predictive policing.[83] The first was that a computer knows exactly what will happen in the future. ‘These algorithms predict the risk of future events, not the events themselves,’ they noted. The second myth was that a computer would do everything, from collecting relevant crime data to making appropriate recommendations. In reality, computers work best when they assist human analysis and decisions about policing, rather than replacing them entirely. The third myth was that police forces needed a high-powered model to make good predictions, whereas often the problem is getting hold of the right data. ‘Sometimes you have a dataset where the information you need to make the prediction just isn’t contained in that dataset,’ as Lum put it.
The final, and perhaps most persistent myth, was that accurate predictions automatically lead to reductions in crime. ‘Predictions, on their own, are just that – predictions,’ wrote the RAND team. ‘Actual decreases in crime require taking action based on those predictions.’ To control crime, agencies therefore need to focus on interventions and prevention rather than simply making predictions. This is true for other outbreaks too. According to Chris Whitty, now the Chief Medical Officer for England, the best mathematical models are not necessarily the ones that try to make an accurate forecast about the future. What matters is having analysis that can reveal gaps in our understanding of a situation. ‘They are generally most useful when they identify impacts of policy decisions which are not predictable by commonsense,’ Whitty has suggested. ‘The key is usually not that they are “right”, but that they provide an unpredicted insight.’[84]
In 2012, police in Chicago introduced the ‘Strategic Subjects List’ (SSL) to predict who might be involved in a shooting. The project was partly inspired by Andrew Papachristos’s work on social networks and gun violence in the city, although Papachristos has distanced himself from the SSL.[85] The list itself is based on an algorithm that calculates risk scores for certain city inhabitants. According to its developers, the SSL does not explicitly include factors like gender, race or location. For several years, though, it wasn’t clear what did go into it. After pressure from the Chicago Sun-Times, the Chicago Police Department finally released the SSL data in 2017. The dataset contained the information that went into the algorithm – like age, gang affiliations, and prior arrests – as well as the corresponding risk scores it produced. Researchers were positive about the move. ‘It’s incredibly rare – and valuable – to see the public release of the underlying data for a predictive policing system,’ noted Brianna Posadas, a fellow with the social justice organisation Upturn.[86]
The Rules of Contagion Page 15