by Luke Dormehl
Imagine, for example, that there are arrests for assault every Saturday night outside your local pub, the White Hart. If that proves to be the case, it wouldn’t be too difficult to predict that future Saturday nights will see similar behavior at that location, and that stationing a police officer on the door at closing time could be enough to prevent future fights from breaking out.
It was this insight that prompted Chief Bratton to ask Sean Malinowski to help him.
On Bratton’s advice, Malinowski started driving over to the University of California, Los Angeles every Friday afternoon to meet with members of the math and computer science departments. The Los Angeles Police Department had agreed to hand over its impressive data set of crime statistics—which amounted to approximately 13 million crime incidents recorded over an 80-year period—for the largest study of its kind. Malinowski relished the experience of working with the UCLA researchers. As had happened a decade earlier when he first started working with police on his drunk-driving campaign, he found himself being drawn into the work being done by the computer scientists as they combed through the data looking for patterns and, hopefully, formulas.
“I loved those days,” Malinowski recalls. Of particular interest to him was the work of George Mohler, a young mathematician and computer scientist in his mid-twenties, who was busy working on an algorithm designed to predict the aftereffects of earthquakes. Mohler’s work was more relevant than it might initially sound. In the same way that earthquakes produce aftershocks, so too does crime. In the immediate aftermath of a house burglary or car theft, that particular location becomes between 4 and 12 times more likely to be the scene of a similar crime. This is a type of contagious behavior known as the “near repeat” effect. “Often a burglar will return to the same house or a neighboring house a week later and commit another burglary,” Mohler explains. Taking some of Mohler’s conclusions about earthquakes—and with help from an anthropologist named Jeff Brantingham and a criminologist named George Tita—the team of UCLA researchers was able to create a crime prediction algorithm that divided the city into different “boxes,” each roughly 150 meters on a side, and then ranked those boxes in order of the likelihood of a crime taking place.
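The underlying idea is often modeled as a “self-exciting” process: each recorded crime temporarily raises the estimated risk in its own grid box, and that boost decays over time, much as aftershocks fade after an earthquake. The Python sketch below is a deliberately simplified illustration of that idea, not PredPol’s actual code; the box size, background rate, boost and decay constants are assumptions chosen for readability.

```python
from collections import defaultdict
from math import exp

# Illustrative sketch of grid-based "near repeat" risk ranking (not PredPol's code).
BOX_SIZE_M = 150        # assumed box edge length, in meters
BASE_RATE = 0.01        # assumed long-run background risk per box
BOOST = 1.0             # assumed size of the "aftershock" boost per incident
DECAY_PER_DAY = 0.2     # assumed exponential decay rate of that boost

def box_of(x_m, y_m):
    """Map a location (meters east/north of an origin) to a grid box id."""
    return (int(x_m // BOX_SIZE_M), int(y_m // BOX_SIZE_M))

def rank_boxes(past_crimes, now_day, top_n=20):
    """Rank grid boxes by estimated risk at time `now_day`.

    past_crimes: iterable of (x_m, y_m, day) tuples for recorded incidents.
    Returns the top_n (box, score) pairs, highest risk first.
    """
    score = defaultdict(lambda: BASE_RATE)
    for x, y, day in past_crimes:
        if day <= now_day:
            # Recent crimes contribute strongly; older ones fade away exponentially.
            score[box_of(x, y)] += BOOST * exp(-DECAY_PER_DAY * (now_day - day))
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Example: two recent burglaries in one box outrank an old incident elsewhere.
crimes = [(120, 430, 98), (140, 410, 100), (900, 900, 10)]
print(rank_boxes(crimes, now_day=101, top_n=3))
```

Ranking every box by a single score like this is exactly the kind of bookkeeping a human analyst struggles with past the first few hotspots, which is the gap Mohler describes.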
A three-month randomized study using the algorithm began in November 2011. “Today . . . is historic,” began Malinowski’s address in that day’s Patrol Alert. His division, known as Foothill, covered seven main beats: La Tuna Canyon, Lake View Terrace, Pacoima, Shadow Hills, Sun Valley, Sunland and Tujunga. When divided up, these amounted to 5,200 boxes in total. At the start of that day’s roll call, Foothill patrol officers were handed individual mission maps, each with one or more boxes highlighted. These were the locations deemed “high probability,” and they were accompanied by statistical predictions about the type of crime that was likely to occur there. “What we’re asking you to do is to use your available time to get into those boxes and look around for people or conditions that indicate that a crime is likely to occur,” Malinowski said, addressing his team. “Then take enforcement or preventative action to stop it.”
The experiment ran until February the following year. In March, the results were evaluated, and a decision was made about whether or not to roll out the technology. The findings were impressive. During the trial, Foothill had seen a 36 percent drop in its crime rate. On days when the algorithm was dictating patrols, predictions about which crimes were likely to take place were twice as accurate as those made by a human analyst. “Part of the reason is that human brains are not suited to rank 20 hotspots across a city,” George Mohler says. “Maybe they can give you the top one or two, but at seven or eight they’re just giving you random guesses.”
If there was a teething problem with all of this, it often came from Malinowski’s own men. “You do run into people who say they don’t need a computer to tell them where the crime is,” he admits. “A lot of guys try to resist it. You show them the forecasts and they say, ‘I could have told you that. I could have told you that the corner of Van Nuys and Glenoaks has always been a problem.’ I say, ‘That’s always been our problem, huh? How long have you been working here?’ They go, ‘Ten years I’ve been working this spot.’ And I say, ‘Then why the hell is it still a problem if you’ve known about it for ten years? Get out there and fix it.’”
Following the Foothill study, algorithmic policing was made available to all Los Angeles precincts. Similar algorithms have now been adopted by other police departments around the United States. Malinowski says that he still feels responsible for his officers but is getting used to his less hands-on role in their deployment. “You have to give up a bit of control to let the algorithm do its work,” he notes. Chief Bratton, meanwhile, retired from the Los Angeles Police Department. Following the 2011 England riots, he was approached by David Cameron about coming to the UK as Commissioner of London’s Metropolitan Police Service. The offer was ultimately vetoed on the basis that Bratton is not a British citizen. Instead, he was offered an advisory role on controlling violence, which he gladly accepted.5
The UCLA team has since raised several million dollars in venture funding and spun their algorithm out as a private company, which they named PredPol.6 In December 2012, PredPol made it to England, with a four-month, $200,000 trial taking place in Medway, Kent. In that case, the algorithm was credited with a 6 percent fall in street violence. Similar schemes have now taken place in Greater Manchester, West Yorkshire and the Midlands with similarly promising results.7 Although some local councillors were worried, believing that predictive analytics would leave rural areas without police cover, or else lead to job cuts, others felt the software was innovative and could bring about a more effective use of resources.8
As Malinowski says, predictive policing isn’t simply a matter of catching criminals. “What we’re trying to do is to be in the right place at the right time, so that when the bad guy shows up he sees the police and is deterred from committing a crime.” In the end it all comes back to supermarkets. “We’re like a greeter in Walmart,” Malinowski says. “It just puts people on notice that you’re looking at them.”
The Moral Statisticians
The idea of integrating statistics into the world of criminology might seem new. In fact, its roots go back to 19th-century France and to two men named André-Michel Guerry and Adolphe Quetelet. Both Guerry and Quetelet were talented statisticians, who came to the field after first pursuing other careers. In Guerry’s case this had been law. For Quetelet, astronomy. Each was profoundly influenced by the work of a man named Isidore Marie Auguste François Xavier Comte—better known as Auguste Comte. Between 1817 and 1823, Comte had worked on a manuscript entitled Plan of the Scientific Operations Necessary for the Reorganization of Society. In it he argued that the ideal method for determining how best to run a society would come from studying it in the manner of the natural sciences. In the same way that Isaac Newton could formulate how physical forces might impact upon an object, so did Comte posit that social scientists should be able to discover the universal laws of “social physics” that would predict human behavior.9
This idea appealed immensely to Guerry and Quetelet, who had a shared interest in subjects like criminology. At the age of just 26, Guerry had been hired by the French Ministry of Justice to work in a new field called “moral statistics.” Quetelet, meanwhile, was enthusiastic about the opportunity to take the mathematical tools of astronomy and apply them to social data. To him:
The possibility of establishing moral statistics, and deducing instructive and useful consequences therefrom, depends entirely on this fundamental fact, that man’s free choice disappears, and remains without sensible effect, when the observations are extended over a great number of individuals.10
Of benefit to Guerry and Quetelet was the fact that each was living through what can be described as the first “golden age” of Big Data. From 1825, the Ministry of Justice had ordered the creation of the first centralized, national system of crime reporting—to be collected every three months from each region of the country, and which recorded all criminal charges brought before French courts. These broke crimes down into the category of charge, the sex and occupation of the accused, and the eventual outcome in court. Other nationally held data sets included statistics concerning individual wealth (indicated through taxation), levels of entrepreneurship (measured through number of patents filed), the percentage of military troops who could both read and write, immigration and age distribution around the country, and even detailed lists of Parisian prostitutes—ordered by year and place of birth.11
During the late 1820s and early 1830s, Guerry and Quetelet worked independently to analyze the available data. One of the first things each remarked upon was the lack of variance that existed in crime from year to year. This had particular relevance in the field of social reform—since reformers had previously focused on redeeming the individual criminal, rather than viewing them as symptoms of a larger problem.12 Quetelet referred to “the terrifying exactitude with which crimes reproduce themselves” and observed that this consistency carried over even to a granular scale—meaning that the proportion of murders committed by gun, sword, knife, cane, stones, fire, strangulation, drowning, kicks, punches and miscellaneous instruments used for cutting and stabbing remained almost entirely stable on an annual basis. “We know in advance,” he proclaimed, “how many individuals will dirty their hands with the blood of others; how many will be forgers; how many poisoners—nearly as well as one can enumerate in advance the births and deaths that must take place.” Guerry, too, was struck by “this fixity, this constancy in the reproduction of facts,” in which he saw ample evidence that Comte’s theories of social physics were correct; that amid the noise of unfiltered data there glowed the dim light of a signal.
A number of fascinating tidbits emerged from the study of the two scholars. For instance, Quetelet noticed a higher than usual correlation when examining the relationship between suicide by hanging and marriages involving a woman in her twenties and a man in his sixties. Not to be outdone, Guerry also turned his attention to suicide (subdivided by motive and method for ending one’s life) and concluded that younger men favored death by pistol, while older males tended toward hanging.
Other relationships proved more complex. Previously, it had been widely thought that poverty was the biggest cause of crime, which meant that wealthier regions of the country would surely have a lower crime rate than poorer ones. In fact, Guerry and Quetelet demonstrated that this was not necessarily the case. While the wealthiest regions of France certainly had lower rates of violent crime than did poorer regions, they also experienced far higher rates of property crime. From this, Guerry was able to suggest that poverty itself was not the cause of property crime. Instead, he pointed to opportunity as the culprit, and argued that in wealthier areas there is more available to steal. Quetelet built on this notion by suggesting the idea of “relative poverty”—meaning that great inequality between poverty and wealth in the same area played a key role in both property and violent crimes. To Quetelet, relative poverty incited people to commit crimes through envy. This was especially true in places where changing economic conditions meant the impoverishment of some, while at the same time allowing others to retain (or even expand) their wealth. Quetelet found less crime in poor areas than in wealthier areas, so long as the people in the poor areas were able to satisfy their basic needs.
Guerry published his findings in a slim 1832 volume called Essai sur la statistique morale de la France (Essay on the Moral Statistics of France). Quetelet followed with his own Sur l’homme et le développement de ses facultés (On Man and the Development of His Faculties) three years later. Both works proved immediately sensational: rare instances in which the conclusions of a previously obscure branch of academia truly capture the popular imagination. Guerry and Quetelet were translated into a number of different languages and widely reviewed. The Westminster Review—an English magazine founded by Utilitarians John Stuart Mill and Jeremy Bentham—devoted a particularly large amount of space to Guerry’s book, which it praised for being of “substantial interest and importance.” Charles Darwin read Quetelet’s work, as did Fyodor Dostoyevsky (twice), while no less a social reformer than Florence Nightingale based her statistical methods upon his own.13 Nightingale later gushingly credited Quetelet’s findings with “teaching us . . . the laws by which our Moral Progress is to be attained.”14
In all, Guerry and Quetelet’s work showed that human beings were beginning to be understood—not as free-willed, self-determining creatures able to do anything that they wanted, but as beings whose actions were determined by biological and cultural factors.
In other words, they were beings that could be predicted.
The Real Minority Report
In 2002, the Steven Spielberg movie Minority Report was released. Starring Tom Cruise and based on a short story by science-fiction author Philip K. Dick, the film tells the story of a futuristic world in which crime has been all but wiped out. This is the result of a specialized “PreCrime” police department, which uses predictions made by three psychics (“precogs”) to apprehend potential criminals based on foreknowledge of the crimes they are about to commit.
The advantage of such a PreCrime unit is clear: in the world depicted by Minority Report, perpetrators can be arrested and charged as if they had committed a crime, even without the crime in question having to have actually taken place. These forecasts prove so uncannily accurate that at the start of the movie the audience is informed that Washington, D.C.—where the story is set—has remained murder-free for the past six years.
While Minority Report is clearly science fiction, like a lot of good sci-fi the world it takes place in is not a million miles away from our own. Even before Sean Malinowski’s predictive policing came along, law enforcement officials forecasted on a daily basis by deciding between what are considered “good risks” and “bad risks.” Every time a judge sets bail, for instance, he is determining the future likelihood that an individual will return for trial at a certain date. Search warrants are similar predictions that contraband will be found in a particular location. Whenever police officers intervene in domestic violence incidents, their job is to make forecasts about the likely course of that household over time—making an arrest if they feel the future risks are high enough to warrant their doing so. With each of these cases the question of accuracy comes down to both the quality of data being analyzed and the choice of metrics that the decision-making process is based on. With human fallibility being what it is, however, this is easier said than done.
Parole hearings represent another example of forecasting. Using information about how a prisoner has behaved while incarcerated, their own plans for their future if released, and usually a psychiatrist’s predictions about whether or not they are likely to serve as a danger to the public, parole boards have the option of freeing prisoners prior to the completion of their maximum sentence. According to a 2010 study, though, the single most important factor in determining whether a prisoner is paroled or not may be nothing more scientific than the time of day that their hearing happens to take place. The unwitting participants in this particular study were eight parole judges in Israel. In a situation where entire days are spent reviewing parole applications (each of which lasts for an average of six minutes), the study’s authors plotted the number of parole requests approved throughout the day. They discovered that parole approval rates peaked at 65 percent after each of the judges’ three meal breaks, and steadily declined in the time afterward—eventually hitting zero immediately prior to the next meal.15 The study suggests that when fatigue and hunger reach a certain point, judges are likely to revert to their default position of denying parole requests. Even though each of the judges would likely place the importance of facts, reason and objectivity over the rumbling of their stomachs, this illustrates the type of problem that rears its head when decision-makers happen to be human.
Richard Berk relies on no such gut instinct. Professor of criminology and statistics at the Wharton School of the University of Pennsylvania, Berk is an expert in the field of computational criminology, a hybrid of criminology, computer science and applied mathematics. For the past decade, he has been working on an algorithm designed to forecast the likelihood of individuals committing violent crimes. In a sly nod to the world of “precogs” envisioned by Philip K. Dick, Berk calls his system “RealCog.” “We’re not in the world of Minority Report yet,” he says, “but there’s no question that we’re heading there.” With RealCog, Berk can aid law enforcement officials in making a number of important decisions. “I can help parole boards decide who to release,” he says, rattling off the areas he can (and does) regularly advise on. “I can help probation and parole departments decide how best to supervise individuals; I can help judges determine the sentences that are appropriate; I can help departments of social services predict which of the families that are in their area will have children at a high risk of child abuse.”
In a previous life, Berk was a sociologist by trade. After receiving his bachelor’s in psychology from Yale, followed by a PhD from the Johns Hopkins University, he took a job working as an assistant professor of sociology at Northwestern. He regularly published articles on subjects like the best way to establish a rapport with deviant individuals, and how to bridge the gap between public institutions and people living in poor urban areas. Then he changed tack. “I got interested in statistics as a discipline and pretty much abandoned sociology,” he says. “I have not been in a sociology department for decades.” A self-described pragmatist, Berk saw the academic work around him producing great insights but doing very little to change the way that things actually worked. When he discovered the field of machine learning, it was a godsend. For the first time he was able to use large data sets detailing more than 60,000 crimes, along with complex algorithms, so that statistical tools could all but replace clinical judgment.
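In his published forecasting work, Berk has leaned on machine-learning classifiers such as random forests rather than traditional regression or clinical judgment. The snippet below is only a generic sketch of what such a workflow looks like, built on synthetic data with invented feature names; it is not Berk’s model, his features or his data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Generic sketch of a recidivism-risk classifier on synthetic data.
# Features and labels are invented for illustration; this is not Berk's model.
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([
    rng.integers(18, 70, n),    # age at release (years)
    rng.integers(0, 15, n),     # number of prior arrests
    rng.integers(0, 2, n),      # prior violent offense (0/1)
])
# Synthetic label: risk loosely rises with priors and falls with age.
p = 1 / (1 + np.exp(-(0.3 * X[:, 1] + 1.0 * X[:, 2] - 0.05 * (X[:, 0] - 18))))
y = rng.random(n) < p

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# The output is a probability that a board could weigh against the asymmetric
# costs of releasing someone who reoffends versus detaining someone who would not.
print("held-out accuracy:", model.score(X_test, y_test))
print("risk for a 25-year-old with 6 priors, one violent:",
      model.predict_proba([[25, 6, 1]])[0, 1])
```

The point of the sketch is simply that, once the historical records are encoded as features, the forecast itself becomes a repeatable calculation rather than a clinician’s hunch.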