Busted by Big Data

Algorithms could make cities safer—but they can't protect us from policing's worst instincts

Illustration by Matt Murphy

Boyle Street is one of Edmonton’s oldest neighbourhoods. Sandwiched between a rail corridor and the North Saskatchewan River just east of the downtown core, the area is mostly unremarkable: a few vacant lots, some inexpensive apartments, and newcomers from around the world. It does, however, have a fairly high crime rate. On Edmonton’s online “crime map,” the neighbourhood is speckled with an assortment of coloured dots: red for robberies, green for break and enters, and cyan for assaults.

On a hot summer afternoon in 2015, about a dozen Boyle Street residents gathered in a local community centre to listen as police and city officials pitched a plan to tackle the seemingly Sisyphean task of putting an end to one category of offences: property crime. Car thefts, break and enters, and vandalism could soon be predicted and prevented, the representatives said, all thanks to a computer algorithm.

Staff explained how Edmonton’s chief analytics officer, Stephane Contre, and his colleague, Kris Andreychuk, had recently discovered that some of the area’s property crimes were correlated with seemingly unrelated features, including noise disturbances, broken street furniture, poor lighting, and the presence of a bar. They had compiled a list of 233 distinct environmental and geographical features within city limits and placed them on a giant digital map. Using two years of 911 calls and police data, they overlaid that map with the locations of crimes and disturbances and then looked at what combinations of features tended to show up in those areas. Some patterns stood out immediately: for example, in any area of Edmonton that had youth centres and noise complaints and abandoned automobiles, the predicted incidence of property crime rose to 100 percent. Parts of the Boyle Street area fit the formula precisely.
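
For readers curious about the mechanics, the kind of overlay described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the city's actual system: the `crime_share` function, the feature labels, and the five map areas are all invented for the example.

```python
def crime_share(areas, required_features):
    """Among areas that have every required feature, return the share
    that also recorded property crime (None if no area matches)."""
    required = set(required_features)
    matching = [a for a in areas if required <= a["features"]]
    if not matching:
        return None  # no area has this combination of features
    return sum(1 for a in matching if a["property_crime"]) / len(matching)


# Hypothetical map areas: the labels echo the article, the data is invented.
areas = [
    {"features": {"youth_centre", "noise_complaints", "abandoned_cars"},
     "property_crime": True},
    {"features": {"youth_centre", "noise_complaints", "abandoned_cars", "bar"},
     "property_crime": True},
    {"features": {"youth_centre"}, "property_crime": False},
    {"features": {"noise_complaints", "bar"}, "property_crime": True},
    {"features": {"poor_lighting"}, "property_crime": False},
]

combo = ["youth_centre", "noise_complaints", "abandoned_cars"]
print(crime_share(areas, combo))             # every matching area had crime
print(crime_share(areas, ["youth_centre"]))  # a single feature is a weaker signal
```

In this toy data, a youth centre alone says little, while the three features together line up with property crime every time. As the analysts stress, a pattern like this is a correlation to investigate, not a cause.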

But what did it all mean? The analysts knew that the noise complaints and the youth centres and the abandoned cars weren’t causing the crimes, but those factors seemed to be hinting at something else driving criminal activity. As Andreychuk puts it, “These are simply clues.”

Toward the end of the meeting, residents were told that the abandoned cars had been stolen, and an elderly woman at the back piped up: “That makes complete sense.” The woman said from her perch in her high-rise apartment, she saw “dial-a-dopers” drive in to see customers and dump their vehicles in a cul-de-sac. In other words, the abandoned cars were simply a symptom of a bigger underlying problem: dealers coming to the area and stirring up trouble. Andreychuk was awed by how well computer analytics lined up with the old-fashioned observations of an attentive resident—effectively proving to him that the predictive capabilities of their homemade system worked. “It was a very romantic moment,” he recalls.

Emboldened by the system’s performance, Edmonton police devised a plan: the city deployed another social worker to the youth centre and an extra cop to walk that neighbourhood beat. Officials also beefed up parking restrictions and reconnected that cul-de-sac to the street grid so it wouldn’t be such a dead zone. Soon, noise complaints—and crime—had dropped. According to a follow-up evaluation, Edmonton’s Boyle Street experiment in predictive policing produced $1.60 in savings for every $1 spent (based on the foregone costs of arrests, 911 calls, and other associated expenditures). Versions of this data-driven project have since been deployed in other neighbourhoods across the city.

By combining huge tranches of data and highly sophisticated algorithms, predictive policing appears to hold out the science-fiction promise that technology could, one day, spit out 100 percent accurate prophecies concerning the location of future crimes. The latest iteration of these analytics can’t ID a killer-to-be, but it can offer insight into what areas are potential sites for crime by drawing on information in everything from historical records to live social-media posts.

Last July, the Vancouver Police Department became the first service in the country to use the technology across its entire operation—a move made following a successful six-month trial in 2016 that showed a reduction in break and enters by as much as 27 percent compared to recent averages. (A VPD spokesperson declined to reveal how much the force had paid for the system, citing “privacy and contractual” reasons.) Other municipalities, including Saskatoon and London, Ontario, are now running pilot projects or preliminary deployments of their own.

The technology, however, has raised tough questions about whether hidden biases in these systems will lead to even more over-policing of racialized and lower-income communities. In such cases, the result can turn into a feedback loop: the algorithms recommend a heightened police presence in response to elevated arrest rates that can be attributed to a heightened police presence.
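
The loop can be made concrete with a toy simulation. The sketch below rests on assumptions of our own rather than on any vendor's model: two hypothetical neighbourhoods have identical underlying crime rates, police record only the crime they are present to observe, and each round one patrol is moved toward whichever area has the larger recorded total.

```python
def simulate_feedback(true_rates, patrols, rounds=4):
    """Shift one patrol per round toward the area with the most recorded crime.

    Assumption (ours): recorded crime = true rate x patrols present,
    so an area with more officers generates more records.
    """
    patrols = list(patrols)
    recorded = [0.0] * len(patrols)
    history = [tuple(patrols)]
    for _ in range(rounds):
        for i, (rate, present) in enumerate(zip(true_rates, patrols)):
            recorded[i] += rate * present
        hot = max(range(len(patrols)), key=lambda i: recorded[i])
        cold = min(range(len(patrols)), key=lambda i: recorded[i])
        if hot != cold and patrols[cold] > 0:
            patrols[hot] += 1   # the "hot" area gets another officer
            patrols[cold] -= 1  # taken from the "cold" area
        history.append(tuple(patrols))
    return history


# Same true crime rate in both areas; area 0 merely starts with more patrols.
history = simulate_feedback(true_rates=[1.0, 1.0], patrols=[6, 4])
print(history)
```

Although the two areas are equally crime-prone by construction, the initial imbalance in patrols is enough to pull every officer into the first area within four rounds.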

Andrew Ferguson, who teaches law at the University of the District of Columbia and is the author of The Rise of Big Data Policing, goes further. He says that current predictive systems use social media and other deep wells of personal information to predict whether certain offenders may commit future crimes—an Orwellian scenario. Canadian governments and civilian oversight bodies, however, have done little to establish clear policies differentiating appropriate and inappropriate uses for these technologies. It is little wonder that critics are becoming increasingly concerned that police departments fitted out with big-data systems could use them to pre-emptively target members of the public. Can we really trust crime fighting to an algorithm?

While a crime may seem random to a victim, it is often the product of a particular set of circumstances: parking lots where some cars will be left unlocked, bars that disgorge drunken patrons, alleys or under-lit corners concealed from view. Police, therefore, have long been relentless counters. Department websites often contain a criminal box score—annual or year-to-date tallies of murders, stabbings, and burglaries. Cops also love visualizing this data. The old wall-sized precinct map, mounted in the briefing room and covered with push-pins, gave way to digital versions with more embedded information. But the idea remains the same: if you can see where crimes are being committed, your chances of being able to catch a perpetrator are better.

In the early 1990s, the New York Police Department created the CompStat system, which aimed to help refocus local law enforcement from functioning as a system that reacted to crimes into one that sought to prevent them from happening in the first place. CompStat allowed cops to put crime information—which, previously, had been sent off to the FBI but served little local operational use—into a massive database. CompStat would then generate “heat maps,” which vividly depicted hitherto invisible crime patterns, such as concentrations of particular types of incidents in certain locations. Officers would then use these heat maps to try to intuit where crimes would occur. As one report put it, “The past is prologue.”

After the 2007 financial crisis, many American municipalities were teetering on the brink of bankruptcy, and police departments had to drastically scale back spending. At about that time, Los Angeles Police Department officials were wondering if they could build on CompStat’s hot-spot analysis to actually anticipate crime. The force teamed up with Jeff Brantingham and George Mohler, a pair of California data scientists who were experimenting with new methods of forecasting property crime by adapting mathematical models developed to predict the frequency of earthquake aftershocks. To do this, Mohler and Brantingham took burglary data from three LA regions in the 2000s and generated estimates of the location and frequency of “follow-on” crime incidents, which were often second break-ins at the same or neighbouring houses. Surprisingly, those seismic formulas produced fairly accurate predictions for the secondary crimes that tended to occur.
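
The aftershock analogy can be illustrated with a generic "self-exciting" rate: a constant background level plus a temporary bump after each past event that fades with time. The sketch below uses that textbook form, not the researchers' published model, and the parameter values (`mu`, `alpha`, `beta`) are placeholders chosen for the example.

```python
import math


def intensity(t, past_times, mu=0.1, alpha=0.5, beta=1.0):
    """Expected event rate at time t.

    mu    : background rate with no recent events (assumed value)
    alpha : how strongly each past event raises the rate (assumed value)
    beta  : how quickly that extra risk decays (assumed value)
    """
    boost = sum(
        alpha * beta * math.exp(-beta * (t - ti))
        for ti in past_times
        if ti < t
    )
    return mu + boost


# One hypothetical burglary at time 0 (units are arbitrary, e.g. weeks).
burglaries = [0.0]
print(intensity(1.0, burglaries))   # elevated shortly after the break-in
print(intensity(10.0, burglaries))  # close to the background rate again
```

The shape is what matters: risk jumps right after an incident and then decays, which is the behaviour a "follow-on" break-in at the same or a neighbouring house would produce.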

Soon after they began publishing their findings, Mohler and Brantingham set up a real-world experiment with the Santa Cruz Police Department. Drawing on eight years of local crime data, the system they built divided the city into 150-square-metre cells, each with a crime forecast profile based on previous incidents. Police supervisors could prioritize their patrol efforts by focusing on the cells with the greatest predicted likelihood of break and enter activity. The result was a 27 percent reduction in burglaries. Law enforcement officials in other cities started studying the deployment of such systems in the wake of breathless media accounts of such technologies’ ability to make the right guesses. Within months, the LAPD had rolled out a version and reported that areas with the new predictive-policing system had seen crime drop by 13 percent, even as the city-wide stats trended upward.
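
The cell-based approach can be sketched as follows. This is a simplified stand-in, not the Santa Cruz system: it ranks cells by a raw count of past incidents instead of a forecast, it assumes a cell is 150 metres on a side, and the coordinates are invented.

```python
from collections import Counter


def rank_cells(incidents, cell_size=150.0, top_n=3):
    """Bucket (x, y) incident locations, in metres, into square grid cells
    and return the top_n cells by number of past incidents."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in incidents
    )
    return counts.most_common(top_n)


# Hypothetical break-in locations, in metres from an arbitrary origin.
incidents = [(10, 20), (140, 30), (160, 40), (20, 100), (500, 500)]
print(rank_cells(incidents, top_n=1))  # the cell a supervisor would patrol first
```

A ranking like this inherits whatever is in the incident records, so it can only be as reliable as the data behind it.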

Brantingham and Mohler realized that there was a huge potential market for the technology and set out to develop a robust system that could work with large amounts of data from all sorts of crimes. Their commercialized system, dubbed PredPol, raised $1 million from investors. Soon other companies started building additional predictive-policing wares, each with their own formulas, and tech giants including Microsoft and IBM entered the field.

Predictive-policing technology has now been adopted in other areas, including in Brazil, where the Igarapé Institute, a think tank, partnered with state authorities in the Rio de Janeiro region to launch a mapping-and-forecasting tool for crimes, including homicide. Robert Muggah, research director and cofounder of the Igarapé Institute, says the mapping algorithm, launched in 2016, works by using a historical crime database with 15 million incidents to forecast the probability of victimization within thousands of 250-square-metre blocks. Igarapé’s algorithm offered new insight into trends and showed that 90 percent of homicides occurred on or near just 2 percent of Rio’s intersections.

In London, Ontario, meanwhile, the local police service acquired a commercial predictive system in 2013. The technology has been used to get a better handle on break and enter activity around neighbourhoods with a lot of off-campus student rentals. With the system’s predictions, the London Police Service intensified its patrols at times when break and enters tended to spike. However, Detective Andrew Whitford, supervisor of crime analysis, cautions that analytics can only provide educated guesses to help officers make informed decisions. What really matters are the actions of the officers. In London, the force created better prevention programs geared at informing students and their families to take precautions. Two years in, the total value of stolen and damaged property in those areas had plunged. “These systems don’t give you the answer. They’re tools,” Whitford says. “You need to put them into the right hands.”

In 2016, Brian Rector, an executive director in Saskatchewan’s Ministry of Justice, was casting around for new approaches to deal with the factors that drive Indigenous youth to run away. He’d heard about US predictive-policing systems and wondered if they could be adapted to identify potential victims, instead of crimes. “Thirty percent of police activity deals with criminal offences, while 70 percent is social,” he explains. Social issues can refer to everything from calls concerning individuals with mental illness or addiction problems to calls about those who are homeless or vulnerable.

The Saskatoon Police Service had long had a poor relationship with Indigenous communities. In 1992, Shelley Napope disappeared. As Saskatoon police eventually discovered, the sixteen-year-old was a victim of John Crawford, a local serial killer. But police’s initial profile of Napope—an Indigenous teen officers believed was caught up in drugs and prostitution—meant that, even though her family made dozens of visits to the station to ask for help, her case didn’t merit an immediate response. Darlene Okemaysim-Sicotte, a Saskatoon-based activist whose cousin is Napope’s mother, says that the lack of police action was typical in many cases like Napope’s. “Ask any family,” she says. “They’ve had to report more than once to get action.”

The Saskatoon Police Service began overhauling its relationships with Indigenous residents after a 2003 inquiry, and part of this included changing how the force dealt with missing-persons reports. But even though it created a new policy to respond to each report immediately, there still was a problem: the service can see up to 2,700 cases each year, and the missing persons unit is made up of just three officers. “Do the math,” says psychologist Stephen Wormith, director of the University of Saskatchewan’s Centre for Forensic Behavioural Science and Justice Studies. With so many cases, Wormith says, they still needed to triage.

Rector asked Wormith’s team to develop an algorithm that would help Saskatoon police find missing youth quickly, using predictions gleaned from analysis of historical data on previous disappearances, as well as other information from schools and welfare and children’s-aid case files. According to Daniel Anvari, a mathematician who worked on the lab’s algorithms, the system analyzes hundreds of variables in current and previous missing-persons cases for both youth and adults. So if a fifteen-year-old living in a foster home has been absent for two weeks and has gone missing before, the system can quickly scan earlier cases with similar facts, provide information on how those situations were resolved, and show where the teen turned up. The system is designed to rapidly generate leads that might otherwise take days for a cop riffling through case files to uncover. And in missing-persons cases, time is key. “The longer the person is missing,” Saskatoon deputy chief Jeff Bent explains, “the more likely they’ll get victimized.”

Rector’s ambitions for the lab go well beyond more-efficient police investigations into missing people. He sees the technology as a means of developing social policies that can forestall various types of violence and victimization. The data, he says, can identify early warning signs—such as repeated incidents of self-harm in the months prior to a teen running away—and offer officials a better understanding of the particular conditions confronting young people who flee repeatedly.

Saskatoon isn’t the only jurisdiction experimenting with predictive algorithms aimed at anticipating victimization. In Greater Pittsburgh, the county’s children-and-family-services department has deployed an algorithm that can examine household occupants’ previous criminal and medical activity to help screeners make risk assessments on calls coming into a child-abuse hotline. Caseworkers can also rapidly access historical data about a family’s situation that used to be spread across multiple databases. The system, according to an eighteen-month New York Times investigation, is intended to allow workers to focus on the most precarious cases, which wasn’t always happening in the past. The two academics who developed the algorithm reviewed the outcomes of almost 80,000 earlier cases recorded between 2010 and 2014 and found that social workers had mistakenly screened out a large number of children who ended up being killed or grievously injured by a parent. “You can’t believe the amount of subjectivity that goes into child-protection decisions,” Rachel Berger, a physician and child-abuse expert, told the Times. “That’s why I love predictive analytics. It’s finally bringing some objectivity and science to decisions that can be so unbelievably life-changing.”

There may be promise in predictive policing, but the rapid evolution of the technologies has led a number of civil libertarians and academics to raise grave reservations about its usage. According to Teresa Scassa, a University of Ottawa professor and the Canada Research Chair in Information Law and Policy, the technology is “moving much faster than government,” and safeguards surrounding privacy and civil liberties have not been able to keep up. After all, it’s one thing if Amazon’s algorithms incorrectly predict what kind of book a person is interested in, but it’s quite another if a neighbourhood becomes the subject of over-policing because of an opaque, and potentially flawed, set of statistics.

While the Ontario government passed legislation in 2017 to restrict carding—a tactic where police would stop people, most often young black men, to demand ID and explanation for their whereabouts and, in some cases, to conduct street searches—departments still have the right to collect and store information about previous street checks in their databases, presumably for future use. These bits of information are now perfectly suited for predictive-policing applications and may potentially be fed into larger systems that combine disparate sources of personal data outside the criminal-justice system. As Andrew Ferguson explains, “Initial predictive-policing projects have raised the question of whether this data-driven focus serves merely to enable, or even justify, a high-tech version of racial profiling.”

Indeed, some experts say governments and the courts must tackle “algorithmic bias” before law enforcement agencies make further investments into these technologies. A 2016 investigation by ProPublica into bail and sentencing decisions in the US found that algorithms used to gauge risk of reoffending when setting parole terms are far more likely to label black defendants than white defendants as high risk. In Canada, the Supreme Court is currently weighing a case in which an Indigenous man convicted of murder has alleged that anti-Indigenous biases have found their way into sentencing tests used by the judicial system.

According to Brenda McPhail, the Canadian Civil Liberties Association’s director of privacy, technology, and surveillance, much of today’s crime data is distorted by racial and class bias. Building a forecasting algorithm around historical 911 calls, carding information, arrests, and convictions may predict increased likelihood of crime in certain areas simply because they’ve been over-policed in the past. A few US jurisdictions, including Oakland, have rejected predictive policing specifically because of this concern and because of worries about how it could further isolate officers from the communities they serve.

Ferguson adds that if police officials rely too heavily on prediction generators, they run the risk of overlooking crimes such as fraud, sexual assault, and domestic violence that are taking place in communities that are less likely to be identified as risky by these algorithms. “There are other crimes that get undervalued because we’re not counting them,” he says. “It creates a hierarchy within the criminal-justice system.”

The quality of the data isn’t the only source of concern. Ferguson conducted a review of predictive-policing pilot projects in the US and found that early positive results were used to justify further investment but have proven difficult to replicate in other locales: a prediction engine that works well in the low-rise sprawl of LA may not work at all in the higher-density parts of Manhattan, he notes.

Yet, as with so many artificial-intelligence technologies now storming into markets, the hype has proven difficult to resist, especially for overstretched police forces. Predictive-policing firms dutifully say they’re not actually selling a real-life version of the system envisioned in the sci-fi thriller Minority Report, where crimes are uncannily prophesied before they occur and the criminals arrested before they can carry them out. Yet the technology’s implicit promise is that powerful computers can be trained to effectively do a significant amount of officers’ work for them, and it will still be a long time until we know if that will ever be the case.

Sarah Brayne, a Canadian sociologist at the University of Texas at Austin, recently spent two-and-a-half years embedded with the LAPD. While there, she observed how its cops had come to rely on intelligence aggregated by a secretive company called Palantir. The firm, established by PayPal cofounder and early Facebook investor Peter Thiel, integrates huge sets of data (everything from licence-plate registries to addresses linked to cell phones to warrants and “unstructured data,” such as intercepted emails) for military, security, and law enforcement uses. Users simply search a name and receive an unprecedented look into the lives of private citizens, with no judicial oversight. “You’re able to build up a case with historical data without having a search warrant,” Brayne explains.

Brayne says that, during her time at the LAPD, she was witnessing the birth of “big-data surveillance,” which is developing parallel to predictive-policing technology. “Mathematized police practices serve to place individuals already under suspicion under new and deeper forms of surveillance, while appearing to be objective,” she observed in a 2017 paper recounting her experiences. In conversation, she describes it another way: “In the age of big data, law enforcement is starting to look more like intelligence gathering.”

Brayne recalls an encounter that revealed how potentially problematic these technologies can be: one algorithm showed, among other things, how many times an individual’s name had been searched previously. When she asked one detective why this kind of information was significant, he replied that the number of queries about an individual can be read as a proxy for their suspiciousness.

Perhaps this family of rapidly evolving algorithms offers a reminder about the old saying that when all you have is a hammer, every problem looks like a nail. Predictive policing and big data may be a new set of tools, but as critics point out, they are only as effective as the people who use them. “The problem with predictive policing,” Ferguson muses, “is the policing part.”

John Lorinc
John Lorinc is a Toronto-based journalist and editor and the author of Dream States: Smart Cities, Technology, and the Pursuit of Urban Utopias, published this year.
Matt Murphy
Matt Murphy is a United Kingdom-based illustrator who has produced work for The New York Times Magazine, the Guardian, and the Washington Post.