March 3, 2020

Making the case for algorithms to help with criminal justice decision-making

This new Washington Post piece by a group of California professors and data scientists, headlined "In the U.S. criminal justice system, algorithms help officials make better decisions, our research finds," makes a notable case for using algorithms in criminal justice decision-making.  Here are excerpts:

Should an algorithm help make decisions about whom to release before trial, whom to release from prison on parole or who receives rehabilitative services?  They’re already informing criminal justice decisions around the United States and the world and have become the subject of heated public debate.  Many such algorithms rely on patterns from historical data to assess each person’s risk of missing their next court hearing or being convicted of a new offense.

More than 60 years of research suggests that statistical algorithms are better than unaided human judgment at predicting such outcomes.  In 2018, that body of research was questioned by a high-profile study published in the journal Science Advances, which found that humans and algorithms were about equally as good at assessing who will reoffend. But when we attempted to replicate and extend that recent study, we found something different: Algorithms were substantially better than humans when used in conditions that approximate real-world criminal justice proceedings....

Surprised by the finding, we redid and extended the Dartmouth study with about 600 participants similarly recruited online.  This past month, we published our results.  The Dartmouth findings do not hold in settings that are closer to real criminal justice situations.

The problem isn’t that the Dartmouth study’s specific results are wrong. We got very similar results when we reran the study by asking our own participants to read and rate the same defendant descriptions that their researchers used. It’s that their results are limited to a narrow context. We repeated the experiment by asking our participants to read descriptions of several new sets of defendants and found that algorithms outperformed people in every case. For example, in one instance, algorithms correctly predicted which people would reoffend 71 percent of the time, while untrained recruits predicted correctly only 59 percent of the time — a 12 percentage point gap in accuracy.

This gap increased even further when we made the experiment closer to real-world conditions. After each question, the Dartmouth researchers told participants whether their prediction was correct — so we did that, too, in our initial experiments. As a result, those participants were able to immediately learn from their mistakes. But in real life, it can take months or years before criminal justice professionals discover which people have reoffended. So we redid our experiment several more times without this feedback. We found that the gap in accuracy between humans and algorithms doubled, from 12 to 24 percentage points. In other words, the gap increased when the experiment was more like what happens in the real world. In fact, in this case, where immediate feedback was no longer provided, our participants correctly rated only 47 percent of the vignettes they read — worse than simply flipping a coin.

Why was human performance so poor? Our participants significantly overestimated risk, believing that people would reoffend much more often than they actually did. In one iteration of our experiment, we explicitly and repeatedly told participants that only 29 percent of the people they were assessing ultimately reoffended, but our recruits still predicted that 48 percent would do so. In a courtroom, these “judges” might have incorrectly flagged many people as high risk who statistically posed little danger to public safety.

Humans were also worse than algorithms at exploiting additional information — something that criminal justice officials have in abundance. In yet another version of our experiment, we gave humans and algorithms detailed vignettes that included more than the five pieces of information provided about a defendant in the original Dartmouth study. The algorithms that had this additional information performed better than those that did not, but human performance did not improve.

Our results indicate that statistical algorithms can indeed outperform human predictions of whether people will commit new crimes. These findings are consistent with the findings of an extensive literature, including field studies, that show that algorithmic predictions are more accurate than those of unaided judges and correctional officers who make life-changing decisions every day.
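For readers who want to see how the figures in the excerpt fit together, here is a minimal Python sketch of the arithmetic behind "accuracy," a "percentage point gap," and overestimated risk. The function names and the ten-person toy data are hypothetical illustrations, not the researchers' code or data; only the 71, 59, 48, and 29 percent figures come from the excerpt.

```python
def accuracy(predictions, outcomes):
    """Share of cases where the prediction matched the actual outcome."""
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    correct = sum(p == o for p, o in zip(predictions, outcomes))
    return correct / len(outcomes)


def positive_rate(labels):
    """Share of cases labeled 1 (predicted or observed to reoffend)."""
    return sum(labels) / len(labels)


def percentage_point_gap(rate_a, rate_b):
    """Difference between two rates, expressed in percentage points."""
    return round((rate_a - rate_b) * 100, 1)


# Hypothetical toy data (NOT from the study): 1 = reoffended, 0 = did not.
outcomes = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]   # 3 of 10 actually reoffend
algorithm = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]  # made-up algorithm predictions
human = [1, 1, 0, 1, 1, 0, 1, 0, 0, 1]      # made-up human predictions

print("algorithm accuracy:", accuracy(algorithm, outcomes))
print("human accuracy:", accuracy(human, outcomes))
print("actual reoffense rate:", positive_rate(outcomes))
print("human predicted rate:", positive_rate(human))

# Figures reported in the excerpt, restated as percentage-point gaps.
print("accuracy gap (71% vs 59%):", percentage_point_gap(0.71, 0.59))
print("overestimate (48% predicted vs 29% actual):", percentage_point_gap(0.48, 0.29))
```

In the toy data, the made-up "human" predictions flag twice as many people as actually reoffend, which is the kind of overestimation the excerpt describes; the last two lines simply restate the reported figures as gaps in percentage points.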

I blogged about the prior study in an earlier post, and this blog has many, many prior related posts on risk assessment tools.

March 3, 2020 at 01:54 PM


