In technology, conventional wisdom holds that machine learning can typically make better predictions than humans, weeding out bias and improving accuracy by staggering margins. A new study out of Dartmouth College, however, challenges that assumption, particularly when it comes to the fate of those in the criminal justice system.

According to the Dartmouth research, the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) risk assessment tool is no more accurate at predicting recidivism than individuals with “little or no criminal justice expertise.” COMPAS, widely used in U.S. courts to gauge recidivism risk, has been used to assess more than one million offenders since 1998, the researchers note.

Carried out by a student-faculty research team, the Dartmouth study gave a group of nonexperts (workers contracted through Amazon's Mechanical Turk online marketplace) short descriptions of pretrial defendants drawn from a database in Broward County, Florida, covering 2013 to 2014. Each description listed seven features of a defendant, including age, sex, the crime charged, whether the charge was a misdemeanor or a felony, and previous criminal history. Using this information, participants were asked in a survey to predict whether each defendant would recidivate.

The research involved a total of 800 participants, divided into two groups of 400. One group was shown each pretrial defendant's race; the other was not.

While COMPAS weighs 137 features in determining recidivism risk, the tool's accuracy in this instance (65.2 percent) was “statistically the same” as the nonexpert group's (67 percent), a statement from Dartmouth said.

“As machine learning and artificial intelligence tools emerged in criminal justice, they kind of bypassed this middle step in ensuring they're as accurate as we think they are,” Julia Dressel, who conducted the research for her undergraduate thesis in computer science at Dartmouth, told LTN. “People are quick to assume they're accurate and objective and think of course these things should be used. … We have to step back and realize that might not always be the case.”

“Right out of the gate, you know something is concerning when the accuracy is 65 percent,” Hany Farid, professor of computer science at Dartmouth College and co-leader of the study, told LTN. “People answering an online survey as accurately as the software: That should give us more pause.”

He added that many judges might look positively on using analytics tools because of their perceived accuracy, but “I think you would weigh that prediction very differently if I told you, 'Hey, I polled 12 people online, and this is what they said.'”

COMPAS's proprietary algorithm is unknown outside of its developer, Northpointe Inc. In 2017, The New York Times reported that a Northpointe executive said, “We've created [the algorithms], and we don't release them, because it's certainly a core piece of our business.”

The Dartmouth researchers took the seven pieces of information given to the study's human participants and fed them into “the simplest possible machine algorithm, the kind of thing you would teach in an undergraduate course,” logistic regression, and “it got 65 percent [accuracy], right out of the gate.”
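The study itself doesn't publish code, but a minimal sketch of that kind of baseline might look like the following. It assumes ProPublica's public compas-scores-two-years.csv file (whose column names are used here) is available locally; the one-hot feature encoding and the 80/20 train/test split are illustrative assumptions, not the study's exact protocol.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# The seven defendant features described above, using ProPublica's
# column names (an assumption about the data file's schema).
SEVEN_FEATURES = ["age", "sex", "juv_fel_count", "juv_misd_count",
                  "priors_count", "c_charge_degree", "c_charge_desc"]

df = pd.read_csv("compas-scores-two-years.csv")  # assumed local copy of ProPublica's data

# One-hot encode the categorical columns; the label is two-year recidivism.
X = pd.get_dummies(df[SEVEN_FEATURES],
                   columns=["sex", "c_charge_degree", "c_charge_desc"])
y = df["two_year_recid"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Seven-feature accuracy: {clf.score(X_test, y_test):.3f}")
```

If the study's figures hold, a simple classifier like this should land in the mid-60s on held-out data, in line with the numbers reported above.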

Taking it a step further, the researchers gave the algorithm only two pieces of information, age and total number of prior convictions, and it still achieved 65 percent accuracy, the same as COMPAS.
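In code, that stripped-down classifier is the same hedged sketch with the feature list cut to two columns (again relying on ProPublica's column names as an assumption):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("compas-scores-two-years.csv")  # same assumed ProPublica file

# Only two features this time: age and the count of prior offenses.
X = df[["age", "priors_count"]]
y = df["two_year_recid"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
print(f"Two-feature accuracy: {clf.score(X_test, y_test):.3f}")
```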

COMPAS has previously been challenged in the courts. In one case, State v. Loomis, a Wisconsin man was sentenced to six years in prison by a judge who cited a COMPAS assessment score. The man appealed, and the case made it to the Wisconsin Supreme Court, which ruled against him. In 2017, the U.S. Supreme Court declined to hear the case.

COMPAS is also no stranger to criticism. A 2016 analysis by ProPublica found “that black defendants were far more likely than white defendants to be incorrectly judged to be at a higher risk of recidivism.” White defendants, conversely, were more likely to be “incorrectly flagged as low risk.”
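ProPublica released the data behind that analysis, so the disparity can be checked directly. A rough sketch, again assuming its compas-scores-two-years.csv and treating decile scores of five or higher as “high risk” (an assumption about its published cutoff), might look like this:

```python
import pandas as pd

df = pd.read_csv("compas-scores-two-years.csv")  # assumed ProPublica file
df["high_risk"] = df["decile_score"] >= 5        # assumed "high risk" cutoff

for race in ["African-American", "Caucasian"]:
    group = df[df["race"] == race]
    # False positive rate: labeled high risk among those who did not recidivate.
    fpr = group.loc[group["two_year_recid"] == 0, "high_risk"].mean()
    # False negative rate: labeled low risk among those who did recidivate.
    fnr = 1 - group.loc[group["two_year_recid"] == 1, "high_risk"].mean()
    print(f"{race}: false positive rate {fpr:.2f}, false negative rate {fnr:.2f}")
```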

The racial skew in risk scores is partly due to the limitations of such algorithms. Examining the ProPublica dataset, the same one used by the Dartmouth team, The Washington Post found that, while COMPAS doesn't account for race directly in its algorithm, many of the attributes it considers in predicting repeat offenses vary by race, like prior arrests, which black defendants are more likely to have.
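That proxy effect is visible in the dataset itself; a quick check of average prior counts by race (once more assuming ProPublica's CSV is available locally) illustrates it:

```python
import pandas as pd

df = pd.read_csv("compas-scores-two-years.csv")  # assumed ProPublica file

# Average number of prior offenses per racial group: a race-blind
# feature that nonetheless varies with race in this data.
print(df.groupby("race")["priors_count"].mean().round(2))
```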

Citing a different review, the Dartmouth study also noted that accuracy wasn't just an issue for COMPAS and that “eight out of nine [algorithmic] approaches failed to make accurate predictions.”