Risk Assessment Instruments as a Part of Bail Reform: Do they help or hurt?
The discussion around pretrial risk assessment instruments (RAIs) in the U.S. has become increasingly politicized and polarized. Initially celebrated for their supposedly impartial and accurate risk predictions, RAIs have increasingly been employed during pretrial release decision-making by jurisdictions seeking to address racial bias, increase consistency and fairness of judicial decisions, end money bail, and/or reduce jail populations. However, many bail reform advocates who initially embraced RAIs as allies of these pretrial justice goals have since fully reversed their positions and now call for their abolition.
Given the very real impact pretrial decisions have on individuals’ lives, it is important to understand the true impact of RAIs before taking a stance in this politicized debate. A recently published review by Sarah Desmarais, John Monahan, and James Austin (Desmarais, Monahan, and Austin 2022) of the empirical research on RAIs seeks to synthesize findings around the three main concerns raised by opponents of these instruments: that they are inaccurate, that they make racially biased predictions, and that they lead to increased pretrial detention rates and greater racial disparity. The authors cite rigorous evidence refuting or adding nuance to each of these claims, revealing that the truth around RAIs is often more complex than opponents acknowledge.
Concern #1: Pretrial RAIs are Inaccurate
The authors conclude that the use of RAIs as a decision aid can actually improve the accuracy of pretrial release decisions. They argue that the accuracy of RAI predictions should be compared to the status quo of unaided judicial decisions rather than to perfection. When making pretrial release decisions, judges are required to consider an individual’s flight risk and public safety risk. Thus, most RAIs estimate the likelihood of failure to appear in court (FTA) and the likelihood of rearrest in the pretrial period. When RAIs are not employed, judges must make these same predictions within the time and information constraints of a pretrial hearing to decide whether to grant pretrial release and what conditions to impose on release. The authors cite research spanning decades and domains showing that when it comes to predicting future behavior, including future violence and criminal behavior, unaided human judgments are less accurate than statistical predictions. Additionally, unaided human judgment, especially in time-constrained, high-risk decision contexts, introduces racial bias. In the context of pretrial decisions in the U.S., racially biased prediction errors on the part of judges result in harsher bail decisions for Black defendants when compared to similar White defendants (Arnold, Dobbie, and Yang 2018).
Human decisions are susceptible to a far greater range of irrelevant (i.e., accuracy-reducing) external influences than this paper conveys, with deeply unfair impacts on pretrial decisions and ultimately individuals’ lives. Beyond racial biases, judicial decision-making has been shown to be affected by increases in outdoor temperatures (Heyes and Saberian 2019), case sequencing (Chen, Moskowitz, and Shue 2016), and professional sports team losses (Eren and Mocan 2018). Judges also differ from one another in terms of leniency, which should raise serious questions about not only the fairness but also the accuracy of judicial pretrial decisions. Research on New York City, where cases are as good as randomly assigned to judges, showed that the most lenient quintile of judges released defendants at a rate almost 22.3 percentage points higher than the least lenient quintile (Kleinberg, Lakkaraju, et al. 2018). RAIs, meanwhile, are not subject to these influences and can introduce elements of both intra- and inter-judge consistency into the pretrial decision process.
Concern #2: Pretrial RAIs Make Racially Biased Predictions
Critics of pretrial RAIs claim that they exhibit predictive bias in favor of White defendants, meaning they more accurately predict pretrial behavior for White defendants than for defendants of color. This review shows, however, that the predictive validity of RAIs varies by instrument and jurisdiction, with some exhibiting lower rather than higher predictive accuracy for White defendants. The authors note that much of the pushback to RAIs on the basis of predictive bias was spurred by a 2016 ProPublica investigation (Angwin et al. 2016) of a RAI used in Florida, and that the investigation’s analysis and conclusions have since been shown to be flawed and misleading. It turns out that there are two ways of defining what constitutes a fair or unbiased prediction, and it is mathematically impossible to satisfy both fairness criteria simultaneously when the underlying rate of “reoffending” varies across groups (Corbett-Davies et al. 2016). The ProPublica analysis relies on just one of these definitions of fairness to make its case for bias. Meanwhile, the tool it analyzes satisfies the other definition of fairness.
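To make the tension between the two fairness definitions concrete, here is a minimal sketch in Python. It assumes the two definitions are (1) calibration, meaning that defendants with the same risk label are rearrested at the same rate regardless of group, and (2) equal false positive rates, meaning that defendants who are not rearrested are equally likely to have been labeled high risk regardless of group. The counts, the group names, and the function name are all hypothetical and chosen only for illustration; they do not come from any real instrument.

```python
# Hypothetical counts, invented for illustration only (not from any real RAI).
# Each entry maps a risk label to (number rearrested, number not rearrested).
GROUP_A = {"low": (80, 320), "high": (360, 240)}   # 1,000 defendants
GROUP_B = {"low": (160, 640), "high": (120, 80)}   # 1,000 defendants


def fairness_metrics(group):
    """Return the base rate, per-label rearrest rates, and false positive rate."""
    rearrested_low, not_rearrested_low = group["low"]
    rearrested_high, not_rearrested_high = group["high"]
    total = (rearrested_low + not_rearrested_low
             + rearrested_high + not_rearrested_high)
    return {
        # Share of the whole group that was rearrested.
        "base_rate": (rearrested_low + rearrested_high) / total,
        # Calibration check: rearrest rate within each risk label.
        "rearrest_rate_low": rearrested_low / (rearrested_low + not_rearrested_low),
        "rearrest_rate_high": rearrested_high / (rearrested_high + not_rearrested_high),
        # Share of people who were NOT rearrested but were labeled high risk.
        "false_positive_rate": not_rearrested_high
        / (not_rearrested_low + not_rearrested_high),
    }


metrics_a = fairness_metrics(GROUP_A)
metrics_b = fairness_metrics(GROUP_B)

for name, metrics in (("A", metrics_a), ("B", metrics_b)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

In this toy data, a "high" label means a 60% rearrest rate in both groups and a "low" label means 20% in both, so the labels are calibrated. Because group A has the higher base rate (44% versus 28%), a larger share of its members who are never rearrested end up labeled high risk: roughly 43% in group A versus 11% in group B. An analysis that looks only at false positive rates would call this tool biased, while one that looks only at calibration would call it fair.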
Critics of RAIs also point out that racial bias is built into the data fed to risk assessment algorithms, so predictions will surely be similarly biased. The authors recognize that policing, arrests, and charging decisions in the U.S. are racially biased, which will be reflected in official records of contact with law enforcement and the criminal justice system. However, they argue, in the absence of RAIs, decisions will be subject to similar bias since judges rely on the same official records and biased data as risk assessment algorithms to make their decisions. The advantage of a RAI over a judge, in this case, is that the RAI is more transparent in its consideration and weighting of these biased data points. After all, we can look at a risk assessment algorithm and see how it treats each variable, but we cannot look inside a judge’s brain to gain similar clarity on their thinking process.
Concern #3: RAI Use Increases Pretrial Detention Rates
Opponents of RAIs worry that their use leads judges to grant pretrial release less often and that this effect may be greater for defendants of color. The body of empirical evidence paints a more complicated picture. The authors identify several studies in which the use of RAIs to aid pretrial decisions was associated with lower, not higher, pretrial detention rates. Additionally, studies examining cases where pretrial detention rates did not decrease after a RAI was implemented have found that pretrial release rates would have increased if judges had more frequently adopted the RAI’s recommendations. On the topic of racially disparate impact, the authors highlight recent studies showing that even when RAIs do not achieve predictive parity (i.e., their predictive accuracy varies by race), their use can still increase pretrial release rates for everyone, regardless of race.
Recommendations
Based on the findings of their review, the authors support the continued use of pretrial RAIs and suggest three broad strategies to maximize their accuracy, unbiasedness, and transparency such that they best advance pretrial reform goals. Firstly, they recommend that RAIs be tested regularly for predictive validity (accuracy), interrater reliability, and differential prediction across protected classes (race, gender, etc.). If differential prediction, for instance by race, is found, they suggest limiting the use of variables that could be considered proxies for race, such as zip code. This last part of the recommendation, however, unwisely ignores literature showing that the explicit inclusion of race as a variable when building risk assessment algorithms can actually help eliminate harmful differential prediction across groups (Yang and Dobbie 2020), increasing accuracy and leading to more equitable outcomes (Kleinberg, Ludwig, et al. 2018).
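As a rough illustration of what the regular testing in the first recommendation could look like, the sketch below computes the area under the ROC curve (AUC) separately for each group. The review does not prescribe a particular statistic, so using AUC here is an assumption; it is one common summary of predictive validity, and a formal test of differential prediction would typically go further. The records, group labels, and function names are hypothetical.

```python
def auc(scores, outcomes):
    """Probability that a randomly chosen person with a pretrial failure
    (outcome 1) has a higher score than a randomly chosen person without
    one (outcome 0). Ties count as half."""
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one person with each outcome")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def auc_by_group(records):
    """records: iterable of (group, risk_score, outcome) tuples."""
    grouped = {}
    for group, score, outcome in records:
        scores, outcomes = grouped.setdefault(group, ([], []))
        scores.append(score)
        outcomes.append(outcome)
    return {group: auc(s, y) for group, (s, y) in grouped.items()}


# Hypothetical validation sample: (group, risk score, 1 = FTA or rearrest).
RECORDS = [
    ("A", 1, 0), ("A", 2, 0), ("A", 3, 1), ("A", 4, 0), ("A", 5, 1), ("A", 6, 1),
    ("B", 1, 0), ("B", 2, 1), ("B", 3, 0), ("B", 4, 1), ("B", 5, 0), ("B", 6, 1),
]

results = auc_by_group(RECORDS)
for group, value in sorted(results.items()):
    print(f"group {group}: AUC = {value:.2f}")
```

With these made-up records the AUC is about 0.89 for group A and 0.67 for group B. A gap of that kind in a real validation sample would be a signal to investigate the instrument further, not proof of bias on its own.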
Secondly, the authors advocate for fully disclosing RAI scoring and recommendations to defendants and allowing them to contest the accuracy of results. While most RAIs already publish their algorithms and supporting documentation, they argue that more must be done to ensure that defendants and their attorneys have ready access to this information. Additionally, the results themselves should be presented in a clear, understandable format. While their paper does not offer any specific suggestions for what constitutes an understandable format, a good example is the CJA Pretrial Release Assessment Report provided to New York City’s judges during pretrial hearings. The report was redesigned in 2019 to increase transparency, better explain the meaning behind risk scores, and provide clearly interpretable release recommendations. The new report clearly lays out not only an individual’s risk score and category but more importantly their likelihood of success, rather than failure, which in the context of NYC means their likelihood of appearing in court. It also lists the data used to generate the risk score, increasing transparency and allowing for judges to recalibrate a defendant’s score if new information comes to light at their arraignment.
Finally, the authors note that RAIs are meant to complement human decision making, not replace it as some opponents fear. RAIs are designed to help decision makers better identify low-risk individuals for pretrial release, but decisions are ultimately left to judges. Indeed, they note that judges often resist granting pretrial release, yet most detained individuals are actually classified as posing low flight and community risk. Thus, the authors argue that a presumption of release should exist for pretrial defendants.
The authors come out strongly in favor of the continued use of pretrial RAIs but recognize they must be part of a multipronged strategy. The recent case of New York City (Peterson 2020; “Updating the New York City Criminal Justice Agency Release Assessment” 2020), not covered in this review, provides insights into the promise and limitations of RAIs to increase release rates and undo entrenched racial bias in pretrial release decisions. The city’s RAI was diligently redesigned to recommend release for the same percentage of Black and White defendants, meaning it achieved predictive parity, and to recommend release for significantly more individuals than its previous version, as shown in the first and third columns of the table below. After the update, release rates did increase across the board, but they remained lower than the RAI’s recommendations. Additionally, while the White-Black gap in release rates shrank, it still existed, indicating that judges continued to introduce racial bias into their decisions. The case of New York City clearly highlights a point that Desmarais, Monahan, and Austin emphatically make: “No one solution will be sufficient to fix our deeply flawed pretrial system. It is not realistic to expect pretrial risk assessment instruments–or any single strategy–to do so” (page 813).
Angwin, Julia, Jeff Larson, Surya Mattu, and Lauren Kirchner. 2016. “Machine Bias.” ProPublica, May 23, 2016. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
Arnold, David, Will Dobbie, and Crystal S. Yang. 2018. “Racial Bias in Bail Decisions.” The Quarterly Journal of Economics 133 (4): 1885–1932. https://doi.org/10.1093/qje/qjy012.
Chen, Daniel L., Tobias J. Moskowitz, and Kelly Shue. 2016. “Decision Making Under the Gambler’s Fallacy: Evidence from Asylum Judges, Loan Officers, and Baseball Umpires.” The Quarterly Journal of Economics 131 (3): 1181–1242. https://doi.org/10.1093/qje/qjw017.
Corbett-Davies, Sam, Emma Pierson, Avi Feller, and Sharad Goel. 2016. “A Computer Program Used for Bail and Sentencing Decisions Was Labeled Biased against Blacks. It’s Actually Not That Clear.” Washington Post, October 17, 2016. https://www.washingtonpost.com/news/monkey-cage/wp/2016/10/17/can-an-algorithm-be-racist-our-analysis-is-more-cautious-than-propublicas/.
Desmarais, Sarah L., John Monahan, and James Austin. 2022. “The Empirical Case for Pretrial Risk Assessment Instruments.” Criminal Justice and Behavior 49 (6): 807–16. https://doi.org/10.1177/00938548211041651.
Eren, Ozkan, and Naci Mocan. 2018. “Emotional Judges and Unlucky Juveniles.” American Economic Journal: Applied Economics 10 (3): 171–205. https://doi.org/10.1257/app.20160390.
Heyes, Anthony, and Soodeh Saberian. 2019. “Temperature and Decisions: Evidence from 207,000 Court Cases.” American Economic Journal: Applied Economics 11 (2): 238–65. https://doi.org/10.1257/app.20170223.
Kleinberg, Jon, Himabindu Lakkaraju, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. 2018. “Human Decisions and Machine Predictions.” The Quarterly Journal of Economics 133 (1): 237–93. https://doi.org/10.1093/qje/qjx032.
Kleinberg, Jon, Jens Ludwig, Sendhil Mullainathan, and Ashesh Rambachan. 2018. “Algorithmic Fairness.” AEA Papers and Proceedings 108 (May): 22–27. https://doi.org/10.1257/pandp.20181018.
Peterson, Richard. 2020. “Brief No. 46: CJA’s Updated Release Assessment.” New York City Criminal Justice Agency. https://www.nycja.org/assets/downloads/CJA-Brief-46_updated-release-assessment.pdf.
“Updating the New York City Criminal Justice Agency Release Assessment.” 2020. Luminosity & the University of Chicago’s Crime Lab New York. https://www.nycja.org/assets/Updating-the-NYC-Criminal-Justice-Agency-Release-Assessment-Final-Report-June-2020.pdf.
Yang, Crystal, and Will Dobbie. 2020. “Equal Protection Under Algorithms: A New Statistical and Legal Framework.” Michigan Law Review 119 (2): 291. https://doi.org/10.36644/mlr.119.2.equal.