The Dubious Nature of “Race Blind” Predictive Algorithms in the Courtroom
The United States ranks first in the world in its rate of incarceration, with an estimated 2.1 million people currently held in prisons and jails across the nation. For reference, the Census Bureau estimates the U.S. population as the third largest in the world at just under 333 million people. Home of the brave but, for many, not a land of the free. These grim statistics reflect a flawed criminal justice system, and the judicial process is a huge part of the problem; the consequences are particularly distressing for Black defendants and other marginalized groups. Mistakes in judgment lead to costly litigation, unwarranted incarceration and, in too many instances, the loss of an innocent defendant’s life.
Surprisingly, this is an issue everyone can agree is important. Americans across the aisle believe that the criminal justice system is due for a makeover. On the left, social justice movements have surged in the last two years, demanding accountability for police brutality, for the discriminatory mass incarceration of Black individuals, for the emotional trauma suffered by defendants and their families, and for the economic strain placed on generations of children of the incarcerated. Meanwhile, right-leaning politicians and organizations (for the most part) take no pride in ranking alongside Russia and China on incarceration statistics. Conservatives acknowledge the huge costs to taxpayers at the state and federal levels, and most admit that “tough on crime” policies were a failure. Despite approaching the issue from opposing ideologies, conservatives and liberals alike share the demand for criminal justice reform.
Institutions are growing increasingly dependent on technology, and the legal system is no exception. One innovative (and controversial) form of tech in the criminal justice system is the use of predictive algorithms to assist with judicial decision making. These tools attempt to predict the probability that a defendant will repeat an offense and the likelihood that the defendant will fail to appear in court. In the future, they could be extended to the post-sentencing process with a focus on defendants’ reentry into society (for example, assigning individuals to housing and other rehabilitative programs). These algorithms act as a guide, providing over-worked, distracted and biased judges and prosecutors with recommendations calculated from the same data humans must synthesize to reach a decision. The main benefit is that algorithms can’t be over-worked, distracted or intentionally biased. The controversy around such technology is that it can amplify existing societal and racial bias captured in the data.
Researchers from Harvard recently published a paper proposing two improvements to existing risk assessment algorithms, designed to mitigate the effects of a historically racist and discriminatory criminal justice system.
Crystal S. Yang and Will Dobbie argue for the use of race (and race-adjacent variables) as inputs to predictive algorithm tools. Using court data on New York City arraignments documented from 2008 to 2013, the authors provide empirical evidence that the predictive algorithms currently used to inform legal decisions (which exclude race and any variables related to ethnicity and socioeconomic status) lead to undesirable racial disparities. The authors define race neutrality as not using “information stemming from membership in a racial group to form predictions, either directly through the use of race itself or indirectly through the use of nonrace correlates”. Yang and Dobbie (2020) include race as a factor in the first stage of estimating the model and then omit individual-level race data in the actual decision-making step. They present two statistical models: a “color-blinding inputs” model and a “minorities-as-whites” model. The two approaches show that their proposed method could reduce the number of Black defendants wrongfully detained while awaiting trial.
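To make the “minorities-as-whites” idea concrete, here is a minimal sketch on synthetic data, with a plain logistic regression standing in for whatever estimator a real risk tool would use; the feature names, coefficients, and data-generating process are illustrative assumptions, not the authors’ actual specification. Race enters the first-stage fit, but every defendant is then scored as if the race indicator were set to the reference (white) value.

```python
# Hedged sketch of the "minorities-as-whites" idea on synthetic data.
# The features, coefficients, and logistic model are assumptions for
# illustration only, not Yang and Dobbie's actual pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Synthetic data: a race indicator (1 = Black, 0 = white) and an arrest
# history that is correlated with race, mimicking biased policing.
race = rng.integers(0, 2, n)
priors = rng.poisson(2 + 1.5 * race)
age = rng.normal(30, 8, n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.3 * priors - 0.05 * (age - 30) - 1))))

X = np.column_stack([priors, age, race])

# Stage 1: estimate the risk model on the complete data, race included.
model = LogisticRegression().fit(X, y)

# Stage 2: score every defendant as if the race indicator were the
# reference (white) value, so race adds the same baseline for everyone.
X_as_white = X.copy()
X_as_white[:, 2] = 0
risk_race_aware = model.predict_proba(X)[:, 1]
risk_as_white = model.predict_proba(X_as_white)[:, 1]

print("mean risk, race-aware:          ", risk_race_aware.mean())
print("mean risk, minorities-as-whites:", risk_as_white.mean())
```

The point of fitting with race in the first stage is that the model can attribute part of the apparent risk to race directly, rather than silently loading it onto correlated proxies such as arrest history; the second stage then neutralizes that component when the decision is made.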
While there is no legal mandate prohibiting the use of race as an input variable, it is commonly seen as unconstitutional under the Equal Protection Clause of the 14th Amendment. Many legal scholars and policymakers argue that the use of race, and of factors correlated with race, can lead to unfavorable pre-trial release decisions and discriminatory sentencing. To this, the authors make a compelling point: it is nearly impossible to identify factors uncorrelated with race “given the empirical reality that almost every algorithmic input is likely correlated with race due to the influence of race in nearly every aspect of American life today” (Yang and Dobbie, 2020).
This Harvard paper is just one in a growing body of literature insisting that the inclusion of race and other protected characteristics in predictive algorithms is critical. A 2018 study by Kleinberg et al. shows that including race and gender as characteristics in an algorithm aiding college admissions decisions can increase both equity of representation in the student body and the efficiency of GPA prediction. Like Yang and Dobbie (2020), the authors argue that omitting race from the algorithm “inadvertently detracts from fairness.” Zliobaite and Custers (2016) similarly argue that excluding race (among other sensitive variables) from a model leads to indirectly discriminatory results, because factors frequently used to inform us about an individual, such as education and neighborhood/zip code, are correlated with race. Like Yang and Dobbie (2020), they propose training the model on a complete data set that includes sensitive/protected variables, then removing those variables from the second stage of the model and replacing the sensitive component with a constant. The final decision should be the output of this second “sanitized” model (a sketch of this step follows below). Not seeing color is a cancellable offense in this age; it’s time to cancel algorithms that claim to be fair or neutral by simply ignoring the existence of race.
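For concreteness, the sanitizing step described by Zliobaite and Custers might look like the following hedged sketch, again on synthetic data with a simple logistic model standing in for a production estimator; all variable names and numbers here are illustrative assumptions. The sensitive variable participates in the fit, but its contribution to every score is replaced with a constant before the decision is made.

```python
# Hedged sketch of a "sanitized" second stage in the spirit of Zliobaite and
# Custers (2016): fit on complete data including the sensitive attribute, then
# replace that attribute's contribution with a constant at decision time.
# The data and model here are illustrative assumptions only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
race = rng.integers(0, 2, n)                  # sensitive attribute
zip_risk = rng.normal(0.5 * race, 1.0, n)     # race-correlated proxy (e.g. zip code)
y = rng.binomial(1, 1 / (1 + np.exp(-(zip_risk + 0.5 * race - 0.5))))
X = np.column_stack([zip_risk, race])

# Stage 1: fit on the complete data set, sensitive variable included.
model = LogisticRegression().fit(X, y)
coef, intercept = model.coef_.ravel(), model.intercept_[0]

# Stage 2: rebuild each score with the sensitive component swapped for a
# constant (here its average contribution), then apply the logistic link.
score = X[:, 0] * coef[0] + race.mean() * coef[1] + intercept
sanitized_risk = 1 / (1 + np.exp(-score))

print("mean sanitized risk:", sanitized_risk.mean())
```

Because the constant is identical for every defendant, two individuals who differ only on the sensitive attribute receive the same sanitized score.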
Several jurisdictions have already passed legislation mandating the use of these tools, while others are starting to explore the changes that risk assessment tools can bring about (a goal proposed in the First Step Act signed under former President Trump). Legal organizations such as the American Bar Association have also begun to encourage the adoption of risk assessment tools in the courtroom. As their use becomes more widespread, it should be noted that the use of big data in the judicial process brings with it the danger of reinforcing existing societal bias and removing accountability for flawed judgments. Models built on data collected from centuries of discriminatory decisions are just as fallible as the humans building them. Algorithms, however, offer the possibility of improved transparency, and for now, that might be more feasible than what we hope to achieve with humans.