Extra Credit: Do AI-powered lending algorithms silently discriminate? This initiative aims to find out
Hello and welcome back to MarketWatch’s Extra Credit column, a weekly look at the news through the lens of debt.
Our system of providing credit has a well-documented history of discrimination that in many cases has made financing more expensive, predatory or nonexistent for non-white consumers.
For the past several years, financial technology, or fintech, companies have been touting the potential of artificial intelligence and machine learning to help combat this problem. That promise rests on two main ideas. The first is that leaving a lending decision to an algorithm mitigates the bias that can come with human judgment. The second is that these algorithms can better spot good credit risks because they can take in and process far more data about an applicant than traditional formulas, which have discriminatory data baked into their design.
Some legal experts and computer scientists have been more wary. Just because something is a machine doesn’t mean it’s free of human biases, they say, as the use of machine learning and artificial intelligence in other areas illustrates. In the criminal justice sphere, for example, this type of technology was once seen as a way to reduce bias in sentencing, but evidence now indicates that the data it pulls in reproduces existing inequality.
One of the takeaways from that example, said David Rubenstein, a professor at Washburn University School of Law, is that “the use of AI systems won’t necessarily solve the problem and in fact can make it worse.”
“You launder biases from the past into the future, under the auspices of a neutral computer system and then you do it at scale because you can do so many more of these computations,” said Rubenstein, who studies AI regulation.
This week, we’re digging into the findings of a report that’s being used to work through these questions in the consumer lending context. Though companies and regulators evaluate lending algorithms to test whether they’re discriminating, the methods they use are rarely public.
What makes the report released last week different is that it works through these thorny issues in documents that everyone can see. It’s the result of an agreement between Upstart, a consumer lending company; the Student Borrower Protection Center, a student loan borrower advocacy group; and the NAACP Legal Defense and Educational Fund.
Relman Colfax was chosen as an independent monitor for the project, but before we get to what the civil rights firm found, a little background about how we got here.
Concerns about educational redlining
Last year, the Student Borrower Protection Center published the results of a secret-shopper exercise designed to gauge the impact of Upstart’s use of certain educational data in its lending decisions. For the past few years, the organization has been concerned about the implications of using factors like where someone went to school, their standardized test scores and their college major when pricing a loan.
That’s because these attributes are often correlated with race and gender. Inequities in the K-12 school system and stratification in higher education mean that non-white and low-income students are more likely to end up at colleges with fewer resources to get them to and through school and into decent-paying jobs.
Those outcomes, combined with discrimination in the labor market, increase the possibility that applicants who attended a historically Black college or university, or a minority-serving institution, could look like a bigger credit risk in models that use this kind of educational data. Students who attend these schools are more likely to be non-white, and those who major in a lower-paying field like education are more likely to be women. The law prohibits financial institutions from discriminating against either group in lending decisions.
To test how these factors played out in Upstart’s model, the Student Borrower Protection Center created hypothetical applicants with the same characteristics, except where they went to school. Each of these applicants applied for a $30,000 student loan refinancing product through Upstart’s platform. The organization found that an applicant from Howard University, an HBCU, and an applicant from New Mexico State University, a Hispanic-serving institution, would pay a higher interest rate than an applicant who attended New York University.
At the time, Upstart officials took issue with the report’s methodology, describing it as “inaccurate and misleading.” They noted that the rate quotes were based on submitting the same individual’s credit report over a two-and-a-half month period, during which time their credit score changed. About half of the differences in the quotes could be explained by these changes, they said.
The Student Borrower Protection Center countered that the changes in the applicant’s credit score didn’t take place during the period covered by the report and didn’t alter the nature of its findings. (This back-and-forth between the two organizations is detailed in Relman Colfax’s first report on the monitoring agreement, published in April.)
The findings caught the attention of the Senate Committee on Banking, Housing and Urban Affairs, which asked Upstart to explain how it used educational data to make credit decisions. In its response letter, Upstart officials said factors like an applicant’s most recent school attended, highest degree and area of study were among the more than 1,500 variables the company’s model considers. Upstart then sorted schools into groups based on certain data, including average incoming standardized test scores, and fed those groupings into the model.
That approach anonymized the schools, but it also sparked concern from some senators, because non-white students are overrepresented at schools with lower average standardized test scores, in part because of the correlation between standardized test scores, income and race. The concerned lawmakers wrote to the Consumer Financial Protection Bureau asking it to look into whether these practices, and similar practices by other lenders, violated the Equal Credit Opportunity Act.
Ultimately, Upstart stopped using average incoming standardized test scores to group schools. A few months later, the company, the Student Borrower Protection Center and the Legal Defense Fund agreed to have a third-party monitor test Upstart’s model for fair lending concerns.
The monitor’s first detailed findings
That testing is ongoing, but last week, the monitor released its first detailed report on its findings so far.
Although some APR disparities existed, the monitor didn’t find practically significant differences in pricing between Black, white and Hispanic applicants or between men and women. With regard to pricing, “the monitor confirms or found that whatever issues there may have been in the past, those issues don’t seem to exist,” said Matthew Bruckner, an associate professor at Howard University School of Law. “That’s really big.”
The report did find a difference in approval rates between Black and white applicants, which Bruckner called “less of a win for Upstart.” These disparities were measured without controlling for legitimate creditworthiness criteria, and on its own the difference doesn’t constitute a fair lending violation, according to the monitor’s report. But the monitor found the disparities were both statistically and practically significant. That means they were unlikely to be explained by chance, which is what statistical significance tests for in many contexts, and were also large enough to be meaningful.
For example, it’s easy to imagine that a court or a regulator may not consider a 1% difference in approval rates between two groups meaningful enough to indicate that a model is having a disparate impact on one of the groups. But as that difference widens, it has more practical impact. Relman Colfax has established its own cutoff, based on case law, for determining when differences become practically significant.
The difference in approval rates between white and Black applicants was large enough to meet that threshold and to “trigger an obligation to investigate” whether there are less discriminatory alternatives to the model Upstart is currently using, the authors wrote in the monitor’s report.
Separately, the monitor also looked at whether variables in Upstart’s model act as proxies for certain protected groups. Put another way, it checked whether a variable’s predictive power comes solely or largely from a correlation with race or national origin.
The monitor found that none of Upstart’s variables, individually or taken together, has a high likelihood of functioning as a proxy for race or national origin. What’s less clear is whether the variables interact with each other in Upstart’s model in ways that let them function as proxies for protected groups. “We cannot eliminate the possibility that proxies exist,” the authors of the report wrote.
This finding led Relman Colfax to suggest that Upstart weigh the feasibility of using a model that’s easier to understand against the benefits of its current model, which could include the model’s performance and the flexibility its structure offers for improving on certain fairness metrics.
Balancing a model’s accuracy and interpretability is a key issue that companies, regulators and other stakeholders are still sorting through. A more accurate model can mean higher profits for a lender and could, in theory, do a better job of identifying creditworthy consumers.
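To make the distinction between statistical and practical significance concrete, here is a minimal Python sketch using invented numbers. It pairs a standard two-proportion z-test (is the gap likely due to chance?) with an adverse impact ratio, the protected group’s approval rate divided by the comparison group’s (is the gap big enough to matter?). The 0.8 cutoff is the “four-fifths” rule of thumb from federal employment-selection guidelines, used here only as an assumed stand-in: it is not the case-law-based threshold Relman Colfax established, which this column doesn’t specify. The function names and applicant counts are hypothetical.

```python
import math

# Assumed practical-significance cutoff: the "four-fifths" rule of thumb.
# This is NOT the monitor's own case-law-based threshold.
AIR_CUTOFF = 0.8


def two_proportion_z_test(approved_a, total_a, approved_b, total_b):
    """Return (z, two-sided p-value) for the gap in approval rates of groups A and B."""
    rate_a = approved_a / total_a
    rate_b = approved_b / total_b
    pooled = (approved_a + approved_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def adverse_impact_ratio(approved_a, total_a, approved_b, total_b):
    """Approval rate of group A (protected) divided by that of group B (comparison)."""
    return (approved_a / total_a) / (approved_b / total_b)


def assess(approved_a, total_a, approved_b, total_b, alpha=0.05):
    """Report whether a gap is statistically and/or practically significant."""
    _, p_value = two_proportion_z_test(approved_a, total_a, approved_b, total_b)
    air = adverse_impact_ratio(approved_a, total_a, approved_b, total_b)
    return {
        "p_value": p_value,
        "air": air,
        "statistically_significant": p_value < alpha,
        "practically_significant": air < AIR_CUTOFF,
    }


# Hypothetical scenario 1: a 1-point gap (49.5% vs. 50.5%), 100,000 applicants per group.
small_gap = assess(49_500, 100_000, 50_500, 100_000)
# Hypothetical scenario 2: a 20-point gap (30% vs. 50%), 100 applicants per group.
large_gap = assess(30, 100, 50, 100)

print(small_gap)
print(large_gap)
```

With a large enough sample, even the 1-point gap in the first scenario is statistically significant, yet its ratio of about 0.98 is nowhere near the assumed cutoff. The 20-point gap in the second scenario clears both bars, which is the kind of result that would prompt a closer look.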
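One simple way to screen a single variable for proxy risk, sketched below in Python with invented data, is to ask how well that variable alone sorts applicants by protected-group membership. The sketch computes the area under the ROC curve (AUC) by brute-force pairwise comparison: a value near 0.5 means the variable carries little information about group membership, while a value near 0 or 1 means it could stand in for it. This is an assumed illustration rather than the monitor’s method, and the function names, data and the 0.7 flag threshold are hypothetical. Because it looks at one variable at a time, it also cannot detect proxies that arise only from interactions between variables, the possibility the monitor said it could not rule out.

```python
def auc(member_values, nonmember_values):
    """Probability that a random group member has a higher value than a random non-member.

    Ties count as half a win. Brute force O(n * m), fine for small illustrative data.
    """
    wins = 0.0
    for m in member_values:
        for n in nonmember_values:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_values) * len(nonmember_values))


def proxy_strength(member_values, nonmember_values):
    """Fold AUC so 0.5 means 'no signal' and 1.0 means perfect separation in either direction."""
    a = auc(member_values, nonmember_values)
    return max(a, 1.0 - a)


# Hypothetical threshold above which a variable would be flagged for closer review.
FLAG_THRESHOLD = 0.7

# Hypothetical values of two made-up variables, split by protected-group membership:
# (values for group members, values for non-members)
uninformative = ([1, 2, 3, 4], [1, 2, 3, 4])
separating = ([5, 6, 7, 8], [1, 2, 3, 4])

for name, (members, nonmembers) in {"uninformative": uninformative, "separating": separating}.items():
    strength = proxy_strength(members, nonmembers)
    print(name, strength, "flag" if strength > FLAG_THRESHOLD else "ok")
```

A real review would go further, for instance by asking whether a flagged variable still predicts repayment once its correlation with group membership is accounted for.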
For these models, “we know what the inputs and the outputs are,” said Rubenstein. “The problem is that the inner logic of the model that turns inputs into outputs can be a black box because of their sheer complexity.”
In the fair lending context, lenders have to give the reasons a particular loan was denied, but it’s not entirely clear how these complex models will meet that reason-giving requirement, Rubenstein said.
“That’s very much an open and important question that I think the law will have to resolve at some point,” he said. “These types of studies undertaken by the monitor might be on a path at beginning to answer those questions.”
Agreement provides potential for industry-wide insight
Indeed, stakeholders see the Upstart agreement and its public-facing reports as an approach that could shed light on these questions and potentially be used industry-wide.
“The progress made under this agreement shows that all lenders should be transparent and rigorous about testing their models with independent third parties,” Mike Pierce, the executive director of the Student Borrower Protection Center, said in a statement accompanying the report. “The process we have chosen to work on with Upstart could help guide the lending industry to set high standards when using new technology and data sources.”
Nat Hoopes, vice president and head of public policy and regulatory affairs at Upstart, said in a statement that he hopes the reports serve as “a guide that can help all lenders better understand the obligation to test transparently and to improve on the status quo by relentlessly optimizing models for fairness and inclusion, as well as accuracy.”
Gerron Levi, senior vice president and head of government affairs at the American Fintech Council, an industry advocacy group, said the reports and efforts like them could provide the public and regulators with more confidence in lenders’ use of this technology.
“They have groundbreaking models,” Levi said of fintech companies using this new technology to make credit decisions. “But it’s also important that through third-party reviews, through the regulatory framework, that the public have confidence that they are producing fair outcomes.”
So far, there is some data indicating that Upstart’s model is doing better at providing financing to creditworthy, but often invisible, borrowers than traditional underwriting criteria. For example, an October analysis of data provided by Upstart found that borrowers with credit scores below 640 who had their loans approved by Upstart had a 60% probability of being rejected by traditional lenders.
(One of the authors listed on the paper is an Upstart employee. He set up the data environment for the research and didn’t participate in the analysis, according to Marco Di Maggio, Ogunlesi Family Professor of Finance at Harvard Business School and another of the study’s authors. Di Maggio added that the company had no say in the outcome of the research and that he and the third co-author have no financial ties to the company.)
Upstart’s model was more likely to spot creditworthy borrowers whom the more traditional formula had missed, even if they had little credit history, in part thanks to data on their jobs (salaried applicants benefited more than hourly workers) and their educational attainment, Di Maggio said.
It’s “terrific” that Upstart’s model is performing better than more traditional underwriting criteria that have been notorious for discriminating against certain protected groups both in credit availability and credit pricing, Bruckner said. “I’m super excited that that’s the case,” he said.
Still, questions remain. It’s unclear from the report, for instance, how Upstart’s model affects people at the intersection of protected groups, such as women of color, Rubenstein said. Previous research on artificial intelligence and machine learning in facial recognition has found that those algorithms perform worse on people of color, especially women of color.
“If you only tested Black versus white and men versus women you wouldn’t have known,” Rubenstein said.
Opening that “Pandora’s box” of intersectionality does create challenges in deciding which categories are relevant to test and which might be legally relevant, Rubenstein said. Still, that doesn’t mean these questions shouldn’t be investigated, he said.
“It’s fair to say that the promise of using artificial intelligence and machine learning systems is to improve equity in lending,” he said. “It should also be the case that they do have the ability to test for these cross sections.”
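A short Python sketch with invented counts shows how Rubenstein’s point can play out: approval rates can look identical when broken out by race alone or by gender alone, yet differ sharply within the intersections. The counts and the helper name `rates_by` are hypothetical and are not drawn from the monitor’s report.

```python
from collections import defaultdict

# Hypothetical counts: (race, gender) -> (approved, applicants)
cells = {
    ("Black", "woman"): (50, 100),
    ("Black", "man"): (70, 100),
    ("white", "woman"): (70, 100),
    ("white", "man"): (50, 100),
}


def rates_by(cells, key):
    """Aggregate approvals and applicants by key(group), then return approval rates."""
    totals = defaultdict(lambda: [0, 0])
    for group, (approved, applicants) in cells.items():
        k = key(group)
        totals[k][0] += approved
        totals[k][1] += applicants
    return {k: approved / applicants for k, (approved, applicants) in totals.items()}


by_race = rates_by(cells, key=lambda g: g[0])    # race only
by_gender = rates_by(cells, key=lambda g: g[1])  # gender only
by_both = rates_by(cells, key=lambda g: g)       # race x gender

print("By race:  ", by_race)
print("By gender:", by_gender)
print("By both:  ", by_both)
```

Here every single-attribute comparison shows a 60% approval rate, but Black women are approved at 50% against 70% for white women, a gap that shows up only in the intersectional breakdown.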
In addition, just because one company’s model is performing well now doesn’t mean it will perform well in the future, Bruckner said.
“The big issue that I worry about is that models degrade,” he said. That’s particularly concerning, Bruckner said, because in 2017, the Consumer Financial Protection Bureau granted Upstart a No-Action Letter, essentially a document indicating the agency has no present intention to bring an enforcement action against a company over a particular product or service. The agency provided Upstart with a No-Action Letter again in 2020. As part of the No-Action Letter program, Upstart agreed, among other things, to test its model for adverse impacts by group and provide the agency with the results.
“Will the model continue to perform well in the future? Will other companies’ models continue to perform as the Upstart model is performing today?” Bruckner said. “Why is a private nonprofit consumer watchdog the ones who are doing this? We have a federal consumer protection agency whose job it is to do this and they said we have no present intention to bring an enforcement action.”
It appears the Biden-era Consumer Financial Protection Bureau will be looking at this issue closely. A CFPB spokesperson wrote in an email that artificial intelligence and machine learning models that use non-traditional data in underwriting “are accountable for discriminatory lending outcomes.” The spokesperson added that the agency will “use all its tools” to prevent these models “from entrenching biases in underwriting systems.”
During a Congressional hearing last month, Rohit Chopra, the recently confirmed director of the CFPB, emphasized the need to look carefully at the way lenders’ models use alternative data to make credit decisions. “There has been a myth that algorithms can be completely neutral,” Chopra said. “In reality, many of those algorithms reinforce the biases that already exist.”