ChatGPT is biased against resumes with credentials that imply a disability or autism

Artificial intelligence
Artificial intelligence

Last year, while looking for research internships, University of Washington graduate student Kate Glazko noticed that recruiters were using OpenAI’s ChatGPT and other AI tools to summarize resumes and evaluate candidates. As a doctoral student in the UW’s Paul G. Allen School of Computer Science & Engineering, she researches how generative AI can replicate and amplify biases, including those against disabled individuals. This led her to wonder how such a system would assess resumes that hinted at a candidate having a disability.

In a recent study, researchers at the University of Washington found that ChatGPT consistently rated resumes with disability-related awards and credentials, such as the “Tom Wilson Disability Leadership Award,” lower than identical resumes without those honours. When asked to justify the ratings, the system produced biased views of people with disabilities. For example, it asserted that a resume with an autism leadership award placed “less emphasis on leadership roles,” thus perpetuating the stereotype that individuals with autism are not effective leaders.

Following specific written instructions to avoid bias, researchers found that the tool significantly reduced bias for all but one of the disabilities tested. The tool showed improvement in handling five out of six implied disabilities, such as deafness, blindness, cerebral palsy, autism, and the general term “disability.” However, only three ranked higher than resumes that didn’t mention disability. of these ranked higher than resumes that didn’t mention disability.

“Ranking resumes with AI is starting to proliferate, yet there’s not much research behind whether it’s safe and effective,” said Ms Glazko, the study’s lead author. “For a disabled job seeker, there’s always this question when you submit a resume of whether you should include disability credentials. I think disabled people consider that even when humans are the reviewers.”

The researchers used the publicly available curriculum vitae (CV) of one of the study’s authors, which was around 10 pages long. Then, they created six modified CVs, each suggesting a different disability by adding four disability-related credentials: a scholarship, an award, a seat on a diversity, equity, and inclusion (DEI) panel, and membership in a student organization.

The researchers utilized ChatGPT’s GPT-4 model to compare the improved resumes with the original versions for a genuine “student researcher” job posting at a major software company in the United States. They conducted 10 comparisons for each, resulting in 60 total trials. Surprisingly, the system ranked the enhanced resumes, which differed only in implied disability, as the top choice in only 1 out of every 4 trials.

“In a fair world, the enhanced resume should always be ranked first,” said senior author Jennifer Mankoff, a UW professor in the Allen School. “I can’t think of a job where someone recognized for their leadership skills, for example, shouldn’t be ranked ahead of someone with the same background who hasn’t.”

Researchers found that when GPT-4 was asked to explain the rankings, its responses showed signs of explicit and implicit ableism. For example, it mentioned that a candidate with depression had “additional focus on DEI and personal challenges,” which “detract from the core technical and research-oriented aspects of the role.”

“According to Ms Glazko, some of GPT’s descriptions unfairly associated a person’s entire resume with their disability. It claimed that involvement in diversity, equity, and inclusion (DEI) or disability could detract from other parts of the resume. For example, it created the notion of ‘challenges’ in the context of comparing resumes of individuals with and without depression, even though ‘challenges’ weren’t mentioned at all. This led to the emergence of certain stereotypes.”

Researchers were curious about whether the system could be trained to be less biased. They used the GPT-4 Editor tool to customize the chatbot with written instructions (no coding required). The instructions were to ensure that the chatbot does not exhibit any ableist biases and instead operates according to disability justice and DEI (Diversity, Equity, and Inclusion) principles.

The experiment was repeated using the newly trained chatbot. In this trial, the system preferred the enhanced CVs over the control CV 37 times out of 60. However, for certain disabilities, the improvements were minimal or absent. For example, the autism CV ranked first only three out of 10 times, and the depression CV only twice, which was the same as the original GPT-4 results.

“People must understand the system’s biases when utilizing AI for real-world tasks,” Glazko mentioned. “Otherwise, a recruiter using ChatGPT may be unable to make these corrections or be aware that, even with instructions, bias can persist.”

Researchers note that some organizations, such as ourability.com and inclusively.com, are working to improve outcomes for disabled job seekers who face biases about whether or not AI is used for hiring. They also emphasize that more research is needed to document and remedy AI biases.

“It is so important that we study and document these biases,” Mankoff said. “We’ve learned a lot from and will hopefully contribute back to a larger conversation — not only regarding disability, but also other minoritized identities — around making sure technology is implemented and deployed in ways that are equitable and fair.”a disability