An independent study of AI hiring tools has delivered a wake-up call: 45% of scoring differences were deemed discriminatory across 25,500 resume evaluations.
Researcher Bogdan Szabo tested his own resume against 17 job descriptions using 10 different large language models, systematically modifying demographic details to measure bias. The results, published by re:cinq, echo troubling patterns that human hiring managers have exhibited for decades—except now they're encoded in algorithms deployed at scale.
The most powerful bias trigger? Names. Changing a candidate's name to reflect different ethnic backgrounds produced an average score shift of 0.272 points, larger than any other variable tested. The pattern isn't new. A famous 2004 field experiment by Marianne Bertrand and Sendhil Mullainathan found that resumes with white-sounding names received 50% more callbacks than identical resumes with Black-sounding names. AI models, trained on data that reflects those historical patterns, appear to have learned to replicate human prejudice.
But the study uncovered other concerning patterns. Career gaps were heavily penalized (a 0.251-point drop), even when the resume explained them. The geographic location of prior employers affected scores. Even anonymizing a candidate's name caused score shifts, suggesting models rely on proxy signals when direct demographic information is removed.
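To make the per-variable figures above concrete, here is a minimal sketch of how an average score shift could be computed from paired evaluations. It assumes each record pairs a baseline score with the score for one modified resume, and that the shift is measured as an absolute difference. The study's exact formula is not given here, so the function name, the record layout, and the sample scores are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_shift_by_variable(records):
    """Average absolute score shift per modified variable.

    records: iterable of (variable, baseline_score, variant_score) tuples.
    The absolute difference is an assumption, not the study's stated method.
    """
    shifts = defaultdict(list)
    for variable, baseline, variant in records:
        shifts[variable].append(abs(variant - baseline))
    return {variable: mean(values) for variable, values in shifts.items()}


# Made-up illustrative scores, not data from the study.
records = [
    ("name", 7.0, 6.5),
    ("name", 8.0, 8.5),
    ("career_gap", 7.0, 6.75),
    ("career_gap", 6.0, 5.75),
]
result = mean_shift_by_variable(records)
print(result)
```

Using the absolute difference keeps upward and downward shifts from cancelling out, which matters when a name change helps one candidate profile and hurts another.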
What's particularly insidious is what Szabo calls "silent bias"—cases where the AI's written evaluation remains unchanged but the numerical score shifts based on demographic details. This bias is invisible to human reviewers, who see identical justifications for different candidates but never realize the AI scored them differently based on protected characteristics.
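One way to surface this kind of silent bias in an audit is to compare each variant's written justification with the baseline's and flag cases where the text matches but the score does not. The sketch below assumes "unchanged" means identical after normalizing case and whitespace. The `Evaluation` type, the `find_silent_bias` helper, and the sample data are hypothetical and do not come from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    variant: str     # which version of the resume was scored
    score: float     # numerical score returned by the model
    rationale: str   # written justification returned by the model


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def find_silent_bias(baseline, variants, tolerance=0.0):
    """Return variants whose rationale matches the baseline but whose score differs."""
    base_text = normalize(baseline.rationale)
    return [
        v for v in variants
        if normalize(v.rationale) == base_text
        and abs(v.score - baseline.score) > tolerance
    ]


# Made-up illustrative evaluations, not data from the study.
baseline = Evaluation("original", 7.5, "Strong backend experience.")
variants = [
    Evaluation("name_changed", 7.0, "Strong  backend experience."),
    Evaluation("location_changed", 7.5, "Strong backend experience."),
    Evaluation("gap_added", 6.0, "Solid skills, but an unexplained gap."),
]
flagged = find_silent_bias(baseline, variants)
print([v.variant for v in flagged])
```

Only the first variant is flagged: its justification reads the same as the baseline's while its score is lower. A real audit would probably need a fuzzier text comparison, since models rarely repeat a justification word for word.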
Here's the problem: companies are adopting these tools at scale, believing they'll make hiring more objective. But if the AI is trained on biased historical data, all it does is automate discrimination while providing a veneer of algorithmic neutrality.
The technology is impressive. The question is whether anyone should trust it to make hiring decisions.
