An independent study analyzed 25,500 LLM resume screenings and found a 45% bias rate, driven by "silent bias": after a demographic variable is changed, models invent professional-sounding excuses to penalize candidates whose identical work history they praised on the baseline resume. Companies using these tools for hiring are building up massive legal liability.
The research, published by re:cinq, tested 30 resume variants against 17 job descriptions across ten major language models, with five repetitions per test. An independent AI auditor flagged 45% of the resulting score differences as biased rather than justified by actual qualifications. That's not a rounding error; that's systematic discrimination at scale.
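The headline figure of 25,500 screenings is consistent with a full factorial design over those four dimensions: every resume variant scored against every job description by every model, five times over. Assuming that reading (the article does not spell out the design), the count works out as:

```python
# Test-matrix dimensions as reported for the re:cinq study.
# Assumption: every combination was run, i.e. a full factorial design.
resume_variants = 30
job_descriptions = 17
models = 10
repetitions = 5

total_screenings = resume_variants * job_descriptions * models * repetitions
print(total_screenings)  # 25500
```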
Here's what "silent bias" means in practice: when a candidate's name was changed to reflect a different ethnic background, scores shifted by an average of 0.272 points, the largest effect of any variable tested. The same work experience that earned praise on one resume was suddenly described as inadequate once the name changed. The model never said "I'm downgrading this candidate because of their ethnicity." Instead, it generated neutral-sounding justifications that masked the underlying bias.
Career gaps, even when explained as caregiving, produced 0.251-point drops. Some models were six times more sensitive to demographic changes than others, but every model tested exhibited measurable bias. This isn't about one bad actor or one poorly trained system; it's an industry-wide problem.
The legal implications are enormous. The EU AI Act classifies AI systems used in recruitment as "high-risk," with substantial financial penalties for non-compliance. In the United States, a federal court in Mobley v. Workday granted collective-action status in May 2025 to applicants alleging algorithmic discrimination. Companies that thought they were automating bias away are instead automating it into every hiring decision.
What makes this particularly insidious is that the bias is rationalized in ways that sound legitimate. A human reviewer reading the AI's justification wouldn't necessarily spot the problem, especially without access to the counterfactual in which the same work history earned a higher score under a different name.
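The counterfactual a single reviewer lacks is straightforward to generate automatically. The sketch below is a minimal illustration under stated assumptions, not the study's actual method: it assumes a hypothetical scoring function that takes resume text and a job description and returns a number, and it measures how far the average score moves when only the candidate's name changes. The `toy_scorer` is a deliberately biased stand-in, not a real model, and the names and job titles are placeholders.

```python
from statistics import mean
from typing import Callable, List

# Hypothetical scorer signature: (resume_text, job_description) -> score.
# In a real audit this would wrap a call to the screening model.
Scorer = Callable[[str, str], float]


def counterfactual_shift(
    score: Scorer,
    baseline: str,
    variant: str,
    jobs: List[str],
    repetitions: int = 5,
) -> float:
    """Mean score difference (variant minus baseline) across jobs.

    Each resume is scored `repetitions` times per job and averaged, which
    smooths out run-to-run noise in non-deterministic models. The two resumes
    should differ in exactly one attribute (here, the name), so a nonzero
    shift is attributable to that change alone.
    """
    diffs = []
    for job in jobs:
        base = mean(score(baseline, job) for _ in range(repetitions))
        var = mean(score(variant, job) for _ in range(repetitions))
        diffs.append(var - base)
    return mean(diffs)


def toy_scorer(resume: str, job: str) -> float:
    # Deliberately biased stand-in: penalizes one name regardless of content.
    return 4.0 - (0.3 if "Candidate B" in resume else 0.0)


# Identical work history; only the name differs (hypothetical placeholders).
TEMPLATE = "Name: {name}\nExperience: 6 years backend engineering."
baseline = TEMPLATE.format(name="Candidate A")
variant = TEMPLATE.format(name="Candidate B")
jobs = ["Backend engineer", "Platform engineer"]

shift = counterfactual_shift(toy_scorer, baseline, variant, jobs)
print(round(shift, 3))  # -0.3
```

Because the paired resumes differ only in the name, the negative shift points at the name itself, even if the model's written justification never mentions it. Deciding whether a given shift is justified by qualifications, which the study delegated to an AI auditor, is a separate step this sketch does not attempt.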
The technology is impressive. But it's learned to discriminate in ways that are harder to detect than human bias, not easier.