Everyone in Silicon Valley is obsessed with AI replacing programmers. Nobody's talking about what happens when a quarter of the code it writes is wrong and you've already laid off the engineers who could catch it.
New research published in Transactions on Machine Learning Research shows top AI coding assistants produce incorrect or problematic code in roughly one out of four attempts. The study examined 11 large language models across 44 tasks and found even the most advanced systems achieved only 75% accuracy. Open-source models performed closer to 65%.
This isn't theoretical. This is measurable, reproducible, published research from the University of Waterloo and 17 international collaborators. The findings were presented at ICLR 2026, a top-tier AI conference. This is the kind of evidence that should make companies rethink AI-first development strategies.
Instead, layoffs and AI adoption are accelerating simultaneously. The logic seems to be: AI can write code fast, so we need fewer engineers. The flaw in that logic is that speed without accuracy creates technical debt that costs more to fix later than it would have cost to build correctly in the first place.
The study found AI coding tools "really struggle on tasks involving image, video, or website generation," according to co-author Dongfu Jiang. They perform better with text-related tasks but still make substantial errors even in areas where they're supposedly strongest.
Here's what a 25% error rate means in practice: for every four functions an AI generates, one contains a bug, a logic error, a security vulnerability, or an incorrect implementation. If you're using AI to accelerate development, you need experienced engineers reviewing everything it produces. That's not automation; that's generating more work for your remaining engineers.
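The compounding effect is worth spelling out. A back-of-the-envelope sketch (not from the study, and assuming each function's errors are independent, which the researchers don't claim) shows how quickly a 25% per-function error rate makes a fully clean batch of code unlikely:

```python
# Back-of-the-envelope sketch: if each AI-generated function independently
# has a 25% chance of being wrong, the odds of an entirely clean batch
# shrink fast as the batch grows.
ERROR_RATE = 0.25  # per-function error rate reported in the study

def p_at_least_one_error(n: int, p: float = ERROR_RATE) -> float:
    """Probability that at least one of n generated functions is faulty,
    assuming independent errors (a simplifying assumption)."""
    return 1 - (1 - p) ** n

for n in (1, 4, 10, 20):
    print(f"{n:>2} functions -> {p_at_least_one_error(n):.1%} chance of at least one bug")
```

At four functions the chance of at least one defect is already about 68%, and at ten it passes 94%. The independence assumption is generous to the AI; correlated failure modes on unfamiliar task types would make the picture worse, not better.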
I built software before becoming a journalist. I know what code review looks like. Reviewing AI-generated code is harder than reviewing human-written code because the patterns are different, the errors are subtle, and you can't ask the AI why it made certain choices. You're reverse-engineering logic from output, which takes longer than just writing it yourself.
The researchers concluded that "developers might have these agents working for them, but they still need significant human supervision." That's academic language for: you still need the engineers you just laid off.

