Healthcare AI companies love to talk about efficiency gains. Here's what happens when you deploy an AI that can't tell the difference between transcribing what was said and inventing plausible-sounding medical information.
Ontario's auditor general has revealed that AI transcription systems deployed to help doctors document patient encounters generated fabricated medical information - classic AI "hallucinations" - including therapy referrals and blood test orders that were never actually made.
According to Shelley Spence, Ontario's auditor general, evaluators tested 20 different AI transcription programs intended for use by doctors. The results should alarm anyone who thinks AI is ready for high-stakes medical applications.
Nine systems exhibited hallucinations, making up information that was never discussed during the patient encounter. Twelve captured incorrect medications - not transcription errors, but fundamentally wrong drugs. Seventeen systems missed critical details about patients' mental health conditions. And four systems were approved for use despite vendors failing to submit basic security assessments.
Let that last part sink in. Systems that demonstrably fabricate medical information and miss critical patient details were approved for deployment in a healthcare system serving millions of people.
The AI transcription pitch is compelling. Doctors spend huge amounts of time on documentation. AI scribes can listen to patient encounters and generate notes automatically, freeing up physicians to focus on care rather than paperwork. In theory, everyone wins.
In practice, you get systems that hallucinate blood tests and miss mental health red flags.
The fundamental problem is that current AI systems don't actually understand what they're processing. They're predicting plausible text based on patterns in training data. Most of the time, that produces reasonable output. But "most of the time" isn't good enough in healthcare, where an incorrect medication or missed symptom can kill someone.
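To make that failure mode concrete, here's a toy sketch - nothing like a production scribe, just a bigram model built on a few invented clinical phrases. Every name and phrase in it is made up for illustration. The point is that generation answers "what plausibly comes next?" and never "did this actually happen?":

```python
import random

# Toy illustration: a bigram model "learns" which word follows which
# from a tiny invented corpus, then generates text by sampling whatever
# plausibly comes next. Nothing here checks whether the output is true,
# only whether it is statistically plausible.
corpus = (
    "patient reports headache . ordered blood test . "
    "patient reports fatigue . ordered therapy referral . "
    "patient reports headache . prescribed ibuprofen ."
).split()

# Count which words follow which in the corpus.
following = {}
for prev, word in zip(corpus, corpus[1:]):
    following.setdefault(prev, []).append(word)

word, output = "patient", ["patient"]
for _ in range(12):
    word = random.choice(following.get(word, ["."]))
    output.append(word)

# Can produce fluent fragments like "patient reports fatigue . ordered
# blood test ." - a sentence that never appears in the source, stitched
# together purely because each word-pair is locally plausible.
print(" ".join(output))
```

Real scribe systems are vastly more sophisticated than this, but the failure is the same in kind: fluent recombination of plausible medical language, with no built-in mechanism for checking it against what was actually said.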
The auditor's report notes that "inaccuracies in medical notes generated by AI scribe systems could potentially result in inadequate or harmful treatment plans that may potentially impact patient health outcomes." That's bureaucratic language for: this could kill people.
Ontario has responded by issuing guidelines requiring physicians to manually review AI-generated notes for accuracy. Which, you know, defeats much of the efficiency argument. If doctors have to carefully review everything the AI writes anyway, you've just added an extra step rather than reducing documentation burden.
This is exactly the kind of AI deployment that gives the technology a bad name. Take a system that works reasonably well in low-stakes environments. Slap it into a high-stakes healthcare context without adequate validation. Discover only after implementation that it fabricates critical information. Issue guidelines saying humans should check everything. Declare victory on innovation.
The technology is impressive in the right context. Deploying it in healthcare without rigorous validation is negligent. And the fact that systems with known accuracy problems were approved for use suggests the procurement and oversight processes are fundamentally broken.
I want AI to improve healthcare. I genuinely do. But improvement requires systems that can be trusted to accurately capture medical information without hallucinating test orders or missing mental health crises. Until we have that, AI scribes are creating liability risks disguised as efficiency gains.
The capabilities are real. The readiness for medical deployment is questionable. And patients shouldn't be beta testers for AI systems that can't reliably distinguish between what actually happened and what sounds plausible.
