An AI transcription system used by Ontario doctors generated errors and hallucinations in patient records, according to a provincial auditor's report. The finding highlights serious risks when AI tools are deployed in healthcare without adequate validation.
Healthcare AI has a trust problem. Doctors are overworked and desperately need help with documentation. But when the AI invents medical details that were never discussed? That's not augmentation. That's liability. This is why moving fast and breaking things doesn't work in medicine.
The auditor's report revealed that the AI scribe system, designed to transcribe doctor-patient conversations and generate clinical notes, was making things up. Not just transcription errors or mishearing words, but generating content that had no basis in the actual conversation. Medical details, symptoms, diagnoses fabricated by a system that was supposed to be taking accurate notes.
For doctors already drowning in administrative work, AI transcription seemed like a miracle. Walk into an exam room, have a conversation with the patient, and walk out with complete clinical notes ready to go. No more typing during appointments, no more catching up on documentation at the end of the day. Just seamless, automatic record keeping.
Except the records weren't accurate. And in medicine, that's not a bug that you can patch later. That's a fundamental failure that undermines the entire purpose of the system.
Here's what makes this particularly dangerous: hallucinations in medical contexts aren't obviously wrong. When an AI chatbot invents a fact about historical events, it's embarrassing but mostly harmless. When an AI medical scribe invents a symptom or diagnosis, it can directly harm patient care. A doctor reviewing notes might not catch the fabrication, especially if it seems plausible. That invented detail then becomes part of the medical record, potentially affecting treatment decisions for years.
The Ontario system isn't unique. Medical AI transcription is a growing market, with multiple vendors offering similar products to clinics and hospitals desperate to reduce documentation burden. Many of these systems use the same underlying language models that are known to hallucinate in other contexts. The difference is that in healthcare, hallucinations can kill.
The fundamental problem is that current AI systems don't have a reliable way to distinguish between "I'm confident this is correct" and "I'm guessing." They generate plausible-sounding text whether or not they have a factual basis for it. In a transcription context, that means the AI will fill in gaps, smooth over unclear audio, and generate medically appropriate-sounding content even when it didn't actually hear those words.
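To make that gap concrete, here is a minimal Python sketch of one kind of check a reviewer could run: compare each sentence of a generated note against the raw transcript and flag sentences whose content words barely appear in what was actually said. Everything in it is an assumption for illustration, including the function names, the tiny stopword list, and the 0.5 threshold. It is not how the Ontario system or any vendor's product works, and simple word overlap will miss paraphrases and negations, so it could only ever supplement a clinician's review.

```python
import re

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "with",
             "has", "have", "had", "is", "was", "i", "some"}


def content_words(text):
    """Return the set of lowercase alphabetic words that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def flag_unsupported(note, transcript, min_support=0.5):
    """Return note sentences whose content words mostly do not occur in the transcript.

    min_support is an assumed threshold: the minimum fraction of a sentence's
    content words that must also appear somewhere in the transcript.
    """
    heard = content_words(transcript)
    flagged = []
    # Naive sentence split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", note.strip()):
        words = content_words(sentence)
        if not words:
            continue  # nothing to check in this sentence
        support = len(words & heard) / len(words)
        if support < min_support:
            flagged.append(sentence)
    return flagged


# Made-up example data, not taken from any real patient record.
transcript = "I have had a headache for three days and some nausea in the morning."
note = ("Patient reports headache for three days with morning nausea. "
        "Patient has a history of type 2 diabetes.")

for sentence in flag_unsupported(note, transcript):
    print("Needs review:", sentence)
```

In this made-up example the diabetes sentence is flagged because nothing in the transcript supports it, while the headache sentence passes. The point of the sketch is the design choice rather than the scoring method: the transcript, not the fluency of the note, is treated as the source of truth.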
