Security researchers just demonstrated a fundamental problem with AI assistants: they can't reliably distinguish between instructions from you and instructions hidden in the data they're processing.
The attack was elegantly simple. Researchers created a malicious Google Calendar invite with instructions embedded in the event description. When the victim asked Gemini about their schedule, the AI processed all calendar events - including the malicious one. The hidden instructions then executed, causing Gemini to summarize private meetings and leak that data by creating a new calendar event visible to the attacker.
This is called prompt injection, and it's fundamentally different from traditional security vulnerabilities. You can't just patch it.
With normal software exploits, there's usually a clear boundary between code and data. A buffer overflow happens when data is incorrectly treated as code. SQL injection happens when user input isn't properly escaped. These are bugs - mistakes in implementation that can be fixed.
Prompt injection exploits the core functionality of large language models: interpreting natural language instructions. The AI can't tell the difference between "the user wants to know about their calendar" and "this calendar event contains instructions the AI should follow." Both are just text. Both get processed the same way.
Simon Willison, a researcher who has extensively studied prompt injection, calls it "a new class of vulnerability that we don't know how to fix." Unlike SQL injection or XSS, which have well-understood mitigations, prompt injection attacks the fundamental architecture of how LLMs work.
Google has security measures. They scan for obviously malicious prompts. They try to separate system instructions from user data. But these are heuristics, not guarantees. The malicious calendar event in this case "appeared safe" - it passed Google's detection systems because it was designed to look like ordinary text.
This has massive implications for AI assistants that integrate with personal data. Gemini reading your emails. ChatGPT accessing your files. Copilot analyzing your documents. Any of these could theoretically be tricked into leaking data by malicious content hidden in the very data they're supposed to help you process.
The attack surface is enormous. Email attachments. Shared documents. Calendar invites. Web pages. Any data source the AI reads is a potential attack vector. And because the AI is designed to extract meaning and follow instructions from natural language, you can't just filter it out.
Some researchers think model architecture changes could help. Others believe we need entirely new approaches to AI security that treat every external data source as potentially hostile. Neither solution exists yet.
Meanwhile, AI assistants are being integrated deeper into our workflows, given access to more sensitive data, and trusted with increasingly important tasks. Google will patch this specific vulnerability. But the underlying problem - that AI can't reliably distinguish trusted instructions from malicious ones embedded in data - remains unsolved.
This is genuinely a hard problem. And unlike most security vulnerabilities, it's not just an implementation bug. It's a fundamental limitation of how these systems work.
