Anthropic exposed the complete source code of Claude Code—512,000 lines including internal prompts, unreleased features, and operational data—through an improperly packaged source map file. The company issued DMCA takedowns after the code spread to 8,100+ GitHub repositories.
The leak shows how a production AI coding tool actually works under the hood: the prompt engineering, the tool orchestration, the feature flags for unreleased capabilities. It's a rare window into what these companies are building and don't want competitors to see.
The technical failure is almost embarrassingly simple: a source map file was included in the npm package distribution. Source maps are meant for debugging—they contain the original source code to help developers trace minified production code back to readable source. But they're not supposed to ship to end users. That's literally what .npmignore is for.
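The fix is a one-line packaging rule. As a minimal sketch (the file path is illustrative), a `.npmignore` entry keeps debug artifacts out of the published tarball:

```
# .npmignore — exclude source maps from the published package
*.map
```

An allowlist via the `files` field in `package.json` is even safer, since anything not explicitly listed stays out by default. Either way, `npm pack --dry-run` prints the exact file list that would ship, which makes this class of leak cheap to catch in CI before publishing.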
What makes this particularly interesting is that it's the second time this has happened. A similar leak occurred in February 2025. Once is a mistake. Twice suggests a process problem. When you're shipping a tool that helps other people write code, having your own deployment pipeline leak your source code is... not a great look.
But the content of the leak is more interesting than the mechanism. The code reveals 44 unreleased feature flags, including KAIROS (an always-on background agent that monitors and acts proactively), ULTRAPLAN (extended 30-minute deep thinking sessions for complex planning), and BUDDY (a companion pet system that appears to be testing session persistence and personality modeling).
The competitive moat in AI coding tools isn't the model—it's the harness around the model. How do you orchestrate multiple tools? How do you manage permissions? How do you prevent runaway behavior? How do you maintain context across sessions? The Claude Code source code is essentially a reference implementation for all of these questions.
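The shape of such a harness can be sketched in a few lines. This is a hypothetical illustration, not Anthropic's actual code: every tool call passes through a permission gate, and a hard step budget bounds runaway behavior.

```typescript
// Hypothetical agent-harness skeleton (names are illustrative, not from the leak).
// Two of the harness's jobs in miniature: permission checks and a runaway cap.

type Tool = { name: string; run: (input: string) => string };

interface Harness {
  tools: Map<string, Tool>;
  allowed: Set<string>; // permission policy: which tools may run at all
  maxSteps: number;     // hard cap so a failing loop can't burn calls forever
}

function runStep(h: Harness, toolName: string, input: string, step: number): string {
  if (step >= h.maxSteps) throw new Error("step budget exhausted");
  if (!h.allowed.has(toolName)) throw new Error(`permission denied: ${toolName}`);
  const tool = h.tools.get(toolName);
  if (!tool) throw new Error(`unknown tool: ${toolName}`);
  return tool.run(input);
}

const harness: Harness = {
  tools: new Map([["echo", { name: "echo", run: (s) => s.toUpperCase() }]]),
  allowed: new Set(["echo"]),
  maxSteps: 10,
};

console.log(runStep(harness, "echo", "hello", 0)); // prints "HELLO"
```

The interesting design decisions live in exactly these seams: what goes in `allowed`, how `maxSteps` is tuned, and what state persists between steps.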
One developer reviewing the code noted that the frustration detection system uses a regex rather than an LLM inference call—likely for cost and latency reasons. Internal comments revealed that approximately 1,279 sessions per day experienced 50+ consecutive failures, burning roughly 250,000 wasted API calls daily. Building reliable agentic systems at scale is genuinely hard, and the numbers prove it.
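The cost argument is easy to see in a sketch. The pattern below is purely illustrative, not the leaked regex: a synchronous match costs microseconds, versus an inference call's latency and per-token price on every user message.

```typescript
// Illustrative frustration heuristic — NOT the pattern from the leaked code.
// A regex runs locally in microseconds; an LLM classifier would add an API
// round-trip (and its cost) to every message it screens.
const FRUSTRATION = /\b(ugh|wtf|still broken|doesn'?t work|why (won'?t|isn'?t))\b/i;

function looksFrustrated(message: string): boolean {
  return FRUSTRATION.test(message);
}

console.log(looksFrustrated("ugh, it's still broken")); // prints true
console.log(looksFrustrated("please refactor this"));   // prints false
```

The reported numbers also imply an average of roughly 250,000 / 1,279 ≈ 195 wasted calls per failing session, which explains why detection has to be cheap enough to run on every turn.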





