AMD's AI director just pulled the plug on Claude after analyzing 6,852 sessions and discovering that Anthropic silently degraded the model's reasoning capabilities. Her conclusion: "Claude cannot be trusted to perform complex engineering tasks."
The analysis is damning. Thinking depth dropped 67%. Code reads before edits fell from 6.6 to 2.0, meaning the model was increasingly editing files it had barely examined. Stop-hook violations - instances where the model ignored safety checks - went from zero to 10 per day.
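Metrics like these can be computed directly from agent session logs. A minimal sketch, assuming a hypothetical log format where each session is an ordered list of tool-call names (the real format depends on what your agent framework records):

```python
def reads_before_edits(tool_calls):
    """Average number of file reads preceding each edit in a session.

    `tool_calls` is a hypothetical ordered list of tool-call names
    (e.g. "read", "edit") -- adapt to whatever your framework logs.
    """
    reads_since_last_edit = 0
    reads_per_edit = []
    for call in tool_calls:
        if call == "read":
            reads_since_last_edit += 1
        elif call == "edit":
            reads_per_edit.append(reads_since_last_edit)
            reads_since_last_edit = 0
    # No edits in the session -> nothing to measure
    return sum(reads_per_edit) / len(reads_per_edit) if reads_per_edit else 0.0
```

A healthy session trends toward several reads per edit; a sudden drop in this ratio across many sessions is exactly the kind of signal AMD's team surfaced.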
Anthropic admitted they silently changed the default effort level from "high" to "medium" and introduced "adaptive thinking" that lets the model decide how much reasoning to allocate. No announcement. No warning to users. Just a quiet configuration change that broke production workflows.
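One defensive move for teams calling the model API directly is to set reasoning parameters explicitly rather than inheriting vendor defaults that can shift underneath you. A hedged sketch assuming the shape of the Anthropic Messages API's extended-thinking parameter (verify names and constraints against current docs before relying on this):

```python
def build_request(prompt, model, thinking_budget=8192):
    """Build Messages API kwargs with an explicit thinking budget
    instead of trusting the vendor's (mutable) default.

    Assumes the extended-thinking parameter shape
    `thinking={"type": "enabled", "budget_tokens": N}`; the model ID
    should be a pinned, dated identifier, not a floating alias.
    """
    return {
        "model": model,
        "max_tokens": 16000,  # must exceed the thinking budget
        "thinking": {"type": "enabled", "budget_tokens": thinking_budget},
        "messages": [{"role": "user", "content": prompt}],
    }
```

Explicit settings don't stop a vendor from changing server-side behavior, but they remove one class of silent default changes from the equation.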
When users shared transcripts showing quality degradation, Anthropic's own engineers confirmed the model was allocating zero thinking tokens on some turns. The same turns with zero reasoning? Those were the ones generating hallucinations.
This is a vendor trust violation, and it exposes something most people building on AI tools don't want to acknowledge: you don't control the infrastructure you're dependent on.
AMD had 50+ concurrent sessions running on Claude Code. Their entire AI compiler workflow was architected around the model's capabilities. One silent update from Anthropic broke everything. That's not a workflow. That's a dependency waiting to fail.
And this isn't unique to Anthropic. OpenAI has changed GPT-4 behavior multiple times without notification. Google has adjusted Gemini capabilities. Every AI company optimizes for their margins, not your production systems. Today's best model is tomorrow's degraded version.
The AMD director's response is instructive: they've already switched providers. Not because other models are inherently more trustworthy, but because vendor lock-in to a single AI provider is a single point of failure.
This is where the AI infrastructure market gets uncomfortable for vendors. They want customers to deeply integrate their models - build entire systems around GPT-4 or Claude Opus, train employees on specific interfaces, optimize prompts for particular models. Deep integration means stickiness, which means recurring revenue.
But deep integration also means fragility. When the vendor changes the underlying model - whether for cost savings, capability improvements, or safety updates - integrated systems break. The tighter the coupling, the worse the breakage.
Anthropic will argue they needed to adjust default settings for cost management or safety reasons. Maybe they found that maximum reasoning tokens were being wasted on simple queries, so adaptive thinking saves compute. Maybe the high effort default was causing latency issues. All plausible justifications.
But none of that excuses the silent change. If you're running a platform that customers are building production systems on, you announce breaking changes. You version your APIs. You give notice. You don't just silently degrade performance and hope nobody notices.
The fact that AMD's team had to run extensive analytics on 234,760 tool calls to discover the degradation is itself a problem. Users shouldn't need to instrument their entire workflow to detect when the AI vendor changed the product underneath them.
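That said, catching a large regression doesn't require 234,760 tool calls. A recorded baseline and a tolerance check will flag big moves cheaply. A sketch, with the metric names and the 25% threshold as illustrative assumptions:

```python
def detect_drift(baseline, current, tolerance=0.25):
    """Compare current workflow metrics against a recorded baseline.

    Returns the fractional change for every metric that moved more
    than `tolerance`. Metric names and threshold are illustrative.
    """
    alerts = {}
    for name, base in baseline.items():
        cur = current.get(name)
        if cur is None or base == 0:
            continue  # metric missing, or zero baseline needs an absolute check
        change = (cur - base) / base
        if abs(change) > tolerance:
            alerts[name] = round(change, 3)
    return alerts
```

Note the zero-baseline caveat: a metric like stop-hook violations going from 0 to 10 per day has no meaningful relative change and needs a separate absolute threshold.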
This is the maturity gap in AI infrastructure. In traditional cloud services, AWS doesn't secretly reduce EC2 instance performance to save money. Google Cloud doesn't silently downgrade database throughput. There are SLAs, performance guarantees, and advance notice of changes. AI model providers have none of that.
Users get "we're constantly improving our models" disclaimers and no concrete commitments about capability maintenance. Which means production systems built on AI are fundamentally unstable - the foundation can shift at any moment based on vendor priorities.
The fix is the one AMD's team arrived at: stay multi-model. Use tools that let you swap between Claude, GPT, and Gemini. Learn prompt engineering patterns that work across models. Test alternatives regularly, because rankings shift fast.
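In code, "stay multi-model" can start as a thin abstraction over provider SDKs with ordered fallback. A minimal sketch; the single prompt-in/string-out interface is a deliberate simplification of real SDKs, which differ in auth, parameters, and prompt conventions:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Provider:
    name: str
    complete: Callable[[str], str]  # wraps the vendor SDK call

def complete_with_fallback(providers: List[Provider], prompt: str) -> Tuple[str, str]:
    """Try providers in preference order; return (provider_name, output).

    Raises if every provider fails. A production version also needs
    per-provider prompt adaptation, timeouts, and cost accounting.
    """
    errors = []
    for p in providers:
        try:
            return p.name, p.complete(prompt)
        except Exception as exc:
            errors.append((p.name, str(exc)))
    raise RuntimeError(f"all providers failed: {errors}")
```

The point isn't the fallback logic itself; it's that once an abstraction like this exists, swapping or re-ranking providers is a config change instead of a rebuild.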
Six months ago, Claude Opus stood alone at the top of coding benchmarks. Today, GPT-4o, Gemini 2.0, and even some open source models are competitive. The moat isn't as wide as Anthropic thought, and users are figuring that out.
What this incident really exposes is the illusion of stability in AI tooling. Companies want to treat AI models like stable infrastructure - something you build on and trust to work consistently. But the vendors are still treating them like evolving products that can change without notice.
That fundamental mismatch is going to cause more AMD-style switches until either vendors commit to stability or users accept permanent instability and build accordingly.
The technology is impressive. Claude at high effort really could handle complex engineering tasks. But Anthropic decided cost optimization mattered more than user trust. They made that choice silently, and now they're losing enterprise customers who can't afford to rebuild workflows every time the model changes.
The question isn't whether AI models should evolve. The question is whether vendors should be allowed to change them silently while users' production systems break in real-time.