Amazon just learned a hard lesson about gamification: when you turn productivity metrics into a competition, people optimize for the scoreboard instead of the work.
The company quietly shut down its internal AI coding leaderboard after employees admitted to 404 Media that they'd been gaming the system to climb the rankings. The leaderboard was supposed to measure how effectively engineers were using AI coding assistants. Instead, it became a status symbol, and people gamed their way to the top of it.
Here's the thing about leaderboards: they work great for games where the score is the point. They work terribly for measuring actual productivity, where the point is to ship good software, not to rack up points.
Amazon isn't the first company to make this mistake, and it won't be the last. But the AI angle makes this case particularly instructive. These weren't engineers slacking off. They were engineers putting extra effort into manipulating metrics instead of doing their jobs well.
According to reports, the leaderboard tracked metrics like how many AI-suggested code completions engineers accepted, how much code they generated with AI assistance, and similar proxies for "AI adoption." The problem? None of those metrics actually measure whether the code is good, whether it solves real problems, or whether it makes the product better.
So engineers did what engineers do: they optimized for the metric. They accepted AI suggestions they would have edited or rejected. They generated code they didn't need. They found every possible way to game the system while technically following the rules.
This is Goodhart's Law in action: when a measure becomes a target, it ceases to be a good measure. The moment Amazon made "AI usage" a leaderboard metric, it stopped being a useful indicator of anything except who was best at gaming the leaderboard.
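To make the Goodhart's Law point concrete, here is a toy sketch in Python. Everything in it is hypothetical: the `Activity` fields, the two engineers, and the numbers are invented for illustration and are not Amazon's actual leaderboard or data. It ranks the same two people by a count of accepted suggestions (the kind of proxy described above) and by how many accepted lines survived review.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    name: str
    suggestions_accepted: int  # hypothetical: what the leaderboard counts
    accepted_lines_kept: int   # hypothetical: accepted lines that survived review


def rank_by(activities, key):
    """Return names sorted from highest to lowest on the given metric."""
    return [a.name for a in sorted(activities, key=key, reverse=True)]


# Made-up numbers: "gamer" accepts everything, "careful" accepts selectively.
team = [
    Activity("careful", suggestions_accepted=40, accepted_lines_kept=35),
    Activity("gamer", suggestions_accepted=400, accepted_lines_kept=20),
]

leaderboard = rank_by(team, key=lambda a: a.suggestions_accepted)
by_kept_code = rank_by(team, key=lambda a: a.accepted_lines_kept)

print("Leaderboard order:", leaderboard)
print("Order by code that survived review:", by_kept_code)
```

The two rankings disagree, which is the whole problem. And swapping in the second metric wouldn't fix it: put "lines kept" on a leaderboard and people would start optimizing for that instead.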
The broader issue here is that tech companies are desperately trying to measure AI productivity gains, and they're reaching for the wrong metrics. They want to prove that AI coding assistants are making developers more productive, so they measure things that are easy to quantify: lines of code, completions accepted, time saved on boilerplate.
But developer productivity isn't about code volume. It's about solving the right problems, writing maintainable code, and shipping features that users actually want. None of that shows up on a leaderboard.
