By several recent measurements, AI crawlers and bots now account for a major share of web traffic, and they're fundamentally altering the economics of the open web in ways nobody agreed to.
This isn't about search engine crawlers indexing content for users. Those bots have always existed, and there's a clear value exchange: publishers get traffic from search results, search engines get content to index. What's different now is training crawlers—bots that scrape content to build AI models worth billions of dollars, while offering publishers exactly nothing in return.
The scale is hard to ignore. According to recent analyses, AI bot traffic from companies like OpenAI, Anthropic, Google, and others now represents a substantial percentage of overall web requests. Small publishers report bandwidth costs spiking as these crawlers hammer their servers.
Every scrape costs real money. Bandwidth isn't free. Server compute isn't free. For small publishers and independent creators operating on thin margins, AI training traffic is a tax they never signed up for—subsidizing the development of models that may eventually compete with their content.
The power dynamics are stark. Large AI companies can afford to pay for computing infrastructure. Small publishers can't afford to pay for unlimited bandwidth to serve bots that provide no direct benefit. Some sites have started blocking known AI crawlers via robots.txt, but that's a game of whack-a-mole as new crawlers appear with different user agents.
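A robots.txt deny list typically looks like the sketch below. The user-agent tokens shown are ones the respective companies have documented at time of writing, but treat them as assumptions: new crawlers appear regularly, tokens change, and compliance with robots.txt is entirely voluntary.

```
# Sketch only: verify current token names against each vendor's docs.
User-agent: GPTBot            # OpenAI's training crawler
Disallow: /

User-agent: ClaudeBot         # Anthropic's crawler
Disallow: /

User-agent: CCBot             # Common Crawl, widely used in training corpora
Disallow: /
```

Each block tells the named crawler it may fetch nothing; well-behaved bots check this file before crawling, while bad actors simply ignore it.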
And here's the kicker: blocking AI crawlers might hurt your search rankings. Google, which runs both a search engine and AI training pipelines, uses separate crawlers for different purposes. Block the AI training bot but not the search bot? Good luck figuring out which is which, or whether they even respect the distinction.
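Because robots.txt is only advisory, some operators fall back to filtering at the server by User-Agent string. A minimal sketch, assuming a deny list of documented crawler tokens (the tokens here are illustrative and must be kept up to date by hand, which is exactly the whack-a-mole problem described above):

```python
# Hypothetical server-side filter: reject requests whose User-Agent contains
# a known AI-training-crawler token. Token list is an assumption; vendors
# rename and add crawlers, so this needs ongoing maintenance.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "CCBot")

def is_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent string matches a known AI-crawler token."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in AI_CRAWLER_TOKENS)

# A web framework's middleware would call this and return 403 on a match.
print(is_ai_crawler("Mozilla/5.0 (compatible; GPTBot/1.0)"))   # True
print(is_ai_crawler("Mozilla/5.0 (Windows NT 10.0) Firefox/119.0"))  # False
```

The obvious weakness, which cuts to the article's point: a crawler that spoofs a browser User-Agent sails straight through, so this raises the cost of scraping without eliminating it.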
Some publishers are taking a different approach, negotiating licensing deals with AI companies. The New York Times, The Associated Press, and others have struck agreements worth millions. But those deals are only available to publishers with enough leverage to demand them. Independent bloggers and small news sites don't have that option.
The situation creates perverse incentives. If your content is valuable enough to train billion-dollar models, but you're too small to negotiate compensation, you're just subsidizing your own potential obsolescence. AI companies build models on your content, then offer AI-generated summaries that keep users from visiting your site.
