There is a transition happening on the internet whose full economic consequences almost nobody is talking about: AI-generated bot traffic has overtaken human web traffic. The web was built on a model that assumed humans were on the other end — eyeballs reading ads that paid for the content. That model is now broken, and nobody has a workable replacement.
According to The Register, AI crawlers and bots are scraping content across the web at scales that are straining server infrastructure and fundamentally undermining the economics of online publishing. This is not a future risk. It is happening right now, and the companies building the crawlers have not developed a compensation model that works for publishers.
The mechanics are straightforward. Companies building AI systems — whether training new models or running retrieval-augmented generation for AI assistants — need access to current web content. They deploy crawlers that hit pages constantly. A single AI company's crawlers can generate traffic equivalent to thousands of human readers. Unlike human readers, they do not see ads. They do not subscribe. They do not generate any revenue for the sites they scrape.
For a small publisher running on tight margins, the situation is perverse: your most popular and frequently updated content gets hit hardest by crawlers, driving up your hosting and bandwidth costs while generating zero revenue. You are subsidizing the AI companies' training data pipeline and getting nothing in return.
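The asymmetry above is easy to make concrete with back-of-envelope arithmetic. Every number in this sketch is a hypothetical placeholder (page weight, traffic, egress pricing, ad RPM), not a measurement from any real publisher:

```python
# Back-of-envelope sketch of the cost/revenue asymmetry described above.
# All numbers are hypothetical assumptions chosen for illustration only.

page_size_mb = 2.0              # average page weight (assumed)
human_views_per_day = 10_000    # hypothetical small publisher
bot_multiplier = 3              # bots fetching 3x the human traffic (assumed)
egress_cost_per_gb = 0.09       # cloud egress rate in USD (assumed)
revenue_per_human_view = 0.004  # ad RPM of $4 per 1,000 views (assumed)

# Bandwidth the bots consume each month, and what it costs to serve them.
bot_gb_per_month = page_size_mb * human_views_per_day * bot_multiplier * 30 / 1024
bot_cost = bot_gb_per_month * egress_cost_per_gb

# Ad revenue comes only from the human share of traffic.
human_revenue = human_views_per_day * 30 * revenue_per_human_view

print(f"Monthly bot bandwidth cost:  ${bot_cost:.2f}")
print(f"Monthly ad revenue (humans): ${human_revenue:.2f}")
print("Bot views generate $0.00 in revenue.")
```

Even with these modest assumed figures, the bot traffic shows up purely as a cost line with no offsetting revenue, which is the publisher's complaint in a nutshell.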
robots.txt is clearly not solving this. The protocol that allows site owners to signal which pages they do not want crawled is voluntary, non-enforceable, and largely ignored by aggressive crawlers. OpenAI and Google have faced allegations of violating robots.txt restrictions. Smaller crawlers often have no policy at all.
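To see why the protocol is so weak, it helps to look at what an opt-out actually is: a few lines of plain text that a crawler is free to ignore. The sketch below uses Python's standard-library `urllib.robotparser` against a robots.txt that blocks the publicly documented AI-crawler user agents (OpenAI's GPTBot, Google's AI-training token Google-Extended, Common Crawl's CCBot) while leaving the site open to everything else. The site URL is a placeholder:

```python
from urllib import robotparser

# A robots.txt that opts out of known AI training crawlers while still
# allowing ordinary crawlers. The user-agent tokens are the publicly
# documented ones; the rest of the file is a minimal illustrative sketch.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The check only tells a *well-behaved* bot what the site owner asked for.
# Nothing enforces it; an aggressive crawler simply never runs this check.
print(parser.can_fetch("GPTBot", "https://example.com/article"))         # False
print(parser.can_fetch("SomeSearchBot", "https://example.com/article"))  # True
```

The entire mechanism is the crawler voluntarily asking `can_fetch` before requesting a page. That is why the protocol works only against operators who choose to honor it.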
Larger publishers have started pursuing licensing deals. The New York Times sued OpenAI. Reddit struck a deal with Google for training data access. The Associated Press signed agreements with multiple AI companies. These deals matter because they establish that web content has economic value to AI companies, and that some of that value should flow back to creators.
But the licensing model only works for publishers with enough scale and legal resources to negotiate. The long tail of the web — independent bloggers, niche publications, local news sites — has no leverage and no realistic path to compensation.
