Reddit CEO Steve Huffman had this to say in the company's Q2 earnings call: "One of the things that we've learned, particularly through the data licensing deals is... how essential Reddit is to AI or LLMs as we know them and the next generation of search."
He's right. Reddit is essential to AI. It's a massive repository of high-quality, first-party data where real humans answer real questions. For AI companies like OpenAI, Reddit is gold.
But here's the problem: Reddit is licensing that gold away, and in doing so, it might be destroying its own long-term value.
Why AI Companies Love Reddit
Large language models (LLMs) like the ones behind ChatGPT are trained on text from the open web. One common source is Common Crawl, a public repository of web data. The problem? Common Crawl is full of garbage: racist rants, conspiracy theories, SEO spam, and plainly incorrect information.
Reddit, by contrast, is curated by humans. When someone asks "Why is my Honda Civic making a grinding noise?" on r/MechanicAdvice, the top-voted response is usually from someone who actually knows what they're talking about. That's the kind of high-quality, contextual data that LLMs desperately need.
So OpenAI and other AI companies are licensing Reddit's data to train their models on "what good looks like." When you ask ChatGPT a question, it's effectively summarizing what Redditors would have told you anyway.
The Strategic Mistake
Here's where Reddit's strategy falls apart. Once OpenAI (and Google, and Anthropic, and whoever else is licensing this data) has ingested all of Reddit's content, why would anyone go to Reddit anymore?
Let's say your car is making a funny sound. In the past, you'd go to r/MechanicAdvice, post a question, and wait for responses. In the future, you'll just ask ChatGPT, which will pull from the same pool of Reddit knowledge, cross-reference it with other sources, and give you a precise answer in seconds.

