Publisher Penguin is taking legal action against OpenAI after ChatGPT reproduced substantial portions of a German children's book series. The case adds to the growing pile of copyright lawsuits facing AI companies—and it raises a fundamental question about whether AI training constitutes fair use when the output can replace the original.
The books in question are from the popular German series Der kleine Drache Kokosnuss (The Little Dragon Coconut). When prompted, ChatGPT generated significant passages from the books—not paraphrases or summaries, but verbatim or near-verbatim reproductions of copyrighted text.
If ChatGPT can regurgitate entire books, it's not "learning"—it's memorizing. That's the core of Penguin's legal argument. OpenAI has consistently claimed that training AI models on copyrighted material is transformative fair use, similar to how humans learn by reading. But humans don't typically reproduce copyrighted books word-for-word when asked.
Every major publisher is now watching this case. The New York Times, authors, and artists have all filed similar suits. But there's something particularly damning about children's books—they're relatively short, distinctive, and easy to verify when reproduced. If OpenAI can't demonstrate that its model is doing something transformative with this content, the fair use defense starts to fall apart.
The tech community's response has been predictably divided. AI proponents argue that all creative work builds on what came before, and preventing AI training on published works would stifle innovation. Critics counter that there's a difference between being inspired by prior work and copying it wholesale into a commercial product.
Having built a startup that dealt with user-generated content, I know firsthand how thorny copyright issues can get. But this case feels different. OpenAI isn't a platform hosting user uploads—it's a company that deliberately ingested copyrighted material, trained a model on it, and now sells access to a system that can reproduce that material on demand.
The financial stakes are enormous. If publishers succeed in these lawsuits, AI companies might be forced to pay licensing fees for training data—or worse, scrub their models and start over with only licensed or public domain content. That would fundamentally change the economics of large language models.