A researcher analyzed 7.3TB of GitHub data covering 66,000 projects and found something remarkable: large software projects follow the same growth patterns they did decades ago – despite LLMs, better tooling, and massively faster hardware. Lehman's Laws of Software Evolution, formulated in the 1970s, still hold. The fundamental physics of software development might be more resilient than anyone in Silicon Valley wants to admit.
The study, published in a peer-reviewed journal, tracked projects with more than 700 commits over multiple years. These mature systems – about 16% of the dataset – show strikingly stable growth trajectories. Tools change. Languages evolve. Hardware gets exponentially faster. Development methodologies shift. And yet the rate at which large codebases grow and change remains stubbornly consistent.
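One way to make "stable growth trajectory" concrete – a minimal sketch, not the paper's actual methodology – is to fit a straight line to a project's cumulative size over time and use the fit quality (R²) as a stability score. The `Commit` record and the weekly-LOC numbers below are hypothetical; the point is just that mature systems under Lehman's laws should score near 1.0.

```python
# Illustrative sketch (not the study's method): score how stable a
# project's growth trajectory is by fitting a least-squares line to
# cumulative size over time. R^2 near 1.0 means the steady, near-linear
# growth Lehman's laws predict for mature systems.
from dataclasses import dataclass

@dataclass
class Commit:
    week: int        # weeks since project start (hypothetical unit)
    loc_delta: int   # net lines of code added by this commit

def growth_stability(commits: list[Commit]) -> float:
    """R^2 of a least-squares line through the cumulative-size curve."""
    commits = sorted(commits, key=lambda c: c.week)
    xs, ys, total = [], [], 0
    for c in commits:
        total += c.loc_delta
        xs.append(c.week)
        ys.append(total)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# A project growing at a constant rate scores a perfect 1.0:
steady = [Commit(week=w, loc_delta=200) for w in range(1, 101)]
print(round(growth_stability(steady), 3))  # → 1.0
```

An erratic repository – bursts of commits followed by abandonment – would trace a cumulative curve that a straight line fits poorly, pulling the score well below 1.0, which is one plausible way to operationalize the mature/immature split the study reports.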
This is fascinating counter-evidence to the "AI will 10x developer productivity" narrative dominating tech discourse. If GPT-4 and GitHub Copilot were fundamentally changing how software gets built, we should see it in the growth curves of projects that use them. We don't. Large projects still grow at roughly the same rate, hit the same complexity walls, and require the same maintenance effort.
The researcher's Reddit post (41 upvotes, 5 comments) includes an important caveat: the data captures AI adoption through early 2025, so we're still in the initial wave. Fair point. But Lehman's Laws have held through decades of supposedly revolutionary changes – structured programming, object-oriented design, agile methodologies, cloud infrastructure. The fact that they're still holding through the LLM wave suggests these laws reflect something fundamental about complex systems, not just limitations of previous tools.
What's particularly interesting is the duality in the findings. Large, mature projects (16% of the dataset) follow extremely stable growth curves. The remaining 84% – smaller projects, prototypes, "homework" repositories – show much more erratic patterns and higher rates of abandonment. This numerical dominance of smaller, less mature projects creates a concerning feedback loop.
When you train AI models on GitHub data, you're overwhelmingly training on code from immature projects that never reached long-term sustainability. The patterns that emerge from that training – the code that AI tools suggest – may reflect popularity rather than quality. We might be teaching AI to write code the way beginners write code, simply because beginners' code vastly outnumbers expert code in the training data.
