Everyone thinks they're anonymous online. They're not. New research demonstrates that large language models can identify pseudonymous users across platforms by analyzing writing style patterns at scale - and the technique works even when users deliberately try to mask their identity.
Ars Technica reports on the study, which represents a fundamental threat to the concept of online anonymity. This isn't some theoretical academic exercise. It's scalable, automated, and works right now with existing models.
Stylometry - analyzing writing patterns to identify authors - has existed for decades. What's new is that LLMs have made it trivial. What previously required linguistic experts and careful manual analysis can now be done in milliseconds across millions of users. The models pick up on subtle patterns: word choice, sentence structure, punctuation habits, even the rhythm of your prose.
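To make the idea concrete, here's a minimal sketch of classical, pre-LLM stylometry in Python: fingerprint each text by function-word frequencies, punctuation habits, and sentence rhythm, then attribute an unknown post to the stylistically closest known author. This is a toy baseline, not the study's method - the feature lists, scaling, and helper names are invented for illustration, and real LLM-based attribution picks up far subtler signals than this.

```python
# Toy stylometric attribution (illustrative only, not the study's technique):
# fingerprint texts by function-word rates, punctuation habits, and average
# sentence length, then compare candidates by cosine similarity.
import math
import re
from collections import Counter

# Small, hand-picked feature sets -- invented for this example.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is",
                  "i", "it", "you", "but", "not", "so", "just"]
PUNCT = [",", ";", ":", "--", "!", "?", "..."]

def fingerprint(text: str) -> list[float]:
    """Build a style vector: word-choice rates, punctuation rates, rhythm."""
    words = re.findall(r"[a-z']+", text.lower())
    n = max(len(words), 1)
    counts = Counter(words)
    feats = [counts[w] / n for w in FUNCTION_WORDS]      # word choice
    feats += [text.count(p) / n for p in PUNCT]          # punctuation habits
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    feats.append(n / max(len(sentences), 1) / 50.0)      # sentence rhythm, scaled
    return feats

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def attribute(unknown: str, candidates: dict[str, str]) -> str:
    """Return the candidate author whose known writing is stylistically closest."""
    fp = fingerprint(unknown)
    return max(candidates, key=lambda name: cosine(fp, fingerprint(candidates[name])))
```

Function-word and punctuation rates are the classic choice of features because writers rarely monitor them consciously - which is also why deliberate disguise tends to fail.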
The researchers tested both commercial models like GPT-4 and open-source alternatives. Accuracy varied, but even the weakest performers matched users across platforms at better-than-chance rates. The best models achieved identification accuracy that should terrify anyone who's ever posted under a pseudonym.
Here's what makes this especially concerning: the countermeasures don't really work. Adopting different writing styles, deliberately varying your vocabulary, even running your text through paraphrasing tools - the models can often still identify you. Your writing style is like a fingerprint, and AI can read it.
Think about the implications. Whistleblowers who post anonymously about corporate wrongdoing. Political dissidents operating under pseudonyms. Journalists protecting sources. Anyone who thought a throwaway Reddit account or anonymous blog protected their identity needs to reconsider.
This technology doesn't even require access to the platforms themselves. Someone could scrape public posts, run them through an LLM, and build a database of de-anonymized users. Law enforcement will use it. Authoritarian governments will use it. Private investigators and stalkers will use it. The capability exists, and there's no putting the genie back in the bottle.
The researchers suggest some defenses: collaboratively written text is harder to attribute, and very short posts provide less signal. But those aren't practical solutions for most use cases. You can't write every anonymous post via committee.
