Large language models like ChatGPT don't just reflect the biases in their training data—they amplify them, systematically favoring wealthier Western regions while marginalizing perspectives from the Global South, according to new research from the Oxford Internet Institute and the University of Kentucky.
The study, which tested ChatGPT's responses across diverse questions about culture, geography, and knowledge, found that the AI consistently prioritized information and viewpoints from North America and Western Europe—even when the questions specifically concerned other regions. This isn't just about incomplete data. It's about structural bias baked into how these systems are designed and trained.
The researchers didn't just ask ChatGPT trivia questions. They probed how the system responds to questions about local customs, historical events, and cultural practices across different parts of the world. The pattern was clear: Western sources were treated as more authoritative, Western interpretations were presented as default, and non-Western knowledge was often absent or framed through a Western lens.
Here's a concrete example from the research: When asked about historical events in Africa or Asia, ChatGPT would frequently cite Western academic sources or news outlets rather than local historians or regional publications, even when those local sources existed and were readily available online. The model had learned to trust certain sources more than others, and those trusted sources were overwhelmingly Western.
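To give a flavor of what this kind of audit involves, here is a minimal sketch of how one might probe a model for regional source bias. It assumes the official OpenAI Python SDK and an API key; the prompt template, region list, model name, and crude keyword classifier are illustrative stand-ins, not the Oxford team's actual protocol.

```python
# Sketch of a regional source-bias audit. Everything here (prompt wording,
# region list, keyword classifier, model name) is illustrative, not the
# study's method.
from collections import Counter
from openai import OpenAI  # assumes the official OpenAI Python SDK, v1+

client = OpenAI()  # reads OPENAI_API_KEY from the environment

REGIONS = ["Nigeria", "India", "Brazil", "Indonesia", "Egypt"]
PROMPT = ("Describe a major 20th-century historical event in {region} "
          "and list the sources you would cite.")

# Crude proxy: count mentions of well-known Western publishers/institutions.
# A real audit would also need curated lists of local outlets per region
# and human verification of how sources are framed.
WESTERN_MARKERS = ["bbc", "new york times", "reuters", "oxford", "cambridge"]

def western_source_mentions(text: str) -> int:
    lowered = text.lower()
    return sum(lowered.count(marker) for marker in WESTERN_MARKERS)

results = Counter()
for region in REGIONS:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": PROMPT.format(region=region)}],
    )
    answer = response.choices[0].message.content
    results[region] = western_source_mentions(answer)

print(results)  # how often Western sources surface per regional question
```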
This mirrors long-standing inequalities in whose knowledge gets codified and preserved. For decades, Western institutions have dominated academic publishing, internet infrastructure, and digital archiving. If you're training an AI on "the internet," you're training it on a version of the internet that's disproportionately shaped by wealthy countries. English-language content vastly outweighs other languages. Servers and data centers are concentrated in North America and Europe. The entire information ecosystem has a geographic tilt.
What makes this particularly concerning is that ChatGPT and similar models are increasingly being used for education, research assistance, and information retrieval worldwide—including in the regions whose perspectives they're marginalizing. A student in Nigeria using ChatGPT to learn about Nigerian history might receive answers filtered through Western academic frameworks. A researcher in India asking about Indian cultural practices might get responses that cite British colonial sources as primary references.
The Oxford team also found that the bias extends to what counts as "important" or "notable." Western figures, events, and institutions are more likely to be mentioned, elaborated on, and treated as significant. Comparable figures or events from other regions are compressed into brief mentions or omitted entirely—not because they're less important, but because the training data reflects Western priorities about what matters.
Now, to be fair: this isn't unique to ChatGPT. The researchers note that similar biases appear across other large language models. This is a systemic problem in how AI systems learn from a fundamentally unequal information landscape. OpenAI didn't intentionally program ChatGPT to favor Western perspectives—but they built it on data that already does.
There's also a limitation of the study worth keeping in mind. The researchers tested specific question types and analyzed patterns in the responses, but large language models are probabilistic: they can give different answers to the same question on different occasions. The bias the study documents is statistical and systematic, not absolute in every single response.
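That caveat has a practical consequence for anyone trying to measure this kind of bias: a single response proves little either way, so the claim has to be framed as a rate over many samples. The sketch below uses a simulated query_model() helper and a placeholder classifier, both hypothetical, just to show the aggregation logic; in practice the helper would be a chat-API call sampled at nonzero temperature.

```python
# The bias claim is about rates, not any single answer: sample the same
# prompt repeatedly and report the fraction of responses showing the pattern.
# query_model() simulates stochastic output so the aggregation logic runs on
# its own; in practice it would be a chat-API call at temperature > 0.
import random

PROMPT = "Who are the most influential historians of West African history?"

_SIMULATED_ANSWERS = [  # hypothetical stand-ins, for demonstration only
    "...a list citing only British and American academics...",
    "...a list led by scholars at the University of Ibadan and Legon...",
    "...a mixed list, with Western sources presented first...",
]

def query_model(prompt: str) -> str:
    return random.choice(_SIMULATED_ANSWERS)

def omits_local_institutions(text: str) -> bool:
    # Placeholder heuristic standing in for a validated coding scheme.
    return not any(marker in text for marker in ["Ibadan", "Legon"])

def biased_response_rate(n_samples: int = 50) -> float:
    hits = sum(omits_local_institutions(query_model(PROMPT))
               for _ in range(n_samples))
    return hits / n_samples

print(f"Estimated rate over 50 samples: {biased_response_rate():.2f}")
```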
So what's the solution? The researchers argue it's not enough to simply add more non-Western data to training sets. The underlying algorithms need to be designed with equity as a core principle, not an afterthought. That means actively de-weighting over-represented sources, intentionally seeking out marginalized perspectives, and building systems that can recognize when they're operating outside their areas of reliable knowledge.
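For a sense of what "de-weighting over-represented sources" could mean in practice, here is one textbook rebalancing move, shown purely as an illustration and not as anything OpenAI or the researchers are known to use: sample training documents with weights inversely proportional to how common their source region is in the corpus.

```python
# Inverse-frequency sampling: regions that dominate the corpus contribute
# proportionally less to each training batch. Illustrative only; real
# pipelines would balance along many axes (language, domain, quality) at once.
import random
from collections import Counter

# Hypothetical corpus: each document tagged with its source region.
corpus = [
    {"text": "...", "region": "north_america"},
    {"text": "...", "region": "north_america"},
    {"text": "...", "region": "western_europe"},
    {"text": "...", "region": "western_europe"},
    {"text": "...", "region": "west_africa"},
    {"text": "...", "region": "south_asia"},
    # ...millions more in a real corpus, heavily skewed toward the first two
]

region_counts = Counter(doc["region"] for doc in corpus)

# Each document's weight is the inverse of its region's frequency, so the
# expected regional mix of a sampled batch moves toward uniform.
weights = [1.0 / region_counts[doc["region"]] for doc in corpus]

def sample_batch(k: int) -> list[dict]:
    return random.choices(corpus, weights=weights, k=k)

print(Counter(doc["region"] for doc in sample_batch(1000)))
```

Even so, reweighting only reshuffles what is already in the corpus, which is exactly why the researchers argue that equity has to be a design principle rather than a data-mixing afterthought.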
It also requires acknowledging that AI systems are not neutral. They're products of human choices—choices about what data to include, how to weight different sources, what outputs to optimize for. Those choices have consequences, and right now, those consequences are reinforcing existing global inequalities.
The universe doesn't care about our AI ethics principles. But if we're building systems that billions of people will rely on for information, we have a responsibility to understand—and address—the ways those systems might be amplifying injustice rather than knowledge.
