
WEIRD in, WEIRD out: How AI tools are making the world look weird

Ross Denton

Research Director

STRAT7 Jigsaw

The images arrive already complete; there is no communication between them and myself, no reciprocal exchange. As much as we like to say that the world is opening up to us, since we can see every part of it, we can also say that the world is closing itself off – in all its openness.

Which humans?

In academia and the media, AI is often described as mirroring human psychology: human-like reasoning, human-level performance, human-like communication. In these comparisons, “humans” are treated as the benchmark.

In a provocative 2023 paper, researchers at Harvard University asked – which humans?

The diversity of human psychologies has been a hot topic since 2010, when researchers found that many accepted psychological “truths” were in fact confined to so-called “WEIRD” people: Western, Educated, Industrialised, Rich, Democratic. What feel like universal beliefs for people like me, and no doubt many readers of this blog – e.g. that I am an autonomous individual – are instead only true for a thin slice of humanity.

So when we say AI tools are “human-like”, what we mean is that AI is WEIRD.

In fact, the paper found something more specific: ChatGPT doesn’t just think WEIRD, it thinks American. The greater the cultural distance between a country and the USA, the less accurate ChatGPT was at simulating people’s values. For countries like Libya and Pakistan, its results are little better than a coin toss.

[Chart: correlation between ChatGPT’s WVS responses and each country’s real responses, plotted against that country’s cultural distance from the USA]

The paper showed this through an ingenious method – administering the World Values Survey (WVS) to ChatGPT 1,000 times and then comparing its responses with real survey data from each country. The WVS measures everything from self-reported cultural values to moral principles, attitudes towards family, religion, poverty and so on.[1]
In this simple chart, they plotted two variables (a rough sketch of the comparison, in code, follows the list):

  1. How closely ChatGPT’s responses to the survey correlated with those of people in each country.
  2. The “cultural distance” between that country and the USA, sourced from Muthukrishna’s 2020 analysis, also based on the WVS.[2]
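To make the mechanics concrete, here is a minimal sketch in Python (pandas/SciPy) of that kind of comparison. It is not the paper’s code; the variable names and data shapes are illustrative assumptions.

```python
import pandas as pd
from scipy.stats import pearsonr

# Illustrative inputs (assumptions, not the paper's data):
#   gpt_means:     Series indexed by WVS question ID -> mean of ChatGPT's
#                  answers across ~1,000 simulated runs
#   country_means: DataFrame indexed by the same question IDs, one column per
#                  country, holding each country's real mean WVS answer
#   cultural_dist: Series indexed by country -> cultural distance from the USA

def accuracy_by_country(gpt_means: pd.Series,
                        country_means: pd.DataFrame) -> pd.Series:
    """Pearson correlation between ChatGPT's answers and each country's answers."""
    aligned = country_means.loc[gpt_means.index]  # keep questions in the same order
    return aligned.apply(lambda col: pearsonr(gpt_means, col)[0])

# The chart is then just accuracy vs. cultural distance:
# accuracy = accuracy_by_country(gpt_means, country_means)
# chart = pd.DataFrame({"distance_from_usa": cultural_dist, "accuracy": accuracy})
# chart.plot.scatter(x="distance_from_usa", y="accuracy")
```

Even at this level of simplification, the point stands: “accuracy” here is always accuracy relative to a particular population, not to “humans” in general.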

The eagle-eyed reader may note that ChatGPT’s responses correlate slightly more closely with smaller Western countries such as New Zealand than with the US. This likely reflects the USA’s greater cultural diversity, and the fact that ChatGPT was developed in California.

What this means for researchers

Marketers and researchers in non-WEIRD countries have often struggled for budget and bandwidth. AI tools’ poorer accuracy in their markets therefore introduces a double jeopardy: the non-WEIRD countries least likely to secure research budgets also get the worst accuracy from “off the shelf” AI tools.

These biases could show up throughout the research process, for example:

  • Project design: Methodologies or brainstorm cues that do not fit how local people think
  • Recruitment: Synthetic respondents that “think” totally differently to local people
  • Moderation: AI moderators that ask inappropriate questions, or do not adequately explore important contextual or social factors in people’s decision-making
  • Analysis: Context-rich interactions and holistic thinking being ignored or under-valued in AI-powered analysis

There is a real risk that increasing use of AI tools in international research will flatten out and devalue insights, as highly diverse peoples’ individual spoken and unspoken responses are fed into text-processing machines and emerge looking and sounding vaguely Californian. What looks like a living, breathing forest to us may end up being processed as just so much wood.

This doesn’t mean we should ignore AI tools when working cross-culturally. They’re simply too useful. Instead, I’d suggest we need to invest in the cultural fitness of our thinking and processes. Just as you don’t have to lose physical stamina when you start driving a car, your projects’ cultural meaning does not have to atrophy when you introduce elements of automation.

As researchers we can deepen the cultural layer in our international work by:

  • Continuing to use context-rich methodologies: We should keep advocating for in-person qualitative fieldwork wherever possible, so that moderators can understand respondents’ environment, relationships and body language. We should also ensure that quantitative survey questions capture social relationships and their role in someone’s behaviours.
  • Working more closely than ever with our local qual partners: If you are going to use AI more in analysis and are not doing the fieldwork yourself, protect against flat insights by involving your local partners more in study and question design, checking your starting hypotheses with them, and setting aside more time to discuss the findings.
  • Investing in staff training: AI tools work best when the user knows what they need to look for. Researchers need to feel empowered to design studies that account for local culture, and to be able to spot and reflect on cross-cultural differences during analysis.

And for AI product owners or users, it’s more a question of how we minimise the loss of cultural meaning. I have yet to be convinced that AI moderation will achieve that for as long as the “moderator” is powered by an American LLM with deep-coded cultural biases. When using AI for analysis or to design research, however, there is likely to be marginal value in:

  • Using context-first prompting: Before asking research-specific questions, consider prompting your tools to summarise the country, how it relates to the research topic, or its cultural values (e.g. Hofstede scores). Some LinkedIn commenters on my last post said that this improves LLM responses. Others suggested cultural role-playing (“imagine you are a Colombian researcher…”), but I’d be concerned that this could encourage LLMs to rely on shallower stereotypes.
  • Probing the psychology of our AI tools: Whether you primarily use a proprietary, open-source, or branded LLM, try out some of the questions from the WVS or from recent studies you’ve run to get a closer feel for your tool’s biases (a rough sketch of both of these ideas follows this list).
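As a concrete (and entirely hypothetical) example of both ideas, here is a minimal sketch using the OpenAI Python client. The model name, the paraphrased WVS-style item and the Colombia framing are illustrative assumptions; swap in whichever tool and market you actually work with.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A WVS-style item, paraphrased for illustration (not verbatim survey wording)
question = ("On a scale of 1-10, how justifiable is it to claim government "
            "benefits to which you are not entitled? Answer with a number only.")

# Context-first prompting: ground the model in the country before asking
context_prompt = ("Briefly summarise Colombia's cultural values relevant to "
                  "attitudes towards the state and social trust "
                  "(e.g. Hofstede-style dimensions).")

def ask(messages):
    # Illustrative call; adapt to whichever LLM or research tool you actually use
    response = client.chat.completions.create(model="gpt-4o", messages=messages)
    return response.choices[0].message.content

# 1) Probe the tool's "default" psychology with no context at all
baseline = ask([{"role": "user", "content": question}])

# 2) Context-first: get the country summary, then ask the same question
context = ask([{"role": "user", "content": context_prompt}])
contextual = ask([
    {"role": "user", "content": context_prompt},
    {"role": "assistant", "content": context},
    {"role": "user", "content": f"With that context in mind: {question}"},
])

print("No context:", baseline)
print("Context-first:", contextual)
```

Comparing the no-context answer, the context-first answer and (where you have it) real WVS data for that market gives a rough feel for how far the tool’s default “respondent” sits from the people you are actually researching.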

Final thoughts

LLMs process information in a WEIRD way, are psychologically WEIRD, and assume the average human is too. At the same time, the LLM space is getting ever more concentrated with US companies dominating. There is a real risk that as researchers we could get pulled into ways of working and thinking that make the world feel smaller, and much less wondrous. We must build our cultural fitness to ensure that we can introduce automation without losing sight of the many ways of being human.
Here at STRAT7 (and for me personally), we want to explore further:

  • Are LLMs getting more or less WEIRD?
  • Do non-American LLMs (e.g. DeepSeek, Mistral, Apertus) perform better or worse here? Do they have their own cultural biases in-built?
  • Does context-first prompting work for culture, and if so, what does best practice look like?
  • Overall, what does this mean for an insight professional under pressure to deliver results faster and cheaper?

We’re going to conduct a few experiments of our own with LLMs from different continents – comment below if you have any thoughts on what to look out for.

[1] Separately, I recommend checking out the WVS website – it is old school but has some great data and insights.

[2] The researchers released an interactive tool to compare cultural distance across a range of dimensions – well worth a play around.
