March 17, 2026

Less WEIRD than feared: What insight teams need to know about LLMs

Large Language Models (LLMs) are rapidly becoming embedded in the workflows of insight and research teams. From survey design to synthetic respondents and automated analysis, the promise is clear: automate the heavy lifting so researchers can focus on interpretation and strategic thinking.

But one question continues to surface: are LLMs still ‘WEIRD’?

What is ‘WEIRD’?

WEIRD is an acronym standing for Western, Educated, Industrialised, Rich and Democratic. Researchers use it to describe the narrow, unrepresentative populations that dominate behavioural science studies.

When findings from those populations are applied globally, they often fail to capture the diversity of human values and behaviours. 

The concern is that AI systems, trained primarily on Western internet data and aligned to Western cultural values during post-training, may inherit the same bias.

Our recent study explored this question by testing whether modern LLMs can simulate attitudes across different cultural contexts – and what the results mean for research teams using AI today. 

Our study & methodology: Are LLMs culturally biased?

The study compared responses from real survey participants with responses generated by three different LLMs. 

The methodology was deliberately designed to mirror real research practice: 

  • Humans and LLMs completed the same survey 
  • The survey measured values, attitudes and moral dilemmas 
  • Three markets were included: UK, France and Malaysia 
  • Three LLMs built in different regions were tested: the Anglophone Claude, the French Mistral and the Chinese Qwen 

The research also introduced safeguards to reduce known LLM biases in the synthetic responses (illustrated in the sketch after this list). For example: 

  • Response order was randomised to mitigate positional bias 
  • Text labels were used instead of numeric codes to avoid model shortcuts 
  • Sequential memory was introduced to simulate a consistent respondent profile 
  • Sophisticated prompt engineering was used to instruct the model to answer as a person from a specific country with specific demographic characteristics 
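
To make these safeguards concrete, here is a minimal, provider-agnostic Python sketch of a synthetic-respondent loop. It is an illustration, not the study’s actual code: `call_llm` is a hypothetical stand-in for whichever chat-completion client you use (Anthropic, Mistral, Qwen or otherwise), and the persona details and question are invented.

```python
import random

def call_llm(messages):
    """Hypothetical stand-in for your chat-completion API client."""
    raise NotImplementedError

def build_persona(country, age, gender):
    # Persona instruction: respond as a specific survey respondent.
    return (
        f"You are answering a survey as a {age}-year-old {gender} "
        f"living in {country}. Answer as that person would, replying "
        "with exactly one of the labelled options."
    )

def ask(question, options, persona, history):
    # Safeguard 1: randomise option order to mitigate positional bias.
    shuffled = random.sample(options, k=len(options))
    # Safeguard 2: text labels, not numeric codes, to avoid model shortcuts.
    labelled = "\n".join(f"- {opt}" for opt in shuffled)
    # Safeguard 3: sequential memory -- prior Q&A stays in context so the
    # model keeps a consistent respondent profile across questions.
    messages = (
        [{"role": "system", "content": persona}]
        + history
        + [{"role": "user", "content": f"{question}\nOptions:\n{labelled}"}]
    )
    answer = call_llm(messages)
    history.extend([
        {"role": "user", "content": question},
        {"role": "assistant", "content": answer},
    ])
    return answer

# Illustrative usage (persona fields are invented for the example):
persona = build_persona("Malaysia", 34, "woman")
history = []
answer = ask(
    "How acceptable do you find behaviour X?",
    ["Never acceptable", "Sometimes acceptable", "Always acceptable"],
    persona,
    history,
)
```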

The goal was simple: determine whether modern LLMs still default to a Western worldview. 

The good news: Cultural simulation has improved

One of the clearest findings is that LLMs are much less culturally ‘WEIRD’ than they were just a few years ago. 

When prompted to respond as a person from the UK or France, the models produced answers that closely matched real survey results. This suggests that, with appropriate prompting, LLMs can now approximate cultural values far better than earlier versions. 

Even more interestingly, when prompted to respond as Malaysian participants, the models did shift toward Malaysian attitudes rather than defaulting to Western norms. 

In other words, LLMs can now recognise and simulate cross-cultural differences more effectively than before. 

For insight teams experimenting with synthetic data or AI-assisted research, this represents real progress. 

But key biases remain

However, the results also highlight several important limitations. 

1. LLMs do not navigate ethical and moral questions well 

LLMs struggled when questions involved moral trade-offs or ethical nuance. 

For example, when asked about potentially taboo issues, such as prostitution, Malaysian respondents had a much wider range of opinions than the models predicted. The LLMs tended to flatten the distribution of answers, often defaulting to stricter or more absolutist positions. 

This means AI-generated respondents may underestimate disagreement or diversity of opinion within cultures. 

2. Brand and market knowledge can be wildly inaccurate

Another striking example involved brand awareness. 

A well-known American insurance company had around 5% real awareness in France, yet the model simulated 85% awareness. 

For researchers, this is a critical warning. LLMs may hallucinate familiarity with brands or markets, especially when they rely on general web knowledge rather than actual market penetration data. 

3. Model choice matters

Not all models performed equally. 

Interestingly, the strongest performer across markets was not Claude, but Qwen, an open-source model developed in Asia. It produced the closest match to real responses in all three countries tested. 

This highlights an important lesson: performance is not simply about geographic origin. Different models may perform better depending on the task, prompting and domain. 

Practical implications for insight teams

For research professionals integrating AI into their workflows, several practical guidelines emerge. 

Treat AI as a tool, not a respondent replacement

Synthetic respondents can support ideation and hypothesis generation, but they should not replace real fieldwork, especially for culturally nuanced topics. 

Combine model selection with prompt design

Model choice and prompting strategy work together. Teams should benchmark different models and prompts against real human data before relying on outputs – one lightweight way to do this is sketched below. 
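
As one illustrative approach (not drawn from the study itself), each model/prompt combination can be scored by how closely its synthetic answer distribution matches a small human benchmark. The data below are invented for the example.

```python
from collections import Counter

def distribution(answers):
    """Convert a list of categorical answers into proportions."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {opt: n / total for opt, n in counts.items()}

def total_variation(human, synthetic):
    """Total variation distance: 0 = identical distributions, 1 = disjoint."""
    p, q = distribution(human), distribution(synthetic)
    options = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0) - q.get(o, 0)) for o in options)

# Invented data: real human answers vs. two model/prompt candidates.
human = ["Agree"] * 40 + ["Neutral"] * 35 + ["Disagree"] * 25
candidates = {
    "model_a/prompt_1": ["Agree"] * 70 + ["Neutral"] * 20 + ["Disagree"] * 10,
    "model_b/prompt_2": ["Agree"] * 45 + ["Neutral"] * 30 + ["Disagree"] * 25,
}
for name, synthetic in candidates.items():
    print(name, round(total_variation(human, synthetic), 3))
# Prefer the combination with the lowest distance before scaling up.
```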

Be cautious with sensitive topics

AI moderators or digital twin approaches that rely heavily on foundation LLMs may struggle with subjects that involve ethical ambiguity, social norms or taboo issues. 

Keep humans in the loop

Local expertise remains essential. Cultural nuance, translation and survey framing all benefit from collaboration with researchers who understand the market. 

The bottom line

LLMs are improving quickly, and they are less culturally biased than they once were. With careful prompting and model selection, they can approximate cross-cultural attitudes surprisingly well. 

But they are far from perfect. 

For insight teams, the opportunity lies in using AI to accelerate research workflows – while recognising that human understanding of culture, context and nuance remains irreplaceable. 

We’re here to help you move forward with confidence. Get in touch to talk to us about STRAT7 Nucleus – our AI hub designed specifically for insight teams. 
