
July 1, 2025

STRAT7 study reveals limitations of synthetic data in real-world use

Synthetic data makes big promises. We put them to the test and compared the results to real human responses

STRAT7, the global strategic insight and customer analytics group, has today published a new whitepaper evaluating the performance and reliability of synthetic data in market research.

The report, ‘Putting Synthetic Data to the Test: A Real-World Evaluation,’ offers one of the industry’s first independent comparisons of synthetic data quality, assessing two leading providers against evaluation criteria drawn from typical client use cases.

The study was conducted in partnership with Dunnhumby, the global customer data science company, whose Shopper Thoughts community provided the respondent data used in the evaluation.

With rising interest in AI-driven research techniques, STRAT7 wanted to find out whether synthetic data actually delivers in practice – specifically, whether it can truly supplement or replace real survey responses, particularly among hard-to-reach demographics where traditional sampling can be slow, costly or insufficient.

“With this study, we wanted to know whether synthetic data delivers on its promises. Could it give researchers robust, cost-effective insights – or are we trading reliability for speed and scale?” said Hasdeep Sethi, Group AI Lead at STRAT7.

Creating synthetic responses – not synthetic respondents

The study found that while synthetic data broadly aligned with real data on basic metrics such as brand awareness and purchase frequency, it struggled when subjected to more complex analyses. Key issues included:

  • Lack of logical consistency across related questions, undermining the idea of coherent ‘respondent personalities’
  • A ‘bunching effect’, where fewer responses were found at the extremes of the scale than would be expected from real data
  • Divergent outcomes in Key Driver Analysis, which could lead to incorrect strategic decisions
  • Inconsistent segmentation, risking mischaracterisation of important consumer groups
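
As a rough illustration of the ‘bunching effect’ described above, the sketch below compares the share of answers that fall at either end of a rating scale in a real and a synthetic sample. The function name `extreme_share`, the 1–5 scale and the response values are all assumptions made up for this example; they are not figures or methods taken from the whitepaper.

```python
from collections import Counter


def extreme_share(responses, scale_min=1, scale_max=5):
    """Return the fraction of responses at either end of the scale."""
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(responses)
    return (counts[scale_min] + counts[scale_max]) / len(responses)


# Made-up illustrative answers on a 1-5 scale, NOT data from the study.
real = [1, 1, 2, 3, 3, 4, 5, 5, 5, 2]
synthetic = [2, 3, 3, 3, 4, 3, 2, 4, 3, 5]

real_share = extreme_share(real)
synthetic_share = extreme_share(synthetic)

# A positive gap means the synthetic sample has fewer answers at the extremes.
bunching_gap = real_share - synthetic_share
print(f"real: {real_share:.0%}, synthetic: {synthetic_share:.0%}, gap: {bunching_gap:.0%}")
```

A gap well above zero on questions where real respondents hold strong views would be one simple sign of bunching; a fuller check would compare the whole response distribution rather than just the end points.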

"Synthetic data is like having a room full of actors who can each deliver a line convincingly – but who struggle to behave like a consistent character throughout a play."

Limited applications – for now

Based on its findings, STRAT7 recommends a cautious approach to synthetic data:

  • Limit synthetic data to no more than 5% of the overall sample
  • Use it only to boost small, hard-to-reach demographic groups
  • Avoid it for analyses involving segmentation, key drivers or behavioural prediction
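
To make the 5% guideline concrete, the sketch below works out the largest synthetic boost that keeps synthetic responses at or below a given share of the combined sample. It assumes that ‘overall sample’ means real plus synthetic responses, which the release does not spell out; the function name `max_synthetic_boost` and the sample sizes are hypothetical.

```python
from fractions import Fraction
from math import floor


def max_synthetic_boost(real_n, cap=Fraction(5, 100)):
    """Largest number of synthetic responses s such that
    s / (real_n + s) <= cap, i.e. s <= real_n * cap / (1 - cap).

    Assumes the cap applies to the combined (real + synthetic) sample.
    """
    if real_n < 0:
        raise ValueError("real_n must be non-negative")
    if not 0 <= cap < 1:
        raise ValueError("cap must be in [0, 1)")
    # Exact rational arithmetic avoids floating-point rounding at the boundary.
    return floor(real_n * cap / (1 - cap))


# Hypothetical sample sizes, for illustration only.
for real_n in (950, 1000):
    s = max_synthetic_boost(real_n)
    print(f"{real_n} real responses -> at most {s} synthetic")
```

Under this reading, the cap works out to roughly one synthetic response for every 19 real ones.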

While both providers involved in the study have acknowledged the limitations and are working on improvements, the report makes clear that synthetic data is not yet ready to replace high-quality, real-world responses – particularly for strategic, business-impacting insights.

“There’s real promise in synthetic data, but as it stands today, it’s not ready to take centre stage. However, as we all know, the technology is evolving rapidly. This study simply represents a snapshot of its current capabilities – not its ultimate potential. To put it another way: the technology is the worst it’s ever going to be – it’s only going to improve. We’ll keep testing, and we’ll keep challenging it to improve,” said Peter Strachan, Insight Director at STRAT7.
