July 2, 2026 | 4 min read

Synthetic Respondents in Market Research: Uses and Limits

What synthetic respondents are, why survey research is turning to them, where they produce usable signal — and where they quietly mislead.

Nik Kowalsi

Synthetic users, Audience research

Synthetic respondents are LLM-simulated survey participants: generated people with defined demographics and context who answer questions in character, at scale. They exist because traditional survey research is in a supply crisis — single-digit response rates, contaminated panels, weeks of field time — and because grounded simulation has become good enough to produce directional signal. The craft is knowing which questions they can carry.

The problem they answer

Quantitative research quietly broke over the last decade. Response rates for online surveys routinely sit below 10%, which makes every "representative" sample a heroic weighting exercise. Panel quality has degraded — professional survey-takers, straight-liners, and increasingly bots — to the point where a meaningful share of "human" data isn't. Field time runs two to six weeks, which is longer than most product decisions can wait. And the segments you most need — non-English speakers, low-income households, night-shift workers — are exactly the ones panels can't reliably deliver.

Synthetic respondents attack the supply problem directly: the panel is generated, not recruited. Composition is whatever the underlying population data says it should be, answers arrive in minutes, and nobody is speeding through your survey for a $2 incentive.
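
As a rough illustration of "composition is whatever the population data says", here is a minimal sketch that turns segment counts into panel seats using largest-remainder rounding. The function name, the segment labels, and the counts are assumptions made up for this example; they are not real census figures or any product's API.

```python
from math import floor


def allocate_panel(counts: dict[str, float], panel_size: int) -> dict[str, int]:
    """Split panel_size seats across segments in proportion to their counts."""
    total = sum(counts.values())
    if total <= 0 or panel_size < 0:
        raise ValueError("counts must sum to a positive number")
    # Exact (fractional) seat entitlement for each segment.
    exact = {seg: panel_size * c / total for seg, c in counts.items()}
    seats = {seg: floor(x) for seg, x in exact.items()}
    leftover = panel_size - sum(seats.values())
    # Give the remaining seats to the segments with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda seg: exact[seg] - seats[seg], reverse=True)
    for seg in by_remainder[:leftover]:
        seats[seg] += 1
    return seats


# Hypothetical population counts per age band (illustrative only).
ILLUSTRATIVE_COUNTS = {"18-34": 30, "35-54": 33, "55+": 37}
panel = allocate_panel(ILLUSTRATIVE_COUNTS, 1000)
```

In practice the segments would be cross-tabulated cells (age by income by language, say) rather than a single dimension, but the allocation step looks the same.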

What the research actually says

The evidence base is young but no longer anecdotal. The most-cited data point is Stanford's generative-agent simulation of roughly 1,000 real people: agents grounded in rich individual-level interview data reproduced their human counterparts' General Social Survey answers 85% as accurately as those humans reproduced their own answers on a retest two weeks later. Notably, grounded agents also showed smaller accuracy gaps across demographic groups than stereotype-prompted ones. The broader "silicon sampling" literature is messier: ungrounded LLM respondents flatten variance, skew agreeable, and drift toward the model's training distribution rather than the target population's.
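
One way to read the 85% figure is as a ratio: how often the agent matches a person's answers, divided by how often that person matches their own answers on a retest. The sketch below shows that arithmetic on made-up answer sheets. The function names and data are hypothetical, and this is a simplified reading of the metric, not the study's actual code.

```python
def agreement(a: list[str], b: list[str]) -> float:
    """Share of items on which two answer sheets match."""
    if len(a) != len(b) or not a:
        raise ValueError("answer sheets must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def normalized_accuracy(agent: list[str], wave1: list[str], wave2: list[str]) -> float:
    """Agent-vs-human agreement divided by the human's own test-retest agreement."""
    retest = agreement(wave1, wave2)
    if retest == 0:
        raise ValueError("human retest agreement is zero; the ratio is undefined")
    return agreement(agent, wave1) / retest


# Made-up answers to ten survey items (illustrative only).
wave1 = ["A", "B", "C", "D", "E", "A", "B", "C", "D", "E"]
wave2 = ["B", "A", "C", "D", "E", "A", "B", "C", "D", "E"]  # human retest: 8 of 10 match
agent = ["A", "B", "D", "E", "A", "A", "B", "C", "D", "E"]  # agent: 7 of 10 match wave1

score = normalized_accuracy(agent, wave1, wave2)  # 0.7 / 0.8, about 0.875
```

The normalization matters because humans are not perfectly consistent with themselves, so raw agreement understates how close an agent gets to the realistic ceiling.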

The pattern across studies is consistent and useful: grounding is the variable that matters. Simulated respondents anchored in real per-person data or real population statistics behave meaningfully more like humans than personas invented in a prompt. That's the difference between a synthetic panel and a very elaborate way of asking one model the same question a thousand times.

Where synthetic respondents earn their keep

Instrument pretesting. Run the questionnaire past a synthetic panel before fielding it. Ambiguous wording, broken skip logic, primed orderings, and options nobody picks all surface without burning a single real respondent. This is the least controversial use — the object under test is your survey, not human truth.
Directional ranking. "Which of these five messages lands best, and with whom?" is a question synthetic panels handle well, because the output is an ordering compared within the same simulated population, not an absolute number.
Cohort exploration. Because a grounded synthetic population contains honest proportions of every segment, you can read results split by age, income, and language and find the cohort that hates variant B — then decide whether that cohort matters enough to verify with humans.
Longitudinal reads. Synthetic panels don't churn. Ask the same thousand respondents the same question after every messaging change and you get a consistent yardstick — something human trackers spend fortunes approximating.
Segments you can't field. A rehearsal of the hard-to-recruit segment beats ignoring it. Rehearsal, not oracle.

Where they mislead

Point estimates. "34% would pay $12/month" is not a number to put in a board deck. Synthetic willingness-to-pay compounds two unreliabilities — humans misreport intent, and models add agreeable optimism on top.
Variance and extremes. Synthetic distributions run flatter and more centrist than human ones. The furious one-star detractor and the evangelist are both under-represented, which matters if your business lives on the tails.
Novel categories. Where there's no human behavioral record to learn from, a synthetic answer is confident interpolation. The newer the concept, the weaker the signal.
Anything you'd bet the company on. A synthetic panel is directional signal, not a substitute for a properly powered human study — a caveat that belongs inside the report, not in the fine print.
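
If a small human pilot is available, the flattening described above can be checked directly by comparing how much of each distribution sits at the extremes. This is a minimal sketch with hypothetical function names and made-up ratings on a 1-5 scale.

```python
from statistics import pstdev


def tail_share(ratings: list[int], low: int = 1, high: int = 5) -> float:
    """Fraction of ratings at either end of the scale."""
    if not ratings:
        raise ValueError("ratings must be non-empty")
    return sum(r in (low, high) for r in ratings) / len(ratings)


def flattening_report(synthetic: list[int], human: list[int]) -> dict[str, float]:
    """Compare tail mass and spread; a stdev_ratio below 1 means the synthetic panel is flatter."""
    human_spread = pstdev(human)
    if human_spread == 0:
        raise ValueError("human ratings have no spread; the ratio is undefined")
    return {
        "tail_share_synthetic": tail_share(synthetic),
        "tail_share_human": tail_share(human),
        "stdev_ratio": pstdev(synthetic) / human_spread,
    }


# Made-up ratings (illustrative only).
synthetic_ratings = [3, 3, 4, 3, 2, 4, 3, 3, 4, 3]
human_ratings = [1, 5, 3, 5, 1, 4, 2, 5, 1, 3]

report = flattening_report(synthetic_ratings, human_ratings)
```

A large gap in tail share is a signal to distrust any synthetic read on detractors or evangelists for that question.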

The workflow that holds up

The teams using synthetic respondents defensibly all converge on the same shape: synthetic first, humans on the narrowed question. Pretest the instrument synthetically. Rank the options synthetically and kill the weak ones. Locate the cohorts where reactions split. Then spend your real fieldwork budget on the two variants and three segments that matter, with a sharper questionnaire than you started with. The synthetic pass doesn't replace the human study — it makes the human study smaller, faster, and harder to fool.

That's the model behind Synthetic Signals' interview mode: surveys and concept tests over a Census-grounded synthetic audience, cohort-split reports, and honest-limits caveats baked into the output — a research panel that answers in minutes and knows what it is.

FAQ

What is a synthetic respondent?

A synthetic respondent is an LLM-simulated survey participant — a generated person with defined demographics, context, and personality who answers research questions in character. Panels of them stand in for recruited respondents when speed, cost, or segment access make human fieldwork impractical.

Are synthetic respondents valid for market research?

For directional work, increasingly yes: pretesting instruments, ranking options, and exploring cohort differences. For point estimates — market size, willingness to pay, purchase intent percentages — no. Synthetic distributions are flatter and more agreeable than human ones, and should be validated before any bet.

Why are researchers considering synthetic respondents at all?

Survey research has a supply crisis: response rates below 10%, professional survey-takers and bots contaminating panels, and 2–6 week field times. Synthetic respondents trade some fidelity for instant, uncontaminated, infinitely patient answers from any segment you can define.

Can synthetic respondents replace my survey panel?

No. The defensible workflow is sequential: synthetic first to kill weak options, sharpen the instrument, and locate interesting cohorts — then a properly powered human study on the narrowed question. Synthetic panels compress the expensive part; they don't replace it.
