July 2, 2026 · 3 min read

Concept Testing with AI: Before You Build It, Ask the City

How AI concept testing works: put copy, pricing, or a feature idea in front of a synthetic audience and get cohort-split reactions before you commit a sprint.

Alex Gvozden

Synthetic users, Audience research

Concept testing with AI means putting an idea (pricing, copy, a feature, a name) in front of a synthetic audience and reading the reactions before you build anything. The classic version of this study takes three weeks and a research vendor; the synthetic version takes minutes. Its purpose is not to predict the future but to kill weak options early and find out which audience each variant wins.

The decision it actually supports

Most teams don't lack ideas; they lack a cheap way to rank them. The question a concept test answers is rarely "will this succeed?" — no method answers that — but "of these five framings, which two deserve real investment, and does anyone hate them?" Traditional concept testing prices that question at thousands of dollars and weeks of field time, which is why most concepts ship untested and the market runs the study instead, expensively.

A synthetic concept test reprices the question to minutes and pocket change, which changes behavior the way cheap iteration always does: you test every framing the day it's written, not the one survivor at the end of the quarter.

How it works

Define the stimulus. The concept as a person would actually meet it: the headline, the pricing table, the one-sentence feature description. Concrete beats abstract — "Pro plan, $12/month, cancel anytime" tests better than "a premium tier."
Pick the audience. Not "everyone" — the audience shaped to your market: the senior-heavy cross-section if you serve retirees, the bilingual city if you serve one. A concept test against the wrong population answers the wrong question. Grounding matters here for the same reason it matters everywhere in synthetic research: a Census-grounded audience contains honest proportions of the people you'd never think to include.
Show variants head-to-head. Split the audience and give each half a different framing — "cancel anytime" versus "no contract," $12 versus $9.99, feature-led versus outcome-led. Comparison within the same synthetic population is the strongest read these panels produce.
Collect structured reactions plus the why. A choice ("would you pick A or B?"), a comprehension check ("what do you think this costs per year?"), and an open reaction. The comprehension check is quietly the most valuable: copy that synthetic readers misread, real readers will misread too.
Read the cohort split, not the average. The headline number — "58% prefer A" — is the least trustworthy output. The distribution is the real product: reactions broken down by age, income, and language show that A wins young professionals while B wins the 65+ cohort two-to-one, which turns "which variant?" into the sharper question "which cohort do we optimize for?"
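
To make the last step concrete, here is a minimal Python sketch of reading a cohort split instead of an average. It is not Synthetic Signals' implementation; the function names and the toy reactions are invented for illustration, and the numbers are chosen only so that the headline figure and the split disagree.

```python
from collections import Counter, defaultdict

# Toy reactions invented for illustration only: (age_cohort, chosen_variant).
reactions = (
    [("25-34", "A")] * 40
    + [("25-34", "B")] * 10
    + [("65+", "A")] * 10
    + [("65+", "B")] * 20
)

def preference_share(choices):
    """Share of respondents choosing each variant."""
    counts = Counter(choices)
    total = sum(counts.values())
    return {variant: counts[variant] / total for variant in sorted(counts)}

def cohort_split(rows):
    """Preference share computed separately inside each cohort."""
    by_cohort = defaultdict(list)
    for cohort, choice in rows:
        by_cohort[cohort].append(choice)
    return {cohort: preference_share(picks) for cohort, picks in sorted(by_cohort.items())}

overall = preference_share(choice for _, choice in reactions)
split = cohort_split(reactions)

print("overall:", overall)
for cohort, shares in split.items():
    print(cohort, shares)
```

In this toy panel the overall share favors A, yet the 65+ cohort picks B two-to-one. The average alone would have hidden the question of which cohort to optimize for.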

What it catches well

Obvious losers. The framing that confuses everyone, the price point that reads as a typo, the name with an unfortunate second meaning — cheap to catch synthetically, embarrassing to catch in production.
Comprehension failures. Misread tiers, missed caveats, mistaken totals. Misreadings of this kind are well represented in training data, so synthetic readers tend to fail the way real readers do.
Cohort divergence. The concept that delights one segment and alienates another — invisible in an average, unmissable in a split.
The words people reach for. Open reactions surface the vocabulary your market uses for the problem, which is worth more than the verdict itself.
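
A comprehension check such as "what do you think this costs per year?" can be scored mechanically. The sketch below assumes the "$12/month" example from earlier, so the correct annual figure is 12 × 12 = 144. The answers, the 5% tolerance, and the function names are hypothetical choices for illustration, not part of any real tool.

```python
def annual_cost(monthly_price):
    """Correct answer to 'what does this cost per year?' for a monthly plan."""
    return monthly_price * 12

def comprehension_rate(answers, monthly_price, tolerance=0.05):
    """Fraction of stated annual costs within `tolerance` of the true figure."""
    if not answers:
        raise ValueError("no answers to score")
    expected = annual_cost(monthly_price)
    correct = sum(1 for a in answers if abs(a - expected) <= tolerance * expected)
    return correct / len(answers)

# Hypothetical stated annual costs from eight synthetic readers.
# The 12s are readers who took the monthly price for the yearly one.
answers = [144, 144, 12, 120, 144, 150, 144, 12]

rate = comprehension_rate(answers, monthly_price=12)
print(f"comprehension rate: {rate:.1%}")
```

A low rate on one variant and a high rate on another is a comparison within the same panel, which is the kind of read this method supports.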

What it cannot tell you

The limits are the standard ones for synthetic respondents, and they concentrate exactly where concept tests are most tempting to over-read. Absolute purchase intent is noise — humans misstate it and models agreeably inflate it. Willingness to pay is noisier. Reactions to genuinely novel categories are confident interpolation, weakest precisely where your concept is most original. And a synthetic panel feels no delight; it describes some.

So the workflow that holds up is sequential: synthetic test to kill the losers and locate the interesting cohorts, then real validation — a human panel, a smoke-test landing page, a real A/B — on the one or two survivors. The synthetic pass doesn't replace the market's verdict; it makes sure the market only votes on your strongest candidates.
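
One way to sketch the "kill the losers" step in Python: drop variants that readers misread, rank the rest by relative preference, and pass only the top one or two to real validation. The `VariantResult` type, the 0.7 comprehension cutoff, and the figures are all assumptions made for this example. The shares are used only to order variants against each other, never as forecasts.

```python
from dataclasses import dataclass

@dataclass
class VariantResult:
    name: str
    preference_share: float    # share of the synthetic panel choosing this variant
    comprehension_rate: float  # share of the panel passing the comprehension check

def shortlist(results, min_comprehension=0.7, keep=2):
    """Names of the variants that earn real validation, best first."""
    readable = [r for r in results if r.comprehension_rate >= min_comprehension]
    ranked = sorted(readable, key=lambda r: r.preference_share, reverse=True)
    return [r.name for r in ranked[:keep]]

# Hypothetical results for five framings, invented for illustration.
results = [
    VariantResult("A", 0.30, 0.90),
    VariantResult("B", 0.20, 0.85),
    VariantResult("C", 0.35, 0.50),  # most preferred, but half the panel misread it
    VariantResult("D", 0.10, 0.80),
    VariantResult("E", 0.05, 0.90),
]

survivors = shortlist(results)
print(survivors)
```

Variant C tops the preference ranking but fails the comprehension cutoff, so it is dropped before the human panel, smoke test, or A/B ever sees it.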

Ask the city

This is the study Synthetic Signals' interview mode packages: show a concept to a Census-grounded city of synthetic people, compare variants head-to-head, and get the cohort-split report in minutes — with the honest-limits caveat printed on it. Before you commit the sprint, ask the city.

FAQ

What is concept testing with AI?

Showing a concept — copy, pricing, a feature description, a positioning line — to a panel of LLM-simulated people and collecting their reactions before building anything. It compresses the classic concept test from weeks of fieldwork to minutes, trading human ground truth for speed and coverage.

Is AI concept testing reliable?

Reliable for comparison, unreliable for prediction. Synthetic panels rank variants and expose cohort differences usefully; their absolute numbers (purchase intent, willingness to pay) inherit both human stated-intent bias and model agreeableness. Use them to pick which concepts deserve real validation.

What can I concept-test with a synthetic audience?

Anything you can put in front of a person as a stimulus: landing-page copy, pricing and packaging, a feature described in a sentence, onboarding flows described step by step, names, taglines, and positioning. Text-representable concepts test best.

How is this different from A/B testing?

An A/B test needs the thing built and traffic to run against it — it's the most expensive possible way to learn a variant loses. AI concept testing happens before the build, on a synthetic audience, so weak variants die before they cost a sprint. The winners then earn a real A/B test.
