Blog — testing AI agents with synthetic users

July 2, 2026 · 7 min read

Synthetic Users: The Complete Guide (2026)

What synthetic users are, how they're generated, and how teams use them to test AI agents at population scale — beyond hand-written personas.

Synthetic users · Agent testing

July 2, 2026 · 4 min read

Synthetic Respondents in Market Research: Uses and Limits

What synthetic respondents are, why survey research is turning to them, where they produce usable signal — and where they quietly mislead.

Synthetic users · Audience research

July 2, 2026 · 4 min read

How to Interview a Synthetic User

A practical guide to interviewing synthetic users: grounding the person, asking questions that don't lead, probing in character, and reading the answers honestly.

Synthetic users · Audience research

July 2, 2026 · 3 min read

Concept Testing with AI: Before You Build It, Ask the City

How AI concept testing works: put copy, pricing, or a feature idea in front of a synthetic audience and get cohort-split reactions before you commit a sprint.

Synthetic users · Audience research

July 2, 2026 · 4 min read

AI Focus Groups: How Synthetic Panels Actually Work

What an AI focus group is, how a synthetic panel is built and moderated, what it's genuinely good for — and the limits that keep it honest.

Synthetic users · Audience research

June 29, 2026 · 7 min read

How to Test AI Agents Before Production

How to test AI agents before production: a 7-step method — define success, build a realistic user population, run multi-turn tests, score, and gate.

Agent testing · AI evals

June 27, 2026 · 6 min read

Stanford's Generative Agent Simulations of 1,000 People

What Stanford's generative agent simulations of 1,000 people actually showed — and what it means for testing AI agents against synthetic populations.

Synthetic users

June 25, 2026 · 6 min read

Why AI Agents Fail (and How to Catch It First)

Why do AI agents fail? A practical taxonomy of agent failure modes — capability, context, conversation, population — and how to catch each one first.

Reliability

June 22, 2026 · 8 min read

LLM-as-a-Judge: The Definitive Guide

How LLM-as-a-judge works: judge designs, writing rubric prompts, the known biases (position, verbosity, self-preference), and how to mitigate each.

AI evals

June 18, 2026 · 7 min read

AI Agent Evaluation: Metrics, Methods, and Framework

A practical guide to AI agent evaluation: outcome vs. trajectory metrics, four evaluation methods, and a step-by-step framework for running agent evals.

AI evals · Agent testing

June 15, 2026 · 6 min read

Synthetic Personas vs. Hand-Written Personas

Where hand-written personas break down, what synthetic personas and AI personas actually change, and an honest look at when each approach wins.

Synthetic users

June 11, 2026 · 7 min read

What Are AI Evals? A Plain-English Guide

AI evals explained: what an eval is (task + data + scoring), how LLM evals differ from agent evals, and how to write your first 20 eval cases.

AI evals

June 8, 2026 · 6 min read

Multi-Turn Evaluation: Testing the Whole Conversation

Why single-turn evals miss real failures, and how multi-turn evaluation works: scripted flows, simulated users, and conversation-level scoring.

AI evals · Agent testing

June 4, 2026 · 6 min read

AI Agent Benchmarks Explained: τ-bench, GAIA, SWE-bench

What AI agent benchmarks actually measure — τ-bench, GAIA, SWE-bench — what scores tell you about your own agent, and how to build an internal benchmark.

AI evals

June 1, 2026 · 6 min read

User Simulation for AI Agents: How It Works

A technical guide to user simulation for AI agents: simulator anatomy, the conversation simulation loop, what to log and score, and the classic pitfalls.

Synthetic users · Agent testing

May 28, 2026 · 6 min read

The AI Agent Reliability Gap: Demos Work, Launches Don't

AI agent reliability, explained: why demos succeed while agents fail in production, why a demo is a sampling statement, and how to close the gap before launch.

Reliability · Agent testing

May 25, 2026 · 6 min read

Regression Testing Non-Deterministic AI Agents

LLM regression testing when the same input no longer gives the same output: pin seeds and populations, sample runs, score semantically, gate releases.

Agent testing · Reliability

May 21, 2026 · 6 min read

Eval-Driven Development for AI Agents

Eval-driven development means writing evals before you build, iterating against them, and gating releases on results. How the loop works — and its limits.

AI evals · Agent engineering

May 18, 2026 · 6 min read

Agent Memory: Architectures and How to Test Them

A practical guide to agent memory — context windows, summarization, long-term stores — the failure modes of each, and how to test memory before users do.

Agent engineering · Reliability

May 15, 2026 · 6 min read

Synthetic User Testing for UX Research: Promise and Limits

An honest look at synthetic user testing and AI user research — where simulated users help, where they mislead, and the guardrails that keep you safe.

Synthetic users · Audience research

May 14, 2026 · 6 min read

How to Test a LangGraph Agent

A layered method for testing a LangGraph agent: unit-test nodes, verify routing and state, then run simulated users against the compiled graph.

Agent engineering · Agent testing

May 11, 2026 · 7 min read

AI Agent Architecture Patterns and Their Failure Modes

Common AI agent architecture patterns (single-loop, plan-then-execute, router, multi-agent) and the distinct ways each one breaks.

Agent engineering · Reliability

May 7, 2026 · 7 min read

Testing Customer Support AI Agents: A Playbook

A practical playbook for testing customer support AI: intent coverage, policy compliance, escalation judgment, tone under fire, and segment-level results.

Agent testing

May 6, 2026 · 6 min read

Voice Agent Testing: What's Different

What voice AI testing adds beyond text (latency, barge-in, ASR errors, TTS artifacts) and why most voice agent failures are still dialog-logic failures.

Agent testing

May 4, 2026 · 7 min read

You Built an AI Agent. Now Prove It Works.

Building an AI agent got easy. Deploying AI agents to production is where it breaks. What proving your agent works actually requires, step by step.

Agent engineering · Agent testing

Notes on testing AI agents