Reproducibility

Reproduce any failure.

Same seed, same city, same people — every time. Turn a one-off failure into a permanent regression test, and prove the fix holds before you ship.


Run · seed 42 · cohort 65+ · 64

Re-run · seed 42 · cohort 65+ · 64

Same population & setup — replays exactly

1 seed: recreates the whole city, exactly

Same setup: same population every run

Regression: failures become permanent tests

Replay: re-run the exact scenario

Determinism

Same seed, same city

A seed is a single number that determines everything generated for a run. Give a run the same seed — the 42 in these examples — and you get the same population, the same homes and jobs, the same schedules and starting conditions, every time. The setup that produced a failure is the setup you replay.

Population, schedules and setup are seed-deterministic
Re-run on demand and get the same starting city
No more "it worked on my machine"
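
To make the idea concrete, here is a minimal sketch of seed-deterministic generation. It is not the product's generator: `Citizen`, `build_city` and the value pools are hypothetical names chosen for illustration. It only shows how one number fed to a private random generator can fix every later draw.

```python
import random
from dataclasses import dataclass

# All names, fields and value pools here are hypothetical placeholders.
FIRST_NAMES = ["Ana", "Ben", "Chen", "Dita", "Emil", "Farah"]
DISTRICTS = ["Harbor", "Old Town", "Northgate", "Riverside"]
JOBS = ["retired", "nurse", "driver", "teacher", "clerk"]


@dataclass(frozen=True)
class Citizen:
    name: str
    age: int
    home: str
    job: str
    wake_hour: int


def build_city(seed: int, size: int = 64) -> list[Citizen]:
    """Generate the same population for the same seed."""
    rng = random.Random(seed)  # private generator: no shared global state
    return [
        Citizen(
            name=f"{rng.choice(FIRST_NAMES)} #{i}",
            age=rng.randint(65, 90),
            home=rng.choice(DISTRICTS),
            job=rng.choice(JOBS),
            wake_hour=rng.randint(5, 9),
        )
        for i in range(size)
    ]


first_run = build_city(seed=42)
replay = build_city(seed=42)
print(first_run == replay)  # True: same seed, same city
```

Using a generator owned by the run, rather than the module-level `random` functions, keeps unrelated code from shifting the sequence of draws.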


Regression

Turn a failure into a regression test

Found a conversation where your agent broke? Pin it. The exact citizen, seed and scenario become a permanent test, so when you ship a fix you can prove it holds — and catch the day it regresses.

Pin any failing run as a permanent test
Re-run after a fix to prove it holds
Run the suite from CI via the API — gate merges on a regression
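
One way to picture a pinned test is as a small record of seed, citizen and scenario that is saved and replayed. The sketch below assumes exactly that. `PinnedCase`, `save_suite`, `load_suite` and `failing_cases` are hypothetical names, and the two agent functions are stand-ins for actually running your agent and judging the outcome.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass(frozen=True)
class PinnedCase:
    seed: int
    citizen: str
    scenario: str


def save_suite(cases: list[PinnedCase]) -> str:
    """Serialize pinned cases so they can live next to the agent's code."""
    return json.dumps([asdict(c) for c in cases], indent=2)


def load_suite(text: str) -> list[PinnedCase]:
    """Rebuild the pinned cases from their saved JSON form."""
    return [PinnedCase(**item) for item in json.loads(text)]


def failing_cases(
    cases: list[PinnedCase], passes: Callable[[PinnedCase], bool]
) -> list[PinnedCase]:
    """Replay every pinned case and return the ones that still fail."""
    return [c for c in cases if not passes(c)]


# The failing run shown in this section, pinned as a test.
pinned = PinnedCase(seed=42, citizen="Brandon Hale", scenario="fee dispute")
suite = load_suite(save_suite([pinned]))


# Stand-ins for "run the agent on this case and judge the outcome".
def agent_before_fix(case: PinnedCase) -> bool:
    return case.scenario != "fee dispute"


def agent_after_fix(case: PinnedCase) -> bool:
    return True


print(failing_cases(suite, agent_before_fix) == [pinned])  # True: failure reproduced
print(failing_cases(suite, agent_after_fix) == [])  # True: fix holds
```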

seed 42 · Brandon Hale · fee dispute · FAIL

pinned as regression test

after fix · seed 42 · same scenario · PASS

What's deterministic

Honest about the LLM in the loop

The population, schedules and starting conditions are fully deterministic from the seed. Your agent's model responses can still vary between runs, so we pin the scenario and replay the exact setup. That lets you compare behavior rather than chase identical text.

Setup and population: deterministic from the seed
Scenario pinned so you replay the exact conditions
Compare behavior across runs and across fixes
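
Comparing behavior rather than text can be sketched as reducing each run to a few outcome signals and comparing only those. The fields below (`fee_refunded`, `escalated`) are hypothetical signals invented for this example, as are `RunResult` and `same_behavior`. Which signals matter depends on your agent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    reply_text: str     # free-form model output, may differ between runs
    fee_refunded: bool  # hypothetical behavioral signal
    escalated: bool     # hypothetical behavioral signal


def behavior(result: RunResult) -> tuple[bool, bool]:
    """Keep only the signals the test cares about; drop the wording."""
    return (result.fee_refunded, result.escalated)


def same_behavior(a: RunResult, b: RunResult) -> bool:
    """True when two runs behaved the same, whatever the exact text."""
    return behavior(a) == behavior(b)


run_a = RunResult("I've refunded the fee, Mr. Hale.", True, False)
run_b = RunResult("The charge has been reversed for you.", True, False)
run_c = RunResult("I can't help with that.", False, True)

print(run_a.reply_text == run_b.reply_text)  # False: the wording differs
print(same_behavior(run_a, run_b))  # True: the behavior matches
print(same_behavior(run_a, run_c))  # False: a real behavioral change
```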


Questions

What reproducible really means here.

What's a seed?

A seed is a single number that determines everything generated for a run — which citizens exist, where they live and work, and their daily schedules. Re-run with the same seed (we use 42 throughout these examples) and you get the exact same city, so any result can be reproduced.

What exactly is deterministic?

The synthetic population, the map placement, daily schedules and the starting conditions of a run are fully determined by the seed — re-run with the same seed and you get the same city and setup.

Are conversations bit-for-bit identical on re-run?

Not necessarily. The agent's own model can introduce variation in its responses. What we guarantee is the same population and scenario, so you replay the exact setup and compare behavior — rather than relying on identical text.

How does a regression test work?

Pin a failing run — its seed, citizen and scenario — as a saved test. After you change your agent, re-run it against the same setup to confirm the fix holds and to catch future regressions.

Can I branch a scenario?

Yes. Start from a seed and vary one thing — a demographic skew, a different cohort, a changed agent prompt — to run what-ifs from a known baseline.
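
A branch can be thought of as a copy of the baseline configuration with a single field changed. This is a rough sketch under that assumption: `RunConfig` and its fields are hypothetical and do not describe the product's real configuration schema.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class RunConfig:
    seed: int
    cohort: str
    agent_prompt: str


def changed_fields(base: RunConfig, branch: RunConfig) -> dict[str, tuple]:
    """Map each field that differs to its (baseline, branch) values."""
    return {
        f.name: (getattr(base, f.name), getattr(branch, f.name))
        for f in fields(base)
        if getattr(base, f.name) != getattr(branch, f.name)
    }


baseline = RunConfig(seed=42, cohort="65+", agent_prompt="prompt-v1")
what_if = replace(baseline, cohort="18-25")  # vary exactly one thing

print(changed_fields(baseline, what_if))  # {'cohort': ('65+', '18-25')}
```

Checking that exactly one field changed before a what-if run makes it easier to attribute any difference in results to that change.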

Can this run in CI?

Yes. Drive runs from your pipeline over the API or CLI and block a merge when any cohort regresses past a threshold you set, so a fix that quietly breaks another cohort never ships.
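
The gating logic itself can be small. The sketch below assumes you already have per-cohort pass rates for a baseline and a candidate build. How those numbers are fetched from the API or CLI is left out, and `regressed_cohorts`, `gate` and the sample rates are hypothetical.

```python
def regressed_cohorts(
    baseline: dict[str, float],
    candidate: dict[str, float],
    max_drop: float,
) -> list[str]:
    """Return cohorts whose pass rate fell by more than max_drop."""
    regressed = []
    for cohort, base_rate in baseline.items():
        # Assumption: a cohort missing from the candidate counts as a full drop.
        new_rate = candidate.get(cohort, 0.0)
        if base_rate - new_rate > max_drop:
            regressed.append(cohort)
    return sorted(regressed)


def gate(baseline: dict[str, float], candidate: dict[str, float],
         max_drop: float = 0.05) -> int:
    """Exit code for CI: 0 lets the merge through, 1 blocks it."""
    return 1 if regressed_cohorts(baseline, candidate, max_drop) else 0


# Made-up pass rates, for illustration only.
baseline_rates = {"65+": 0.90, "18-25": 0.95}
candidate_rates = {"65+": 0.92, "18-25": 0.80}

print(regressed_cohorts(baseline_rates, candidate_rates, 0.05))  # ['18-25']
print(gate(baseline_rates, candidate_rates))  # 1
```

In a pipeline, the script would end with `sys.exit(gate(...))` so a nonzero code fails the job. Here the candidate improves the 65+ cohort but still gets blocked because the 18-25 cohort dropped past the threshold.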

Fix it once. Prove it stays fixed.

Replay the exact scenario that broke and lock the fix in as a regression test.
