Same seed, same city, same people — every time. Turn a one-off failure into a permanent regression test, and prove the fix holds before you ship.
A seed is a single number that determines everything generated for a run. Give a run the same seed — the 42 in these examples — and you get the same population, the same homes and jobs, the same schedules and starting conditions, every time. The setup that produced a failure is the setup you replay.
Found a conversation where your agent broke? Pin it. The exact citizen, seed and scenario become a permanent test, so when you ship a fix you can prove it holds — and catch the day it regresses.
The population, schedules and starting conditions are fully deterministic from the seed. The agents' own model responses can still vary — so we pin the scenario and replay the exact setup, letting you compare behavior rather than chase identical text.
A seed is a single number that determines everything generated for a run — which citizens exist, where they live and work, and their daily schedules. Re-run with the same seed (we use 42 throughout these examples) and you get the exact same city, so any result can be reproduced.
The synthetic population, map placement, daily schedules and starting conditions of a run are fully determined by the seed: re-run with the same seed and you get the same city and setup.
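To make the idea concrete, here is a minimal sketch of seed-driven generation in Python. It is an illustration only, not the product's actual generator: the `Citizen` fields, the `generate_population` function and the district numbering are all hypothetical. The point it shows is that every random choice is drawn from one generator created from the seed, so the same seed yields the same setup.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Citizen:
    # Hypothetical fields, chosen only for illustration.
    citizen_id: int
    home_district: int
    work_district: int
    wake_hour: int


def generate_population(seed: int, size: int = 5) -> list[Citizen]:
    """Build a toy population whose every detail derives from the seed."""
    # A private generator avoids interference from other code
    # that uses the global random module.
    rng = random.Random(seed)
    return [
        Citizen(
            citizen_id=i,
            home_district=rng.randrange(10),
            work_district=rng.randrange(10),
            wake_hour=rng.randint(5, 9),
        )
        for i in range(size)
    ]


first = generate_population(42)
replay = generate_population(42)

# Same seed, same citizens, same homes, jobs and schedules.
print(first == replay)
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions is what keeps the replay stable even when unrelated code also draws random numbers.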
Not necessarily: the same seed does not guarantee identical agent responses. The agent's own model can introduce variation in what it says. What we guarantee is the same population and scenario, so you replay the exact setup and compare behavior rather than relying on identical text.
Pin a failing run — its seed, citizen and scenario — as a saved test. After you change your agent, re-run it against the same setup to confirm the fix holds and to catch future regressions.
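As a rough sketch of what a pinned test might capture, the Python below stores the three things that identify a failing run (seed, citizen and scenario) and round-trips them through JSON. The `PinnedCase` type, its field names and the example values are assumptions made for illustration; they are not the product's saved-test format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PinnedCase:
    # Hypothetical record of the setup that produced a failure.
    seed: int
    citizen_id: str
    scenario: str


def save_case(case: PinnedCase) -> str:
    """Serialize a pinned case so it can be stored alongside other tests."""
    return json.dumps(asdict(case), sort_keys=True)


def load_case(payload: str) -> PinnedCase:
    """Restore a pinned case so the same setup can be replayed later."""
    return PinnedCase(**json.loads(payload))


# Example values are invented for this sketch.
pinned = PinnedCase(seed=42, citizen_id="citizen-0017", scenario="billing-dispute")
restored = load_case(save_case(pinned))

# The restored case identifies exactly the same setup.
print(restored == pinned)
```

Because the record holds only the setup and not the agent's text, replaying it compares behavior against the same conditions rather than expecting word-for-word output.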
Yes. Start from a seed and vary one thing — a demographic skew, a different cohort, a changed agent prompt — to run what-ifs from a known baseline.
Yes. Drive runs from your pipeline over the API or CLI and gate a merge when any cohort regresses past a threshold you set — so a fix that quietly breaks another cohort never ships.
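One way such a gate could work in a pipeline is sketched below in Python. The cohort names, scores and the `regressed_cohorts` helper are hypothetical and the numbers are made up; the sketch assumes each cohort gets a score between 0 and 1 from a baseline run and from the candidate run, and it does not call the real API or CLI.

```python
def regressed_cohorts(
    baseline: dict[str, float],
    candidate: dict[str, float],
    max_drop: float,
) -> list[str]:
    """Return cohorts whose score fell by more than max_drop.

    A cohort missing from the candidate run is treated as scoring 0.0,
    so it is flagged rather than silently skipped.
    """
    return sorted(
        cohort
        for cohort, base_score in baseline.items()
        if base_score - candidate.get(cohort, 0.0) > max_drop
    )


# Illustrative scores only; not real results.
baseline = {"commuters": 0.92, "retirees": 0.88, "students": 0.90}
candidate = {"commuters": 0.95, "retirees": 0.80, "students": 0.89}

failed = regressed_cohorts(baseline, candidate, max_drop=0.05)

# A pipeline step would exit with this code to block or allow the merge.
exit_code = 1 if failed else 0
print(failed, exit_code)
```

Here the candidate improves for commuters but drops for retirees by more than the threshold, so the gate reports that cohort and returns a non-zero code even though the overall picture looks better.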
Replay the exact scenario that broke and lock the fix in as a regression test.