Configure multi-trial batch simulations, define statistical expectations, and inspect aggregate outcome reports across seeds and matchups.
Arena runs the same matchup N times across different seeds to produce statistical outcome distributions. It is the primary tool for balance testing — does Archetype A beat Archetype B at a reasonable rate? Does plate armour reduce lethality by the expected margin?
- Run `npm run run:validation` for the built-in calibration scenarios; this is the closest existing batch runner.
- Call `createSession({worldSeed: i, ...})` over a seed range and collect the winner from `runSession` events.
- Check an expected rate, for example `knightWins / total > 0.55`.
- Put the check under `test/` to lock the distribution and catch balance regressions in CI.
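
The seed-range loop and win-rate check described above might be sketched roughly as follows. This is a minimal, self-contained illustration, not the project's actual API: `simulateMatch`, `runBatch`, `BatchReport`, the `"brawler"` archetype, and the stub's outcome odds are all hypothetical. In a real test, `simulateMatch` would presumably be replaced by a call to `createSession({worldSeed: i, ...})` with the winner read from `runSession` events.

```typescript
// Hypothetical outcome labels; "knight" comes from the example expectation,
// "brawler" is invented for illustration only.
type Winner = "knight" | "brawler" | "draw";

// Small deterministic PRNG (mulberry32) so a given seed always yields the same result.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Stand-in for one simulated session (assumption: the real engine is
// deterministic per worldSeed). The odds below are made up.
function simulateMatch(worldSeed: number): Winner {
  const roll = mulberry32(worldSeed)();
  if (roll < 0.6) return "knight";
  if (roll < 0.95) return "brawler";
  return "draw";
}

interface BatchReport {
  total: number;
  wins: Record<Winner, number>;
  knightWinRate: number;
}

// Runs `trials` matches on consecutive seeds and tallies the winners.
function runBatch(
  firstSeed: number,
  trials: number,
  run: (seed: number) => Winner,
): BatchReport {
  const wins: Record<Winner, number> = { knight: 0, brawler: 0, draw: 0 };
  for (let i = 0; i < trials; i++) {
    wins[run(firstSeed + i)] += 1;
  }
  const knightWinRate = trials > 0 ? wins.knight / trials : 0;
  return { total: trials, wins, knightWinRate };
}

// Expectation in the style of `knightWins / total > 0.55`.
function meetsExpectation(report: BatchReport, minRate: number): boolean {
  return report.knightWinRate > minRate;
}

const report = runBatch(1, 1000, simulateMatch);
console.log(report, "expectation met:", meetsExpectation(report, 0.55));
```

Because the seed range is fixed, the tally is reproducible from run to run, so a threshold check like this should only change when the simulation logic itself changes. Passing the runner in as a parameter keeps the tallying code independent of how a single session is executed.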