Simulation agents are essential for designing and testing systems that interact with humans, such as autonomous vehicles (AVs). These agents serve various purposes, from benchmarking AV performance to stress-testing system limits, but all applications share one key requirement: reliability. To enable systematic experimentation, a simulation agent must behave as intended. It should minimize actions that may lead to undesired outcomes, such as collisions, which can distort the signal-to-noise ratio in analyses. As a foundation for reliable sim agents, we propose scaling self-play to thousands of scenarios on the Waymo Open Motion Dataset under semi-realistic limits on human perception and control. Training from scratch on a single GPU, our agents nearly solve the full training set within a day. They generalize effectively to unseen test scenes, achieving a 99.8% goal completion rate with less than 0.8% combined collision and off-road incidents across 10,000 held-out scenarios. Beyond in-distribution generalization, our agents show partial robustness to out-of-distribution scenes and can be fine-tuned in minutes to reach near-perfect performance in those cases. We open-source the pre-trained agents and integrate them with a batched multi-agent simulator. Demonstrations of agent behaviors can be found at https://sites.google.com/view/reliable-sim-agents.
翻译:暂无翻译