Autonomous driving and its widespread adoption have long held tremendous promise. Nevertheless, without a trustworthy and thorough testing procedure, the industry struggles to mass-produce autonomous vehicles (AV), and neither the general public nor policymakers are convinced to accept the innovation. Generating safety-critical scenarios that pose significant challenges to AV is an essential first step in testing. Real-world datasets contain naturalistic but overly safe driving behaviors, whereas simulation allows unrestricted exploration of diverse and aggressive traffic scenarios. Conversely, the higher-dimensional search space in simulation precludes efficient scenario generation unless the real-world data distribution serves as an implicit constraint. To marry the benefits of both, it is appealing to learn to generate scenarios from offline real-world and online simulated data simultaneously. Therefore, we tailor a Reversely Regularized Hybrid Offline-and-Online ((Re)$^2$H2O) Reinforcement Learning recipe that additionally penalizes Q-values on real-world data and rewards Q-values on simulated data, which ensures the generated scenarios are both varied and adversarial. Through extensive experiments, our solution is shown to produce riskier scenarios than competitive baselines, and it generalizes to work with various autonomous driving models. In addition, the generated scenarios are also demonstrated to be effective for fine-tuning AV performance.
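As a rough illustration of the reverse regularization described above, the critic objective can be sketched as a standard Bellman error augmented with a term that pushes Q-values down on real-world transitions and up on simulated ones. The notation below ($\mathcal{D}_{\mathrm{real}}$, $\mathcal{D}_{\mathrm{sim}}$, regularization weight $\beta$, target parameters $\bar{\theta}$) is introduced here for illustration only and may differ from the paper's exact formulation:
\begin{equation*}
\mathcal{L}(\theta) \;=\; \beta \Big( \mathbb{E}_{(s,a)\sim\mathcal{D}_{\mathrm{real}}}\big[Q_{\theta}(s,a)\big] \;-\; \mathbb{E}_{(s,a)\sim\mathcal{D}_{\mathrm{sim}}}\big[Q_{\theta}(s,a)\big] \Big)
\;+\; \mathbb{E}_{(s,a,r,s')\sim\mathcal{D}_{\mathrm{real}}\cup\mathcal{D}_{\mathrm{sim}}}\Big[ \big( Q_{\theta}(s,a) - r - \gamma\, \mathbb{E}_{a'\sim\pi}\big[ Q_{\bar{\theta}}(s',a') \big] \big)^{2} \Big].
\end{equation*}
Under this sketch, minimizing $\mathcal{L}(\theta)$ discourages the scenario-generation policy from reproducing the naturalistic, overly safe behaviors in the real-world data while steering it toward the adversarial behaviors explored in simulation, matching the "penalize real, reward simulated" intuition stated in the abstract.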