Discovering hazardous scenarios is crucial for testing and further improving driving policies. However, efficient driving-policy testing faces two key challenges. On the one hand, when the policy under test is a well-trained autonomous driving strategy, hazardous scenarios occur naturally only with low probability, so discovering them purely through real-world road testing is extremely costly. On the other hand, the task requires a proper determination of accident responsibility: collecting scenarios in which responsibility is misattributed will lead to an overly conservative autonomous driving strategy. More specifically, we aim to discover hazardous scenarios for which the autonomous vehicle is responsible (AV-responsible), i.e., scenarios that expose vulnerabilities of the driving policy under test. To this end, this work proposes STARS, a Safety Test framework by finding Av-Responsible Scenarios, based on multi-agent reinforcement learning. By introducing a Hazard Arbitration Reward (HAR), STARS guides the other traffic participants to produce AV-responsible scenarios that make the driving policy under test misbehave. HAR enables our framework to discover diverse, complex, and AV-responsible hazardous scenarios. Experimental results against four different driving policies in three environments demonstrate that STARS can effectively discover AV-responsible hazardous scenarios. These scenarios indeed correspond to vulnerabilities of the driving policies under test and are therefore useful for improving them further.