Recent studies show that machine intelligence-enabled systems are vulnerable to test cases arising from either adversarial manipulation or natural distribution shifts. This has raised serious concerns about deploying machine learning algorithms in real-world applications, especially in safety-critical domains such as autonomous driving (AD). At the same time, traditional AD testing on naturalistic scenarios requires hundreds of millions of driving miles, because safety-critical scenarios are high-dimensional and rare in the real world. Several approaches to AD evaluation have therefore been explored, but they usually rely on different simulation platforms, types of safety-critical scenarios, scenario generation algorithms, and driving route variations. Thus, despite considerable effort in AD testing, it remains challenging to compare and understand the effectiveness and efficiency of different testing scenario generation algorithms and testing mechanisms under similar conditions. In this paper, we aim to provide SafeBench, the first unified platform to integrate different types of safety-critical testing scenarios, scenario generation algorithms, and other variations such as driving routes and environments. We also implement 4 deep reinforcement learning-based AD algorithms with 4 types of input (e.g., bird's-eye view, camera) to perform fair comparisons on SafeBench. We find that our generated testing scenarios are indeed more challenging, and we observe a trade-off between the performance of AD agents under benign and safety-critical testing scenarios. We believe that SafeBench, as a unified platform for large-scale and effective AD testing, will motivate the development of new testing scenario generation and safe AD algorithms. SafeBench is available at https://safebench.github.io.