Finding Failures in High-Fidelity Simulation using Adaptive Stress Testing and the Backward Algorithm

Validating the safety of autonomous systems generally requires high-fidelity simulators that adequately capture the variability of real-world scenarios. However, exhaustively searching the space of simulation scenarios for failures is rarely feasible. Adaptive stress testing (AST) is a method that uses reinforcement learning to find the most likely failure of a system. AST with a deep reinforcement learning solver has been shown to be effective in finding failures across a range of different systems. This approach typically involves running many simulations, which can be very expensive when using a high-fidelity simulator. To improve efficiency, we present a method that first finds failures in a low-fidelity simulator. It then uses the backward algorithm, which trains a deep neural network policy using a single expert demonstration, to adapt the low-fidelity failures to high-fidelity. We have created a series of autonomous vehicle validation case studies that represent some of the ways low-fidelity and high-fidelity simulators can differ, such as time discretization. We demonstrate in a variety of case studies that this new AST approach finds failures with significantly fewer high-fidelity simulation steps than running AST directly in high-fidelity. As a proof of concept, we also demonstrate AST on NVIDIA's DriveSim simulator, an industry state-of-the-art high-fidelity simulator for finding failures in autonomous vehicles.
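
The two-stage idea described above (find a failure in low fidelity, then adapt it to high fidelity by starting near the end of the demonstration and moving the start point backward) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the 1-D "gap" simulator, the names `rollout`, `is_failure`, and `backward_adapt`, and all numeric values are hypothetical. A deterministic local nudge also stands in for the deep reinforcement learning policy update, so only the backward curriculum structure is shown.

```python
def rollout(x, actions, dt):
    """Toy 1-D simulator (hypothetical): each action closes the gap by a * dt."""
    for a in actions:
        x = x - a * dt
    return x


def is_failure(gap):
    """A failure (e.g. a collision) occurs once the gap has closed."""
    return gap <= 0.0


def backward_adapt(demo_actions, x0, dt_hi, delta=0.125, max_iters=10):
    """Adapt a low-fidelity failure to the high-fidelity simulator.

    Start from the last step of the demonstration and move the start point
    backward. At each start point, only the tail of the action sequence is
    adjusted. The nudge below is a stand-in for an RL policy update.
    """
    actions = list(demo_actions)
    found = False
    for start in range(len(actions) - 1, -1, -1):
        # Reset to the state reached by replaying the untouched prefix in high fidelity.
        x_start = rollout(x0, actions[:start], dt_hi)
        found = False
        for _ in range(max_iters):
            if is_failure(rollout(x_start, actions[start:], dt_hi)):
                found = True
                break
            # Stand-in for learning: push each tail action toward its bound of 1.0.
            nudged = actions[:start] + [min(1.0, a + delta) for a in actions[start:]]
            if nudged == actions:
                break  # tail is saturated; move the start point further back
            actions = nudged
    return actions, found


# Hypothetical setup: the simulators differ only in time discretization.
demo = [0.75] * 5                  # failure found in the low-fidelity simulator
X0, DT_LO, DT_HI = 3.5, 1.0, 0.75  # initial gap and the two time steps
adapted, found = backward_adapt(demo, X0, DT_HI)
```

In this toy setup the demonstration fails in low fidelity but not in high fidelity, and the curriculum has to move back several steps before the adjusted tail produces a high-fidelity failure. The early part of the demonstration is reused unchanged, which is the intuition behind needing fewer high-fidelity simulation steps than a search from scratch.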
