Training self-driving systems to be robust to the long tail of driving scenarios is a critical problem. Model-based approaches leverage simulation to emulate a wide range of scenarios without putting users at risk in the real world. One promising path to faithful simulation is to train a forward model of the world that predicts the future states of both the environment and the ego-vehicle given past states and a sequence of actions. In this paper, we argue that it is beneficial to model the state of the ego-vehicle, which often has simple, predictable, and deterministic behavior, separately from the rest of the environment, which is much more complex and highly multimodal. We propose to model the ego-vehicle using a simple, differentiable kinematic model, while training a stochastic convolutional forward model on raster representations of the state to predict the behavior of the rest of the environment. We explore several configurations of such decoupled models and evaluate their performance with both Model Predictive Control (MPC) and direct policy learning. We test our methods on the task of highway driving and demonstrate lower crash rates and better stability. The code is available at https://github.com/vladisai/pytorch-PPUU/tree/ICLR2022.
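As an illustration of what a simple kinematic model of the ego-vehicle might look like, the following is a minimal sketch and not the formulation used in the paper. It assumes a unicycle-style state (position, heading, speed) and an action made of a longitudinal acceleration and a yaw rate; the names `EgoState`, `kinematic_step`, and `rollout`, and the time step `dt`, are hypothetical. It is written in plain Python for readability; because each update uses only smooth arithmetic, the same equations could be expressed in an automatic-differentiation framework so that gradients flow from future states back to the actions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EgoState:
    """Hypothetical ego-vehicle state; not the paper's exact representation."""
    x: float        # position along the road, metres
    y: float        # lateral position, metres
    heading: float  # orientation, radians
    speed: float    # scalar speed, metres per second


def kinematic_step(state, accel, yaw_rate, dt=0.1):
    """Advance the state by one explicit Euler step.

    Assumed action: (accel, yaw_rate). Position is updated with the
    current heading and speed, then heading and speed are updated.
    """
    x = state.x + state.speed * math.cos(state.heading) * dt
    y = state.y + state.speed * math.sin(state.heading) * dt
    heading = state.heading + yaw_rate * dt
    speed = state.speed + accel * dt
    return EgoState(x, y, heading, speed)


def rollout(state, actions, dt=0.1):
    """Deterministically unroll the model over a sequence of actions."""
    states = [state]
    for accel, yaw_rate in actions:
        state = kinematic_step(state, accel, yaw_rate, dt)
        states.append(state)
    return states
```

Under these assumptions the ego-vehicle rollout is fully deterministic given the actions, so only the rest of the environment would need a learned stochastic model.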