Offline reinforcement learning (RL) provides a framework for learning decision-making from offline data and therefore constitutes a promising approach for real-world applications such as automated driving. Self-driving vehicles (SDVs) learn a policy that can potentially even outperform the behavior in the sub-optimal dataset. Especially in safety-critical applications such as automated driving, explainability and transferability are key to success. This motivates the use of model-based offline RL approaches, which leverage planning. However, current state-of-the-art methods often neglect the influence of aleatoric uncertainty arising from the stochastic behavior of multi-agent systems. This work proposes a novel approach for Uncertainty-aware Model-Based Offline REinforcement Learning Leveraging plAnning (UMBRELLA), which jointly solves the prediction, planning, and control problem of the SDV in an interpretable, learning-based fashion. A trained action-conditioned stochastic dynamics model captures distinctively different future evolutions of the traffic scene. The analysis provides empirical evidence for the effectiveness of our approach in challenging automated driving simulations and on a real-world public dataset.