UMBRELLA: Uncertainty-Aware Model-Based Offline Reinforcement Learning Leveraging Planning

Offline reinforcement learning (RL) provides a framework for learning decision-making from offline data and therefore constitutes a promising approach for real-world applications such as automated driving. Self-driving vehicles (SDVs) learn a policy that can potentially even outperform the behavior in the sub-optimal dataset. Especially in safety-critical applications such as automated driving, explainability and transferability are key to success. This motivates the use of model-based offline RL approaches that leverage planning. However, current state-of-the-art methods often neglect the influence of aleatoric uncertainty arising from the stochastic behavior of multi-agent systems. This work proposes a novel approach for Uncertainty-aware Model-Based Offline REinforcement Learning Leveraging plAnning (UMBRELLA), which jointly solves the prediction, planning, and control problems of the SDV in an interpretable, learning-based fashion. A trained action-conditioned stochastic dynamics model captures distinctly different future evolutions of the traffic scene. The analysis provides empirical evidence for the effectiveness of our approach in challenging automated driving simulations and on a real-world public dataset.
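The abstract describes planning with an action-conditioned stochastic dynamics model that captures several distinct futures of a traffic scene. The following is a toy Python sketch of that general idea only, not the UMBRELLA implementation: the hand-coded `dynamics` function, the discrete latent `z`, the velocity-tracking `reward`, the brake probability `p_brake`, and the mean-minus-standard-deviation risk objective are all hypothetical choices made for illustration. In a real system, a learned model would replace `dynamics`.

```python
import itertools
import random
import statistics
from typing import Sequence, Tuple

State = Tuple[float, float]  # hypothetical toy state: (position, velocity)


def dynamics(state: State, action: float, z: int) -> State:
    """Hypothetical action-conditioned stochastic dynamics model.

    The discrete latent z selects one of two qualitatively different futures
    (z=0: the lead vehicle keeps its speed; z=1: it brakes and forces the ego
    vehicle to slow down by 0.5). Hand-coded here, not learned.
    """
    pos, vel = state
    vel = max(0.0, vel + action - (0.5 if z == 1 else 0.0))
    return (pos + vel, vel)


def reward(state: State, target_vel: float = 1.0) -> float:
    # Penalize squared deviation from a hypothetical target velocity.
    return -(state[1] - target_vel) ** 2


def rollout_return(state: State, plan: Sequence[float],
                   rng: random.Random, p_brake: float) -> float:
    """Sum of rewards along one sampled future for a fixed action sequence."""
    total = 0.0
    for action in plan:
        z = 1 if rng.random() < p_brake else 0  # sample a future mode
        state = dynamics(state, action, z)
        total += reward(state)
    return total


def plan_actions(state: State, actions=(-0.5, 0.0, 0.5), horizon: int = 3,
                 n_samples: int = 50, risk_weight: float = 1.0,
                 p_brake: float = 0.3, seed: int = 0) -> Tuple[float, ...]:
    """Exhaustively score every action sequence of length `horizon`.

    Score = mean return - risk_weight * population std of returns over
    `n_samples` sampled futures; the best-scoring sequence is returned.
    """
    rng = random.Random(seed)
    best_plan, best_score = None, float("-inf")
    for plan in itertools.product(actions, repeat=horizon):
        returns = [rollout_return(state, plan, rng, p_brake)
                   for _ in range(n_samples)]
        score = statistics.mean(returns) - risk_weight * statistics.pstdev(returns)
        if score > best_score:
            best_plan, best_score = plan, score
    return best_plan


# Usage: plan from position 0.0 at the target velocity 1.0.
print(plan_actions((0.0, 1.0)))
```

In a receding-horizon loop, only the first action of the returned plan would be executed before replanning. Setting `risk_weight=0` makes the planner risk-neutral; increasing it penalizes action sequences whose return varies strongly across the sampled futures, which is one simple (assumed) way to account for aleatoric uncertainty.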
