For highly automated driving above SAE level~3, behavior generation algorithms must reliably account for the inherent uncertainties of the traffic environment, e.g. those arising from the variety of human driving styles. Such uncertainties can lead to ambiguous decisions, requiring the algorithm to appropriately weigh low-probability hazardous events, e.g. collisions, against high-probability beneficial events, e.g. quickly crossing an intersection. State-of-the-art behavior generation algorithms lack a distributional treatment of decision outcomes. This impedes proper risk evaluation in ambiguous situations, often encouraging either unsafe or overly conservative behavior. We therefore propose a two-step approach to risk-sensitive behavior generation that combines offline distribution learning with online risk assessment. Specifically, we first learn an optimal policy in an uncertain environment with Deep Distributional Reinforcement Learning. During execution, the optimal risk-sensitive action is selected by applying established risk criteria, such as the Conditional Value at Risk, to the learned state-action return distributions. In intersection crossing scenarios, we evaluate different risk criteria and demonstrate that our approach increases safety while maintaining an active driving style. Our approach should encourage further studies on the benefits of risk-sensitive methods for self-driving vehicles.
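To make the online risk-assessment step concrete, the following is a minimal sketch (not the authors' implementation) of selecting an action by its Conditional Value at Risk, assuming the learned state-action return distributions are represented by quantiles, as in quantile-based distributional RL methods such as QR-DQN. The function names and the quantile representation are illustrative assumptions.

```python
import numpy as np

def cvar(quantiles, alpha):
    """CVaR_alpha of a return distribution given equally weighted
    quantile samples: the mean of the worst alpha-fraction of returns
    (lower tail, since higher return is better)."""
    q = np.sort(np.asarray(quantiles, dtype=float))
    k = max(1, int(np.ceil(alpha * len(q))))  # number of tail quantiles
    return q[:k].mean()

def risk_sensitive_action(quantiles_per_action, alpha=0.25):
    """Pick the action maximizing CVaR_alpha over the learned
    state-action return distributions. alpha=1.0 recovers the
    risk-neutral (expected-return) choice."""
    scores = [cvar(q, alpha) for q in quantiles_per_action]
    return int(np.argmax(scores))

# Illustrative toy numbers: a "risky" action with high mean but a bad
# collision-like tail, and a "safe" action with moderate returns.
risky = np.array([-10.0, 8.0, 9.0, 10.0])
safe = np.array([1.0, 2.0, 3.0, 4.0])
a_averse = risk_sensitive_action([risky, safe], alpha=0.25)   # picks safe
a_neutral = risk_sensitive_action([risky, safe], alpha=1.0)   # picks risky
```

With a small `alpha` the lower tail dominates, so the safe action is chosen despite its lower mean; as `alpha` approaches 1, the criterion reduces to the expected return and the risky action wins.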