For highly automated driving above SAE level~3, behavior generation algorithms must reliably account for the inherent uncertainties of the traffic environment, e.g. those arising from the variety of human driving styles. Such uncertainties can lead to ambiguous decisions, requiring the algorithm to appropriately weigh low-probability hazardous events, e.g. collisions, against high-probability beneficial events, e.g. quickly crossing an intersection. State-of-the-art behavior generation algorithms lack a distributional treatment of decision outcomes. This impedes proper risk evaluation in ambiguous situations, often encouraging either unsafe or overly conservative behavior. We therefore propose a two-step approach to risk-sensitive behavior generation that combines offline distribution learning with online risk assessment. Specifically, we first learn an optimal policy in an uncertain environment with Deep Distributional Reinforcement Learning. During execution, the optimal risk-sensitive action is selected by applying established risk criteria, such as the Conditional Value at Risk, to the learned state-action return distributions. In intersection crossing scenarios, we evaluate different risk criteria and demonstrate that our approach increases safety while maintaining an active driving style. Our approach should encourage further studies on the benefits of risk-sensitive methods for self-driving vehicles.
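To make the online risk-assessment step concrete, the following is a minimal sketch (not the authors' implementation) of selecting an action by its Conditional Value at Risk, assuming the learned state-action return distributions are represented by quantiles, as in quantile-based distributional RL methods such as QR-DQN. The function names and the quantile representation are illustrative assumptions.

```python
import numpy as np

def cvar(quantiles, alpha):
    """CVaR_alpha of a return distribution given equally weighted
    quantile samples: the mean of the worst alpha-fraction of returns
    (lower tail, since higher return is better)."""
    q = np.sort(np.asarray(quantiles, dtype=float))
    k = max(1, int(np.ceil(alpha * len(q))))  # number of tail quantiles
    return q[:k].mean()

def risk_sensitive_action(quantiles_per_action, alpha=0.25):
    """Pick the action maximizing CVaR_alpha over the learned
    state-action return distributions. alpha=1.0 recovers the
    risk-neutral (expected-return) choice."""
    scores = [cvar(q, alpha) for q in quantiles_per_action]
    return int(np.argmax(scores))

# Illustrative toy numbers: a "risky" action with high mean but a bad
# collision-like tail, and a "safe" action with moderate returns.
risky = np.array([-10.0, 8.0, 9.0, 10.0])
safe = np.array([1.0, 2.0, 3.0, 4.0])
a_averse = risk_sensitive_action([risky, safe], alpha=0.25)   # picks safe
a_neutral = risk_sensitive_action([risky, safe], alpha=1.0)   # picks risky
```

With a small `alpha` the lower tail dominates, so the safe action is chosen despite its lower mean; as `alpha` approaches 1, the criterion reduces to the expected return and the risky action wins.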