Reinforcement learning (RL) can be used to create a decision-making agent for autonomous driving. However, previous approaches provide only black-box solutions, which do not offer information on how confident the agent is about its decisions. An estimate of both the aleatoric and epistemic uncertainty of the agent's decisions is fundamental for real-world applications of autonomous driving. Therefore, this paper introduces the Ensemble Quantile Networks (EQN) method, which combines distributional RL with an ensemble approach to obtain a complete uncertainty estimate. The distribution over returns is estimated by learning its quantile function implicitly, which gives the aleatoric uncertainty, whereas an ensemble of agents is trained on bootstrapped data to provide a Bayesian estimate of the epistemic uncertainty. A criterion for classifying which decisions have an unacceptable uncertainty is also introduced. The results show that the EQN method can balance risk and time efficiency in different occluded intersection scenarios by considering the estimated aleatoric uncertainty. Furthermore, it is shown that the trained agent can use the epistemic uncertainty information to identify situations that the agent has not been trained for, and thereby avoid making unfounded, potentially dangerous decisions outside of the training distribution.
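As a rough illustration of the uncertainty decomposition described above, the following sketch shows how aleatoric and epistemic uncertainty might be separated given an ensemble of quantile estimates, together with a simple threshold-based decision criterion. This is not the paper's implementation: the array shapes, the random stand-in data, and the names `SIGMA_MAX` and `SAFE_ACTION` are all hypothetical.

```python
import numpy as np

# Hypothetical setup: each of K ensemble members has produced quantile
# estimates Z_k(tau_i, a) of the return for each action a. Here the
# estimates are random stand-in data, purely for illustration.
rng = np.random.default_rng(0)
K, N_TAU, N_ACTIONS = 10, 32, 3
quantiles = rng.normal(loc=rng.normal(size=(K, 1, N_ACTIONS)),
                       scale=1.0, size=(K, N_TAU, N_ACTIONS))

# Mean return per member and action: expectation over quantile samples.
q_mean = quantiles.mean(axis=1)                  # shape (K, N_ACTIONS)

# Aleatoric uncertainty: spread of the return distribution within each
# member, averaged over the ensemble.
aleatoric = quantiles.var(axis=1).mean(axis=0)   # shape (N_ACTIONS,)

# Epistemic uncertainty: disagreement between ensemble members about
# the mean return.
epistemic = q_mean.var(axis=0)                   # shape (N_ACTIONS,)

# Illustrative decision criterion: act greedily on the ensemble-mean
# return, but fall back to a safe action when the epistemic uncertainty
# of the chosen action exceeds an acceptable level.
SIGMA_MAX = 1.0   # hypothetical uncertainty threshold
SAFE_ACTION = 0   # e.g., a cautious "stop/yield" action

greedy = int(q_mean.mean(axis=0).argmax())
action = greedy if epistemic[greedy] < SIGMA_MAX else SAFE_ACTION
print(f"action={action}, aleatoric={aleatoric}, epistemic={epistemic}")
```

In this reading, the within-member quantile spread reflects irreducible outcome variability (aleatoric), while between-member disagreement reflects limited training coverage (epistemic); thresholding the latter is one plausible way to reject decisions outside the training distribution, consistent with the criterion the abstract mentions.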