Autonomous agents that can forecast their own outcomes, along with the uncertainty in those forecasts, can communicate their competencies and be deployed more safely. We accomplish this by using a learned world model of the agent system to forecast full agent trajectories over long time horizons. Real-world systems involve significant sources of both aleatoric and epistemic uncertainty, which compound and interact over time in the trajectory forecasts. We develop a deep generative world model that quantifies aleatoric uncertainty while incorporating the effects of epistemic uncertainty during the learning process. We show on two reinforcement learning problems that our uncertainty model produces calibrated outcome uncertainty estimates over the full trajectory horizon.
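To make the idea of forecasting an outcome from sampled trajectories concrete, the following is a minimal sketch and not the model described above. It assumes a toy one-dimensional system in which per-step Gaussian noise stands in for aleatoric uncertainty, and a small set of candidate drift values stands in for epistemic uncertainty about the dynamics. The function names, the drift set, and the goal threshold are all hypothetical choices made for illustration.

```python
import random


def rollout(drift, noise_std, horizon, rng):
    """Sample one trajectory from a toy 1-D stochastic dynamics model."""
    state = 0.0
    trajectory = [state]
    for _ in range(horizon):
        # Aleatoric uncertainty (assumed form): Gaussian noise on each step.
        state += drift + rng.gauss(0.0, noise_std)
        trajectory.append(state)
    return trajectory


def forecast_success_probability(drifts, noise_std, horizon, goal,
                                 n_samples, seed=0):
    """Monte Carlo estimate of P(final state >= goal) over sampled rollouts."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_samples):
        # Epistemic uncertainty (assumed form): each rollout uses one
        # plausible drift value drawn from a hypothetical candidate set.
        drift = rng.choice(drifts)
        final_state = rollout(drift, noise_std, horizon, rng)[-1]
        if final_state >= goal:
            successes += 1
    return successes / n_samples


if __name__ == "__main__":
    p = forecast_success_probability(
        drifts=[0.08, 0.10, 0.12],  # hypothetical candidate dynamics
        noise_std=0.3,
        horizon=50,
        goal=5.0,
        n_samples=2000,
    )
    print(f"forecast probability of reaching the goal: {p:.3f}")
```

A forecast of this kind is calibrated if, across many episodes given a forecast probability near p, the goal is actually reached in roughly a fraction p of them. Both sources of uncertainty widen the spread of final states as the horizon grows, which is why long-horizon forecasts are the harder case.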