Model-based Deep Reinforcement Learning (RL) assumes the availability of a model of an environment's underlying transition dynamics. This model can be used to predict the future effects of an agent's possible actions. When no such model is available, it is possible to learn an approximation of the real environment, e.g., by using generative neural networks, sometimes also called World Models. As most real-world environments are stochastic in nature and their transition dynamics are often multimodal, it is important to use a modelling technique that can reflect this multimodal uncertainty. To safely deploy such learning systems in the real world, especially in an industrial context, it is paramount to take these uncertainties into account. In this work, we analyze existing metrics and propose new ones for the detection and quantification of multimodal uncertainty in RL-based World Models. The correct modelling and detection of uncertain future states lays the foundation for handling critical situations safely, which is a prerequisite for deploying RL systems in real-world settings.
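The idea of detecting multimodal uncertainty in a World Model's predictions can be illustrated with a minimal sketch, assuming next-state predictions are available as samples (e.g., drawn from a generative model or an ensemble). Sarle's bimodality coefficient is one simple sample-based statistic that flags multimodal distributions; it is used here purely as an illustration, not as the metric proposed in this work, and the synthetic sample generators stand in for an actual World Model.

```python
import numpy as np

def bimodality_coefficient(samples: np.ndarray) -> float:
    """Sarle's bimodality coefficient (sample version).

    Values above the uniform-distribution benchmark 5/9 suggest a
    bimodal (or heavily skewed) sample distribution.
    """
    n = len(samples)
    x = samples - samples.mean()
    s = samples.std(ddof=1)
    # Sample skewness g1 and sample excess kurtosis g2
    g1 = (n / ((n - 1) * (n - 2))) * np.sum((x / s) ** 3)
    g2 = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))) * np.sum((x / s) ** 4) \
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return (g1 ** 2 + 1) / (g2 + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

rng = np.random.default_rng(0)

# Unimodal prediction: the model expects a single future outcome.
unimodal = rng.normal(0.0, 1.0, size=5000)

# Bimodal prediction: two plausible, well-separated future outcomes,
# e.g. a stochastic transition that branches into distinct states.
bimodal = np.concatenate([rng.normal(-3.0, 0.5, size=2500),
                          rng.normal(+3.0, 0.5, size=2500)])

print(bimodality_coefficient(unimodal) > 5 / 9)  # not flagged as multimodal
print(bimodality_coefficient(bimodal) > 5 / 9)   # flagged as multimodal
```

In practice, such a per-state-dimension statistic would be computed over many sampled rollouts of the learned model; states whose predicted successors are flagged as multimodal are exactly the uncertain future states whose detection the work argues is a prerequisite for safe deployment.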