On Uncertainty in Deep State Space Models for Model-Based Reinforcement Learning

Improved state space models, such as Recurrent State Space Models (RSSMs), are a key factor behind recent advances in model-based reinforcement learning (RL). Yet, despite their empirical success, many of the underlying design choices are not well understood. We show that RSSMs use a suboptimal inference scheme and that models trained using this inference overestimate the aleatoric uncertainty of the ground truth system. We find this overestimation implicitly regularizes RSSMs and allows them to succeed in model-based RL. We postulate that this implicit regularization fulfills the same functionality as explicitly modeling epistemic uncertainty, which is crucial for many other model-based RL approaches. Yet, overestimating aleatoric uncertainty can also impair performance in cases where accurately estimating it matters, e.g., when we have to deal with occlusions, missing observations, or fusing sensor modalities at different frequencies. Moreover, the implicit regularization is a side-effect of the inference scheme and not the result of a rigorous, principled formulation, which renders analyzing or improving RSSMs difficult. Thus, we propose an alternative approach building on well-understood components for modeling aleatoric and epistemic uncertainty, dubbed Variational Recurrent Kalman Network (VRKN). This approach uses Kalman updates for exact smoothing inference in a latent space and Monte Carlo Dropout to model epistemic uncertainty. Due to the Kalman updates, the VRKN can naturally handle missing observations or sensor fusion problems with varying numbers of observations per time step. Our experiments show that using the VRKN instead of the RSSM improves performance in tasks where appropriately capturing aleatoric uncertainty is crucial while matching it in the deterministic standard benchmarks.
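To illustrate why Kalman updates cope naturally with missing observations and with a varying number of observations per time step, the following is a minimal sketch and not the VRKN implementation. It assumes a diagonal (per-dimension) Gaussian belief and an identity observation model; the function name `kalman_update` and its arguments are hypothetical and chosen only for this example.

```python
import numpy as np


def kalman_update(mean, var, observations):
    """Fuse a diagonal Gaussian belief N(mean, var) with any number of observations.

    Assumptions (illustrative only): identity observation model and
    independent dimensions, so every quantity is an elementwise array.
    `observations` is a list of (value, noise_variance) pairs; an empty
    list models a missing observation and leaves the belief unchanged.
    """
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)
    for value, noise_var in observations:
        # Kalman gain for an identity observation model.
        gain = var / (var + np.asarray(noise_var, dtype=float))
        # Move the mean toward the observation, weighted by the gain.
        mean = mean + gain * (np.asarray(value, dtype=float) - mean)
        # Each fused observation shrinks the posterior variance.
        var = (1.0 - gain) * var
    return mean, var


prior_mean, prior_var = np.zeros(2), np.ones(2)

# Missing observation: the belief is returned unchanged.
missing_mean, missing_var = kalman_update(prior_mean, prior_var, [])

# One sensor reading at this time step.
one_mean, one_var = kalman_update(
    prior_mean, prior_var, [(np.full(2, 2.0), np.ones(2))]
)

# Two sensor readings at the same time step, fused sequentially.
two_mean, two_var = kalman_update(
    prior_mean,
    prior_var,
    [(np.full(2, 2.0), np.ones(2)), (np.full(2, 2.0), np.ones(2))],
)
```

Under these assumptions, fusing observations one after another gives the same posterior as fusing them jointly, which is why the number of observations per time step does not need to be fixed in advance.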
