Active inference has emerged as an alternative approach to control problems, owing to its intuitive (probabilistic) formalism. However, despite its theoretical utility, computational implementations have largely been restricted to low-dimensional, deterministic settings. This paper highlights that this limitation stems from an inability to adequately model stochastic transition dynamics, particularly when an extensive policy (i.e., action trajectory) space must be evaluated during planning. Fortunately, recent advancements propose a modified planning algorithm for finite temporal horizons. We build upon this work to assess the utility of active inference in a stochastic control setting. For this, we simulate the classic windy grid-world task with three additional complexities: 1) environment stochasticity; 2) learning of transition dynamics; and 3) partial observability. Our results demonstrate the advantage of active inference over reinforcement learning in both deterministic and stochastic settings.
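For concreteness, the classic windy grid-world task mentioned above can be sketched as follows. This is a minimal, hedged illustration based on the standard formulation of the task (7x10 grid, column-wise upward wind, -1 reward per step) with a simple stochastic-wind variant; the grid dimensions, wind strengths, and noise model used in the paper's simulations may differ.

```python
import random

# Classic windy grid-world layout (standard formulation; assumed, not
# taken from the paper): 7 rows x 10 columns, with an upward wind whose
# strength depends on the agent's current column.
ROWS, COLS = 7, 10
WIND = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]  # upward push per column
START, GOAL = (3, 0), (3, 7)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action, stochastic=True, rng=random):
    """Apply an action, then the (possibly noisy) wind.

    Returns (next_state, reward, done). In the stochastic variant, the
    wind's effect in windy columns is off by +/-1 a third of the time,
    which is one common way to introduce environment stochasticity.
    """
    row, col = state
    d_row, d_col = ACTIONS[action]
    wind = WIND[col]
    if stochastic and wind > 0:
        wind += rng.choice([-1, 0, 1])  # noisy wind in windy columns
    # Wind pushes the agent upward (toward row 0); clamp to the grid.
    row = min(max(row + d_row - wind, 0), ROWS - 1)
    col = min(max(col + d_col, 0), COLS - 1)
    next_state = (row, col)
    return next_state, -1, next_state == GOAL
```

With `stochastic=False` the transition dynamics are deterministic, recovering the setting in which most prior active-inference implementations operate; setting `stochastic=True` yields the noisy transitions that motivate the comparison in the paper.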