Recent work has uncovered close links between classical reinforcement learning algorithms, Bayesian filtering, and active inference, allowing value functions to be understood in terms of Bayesian posteriors. An alternative, but less explored, model-free RL algorithm is the successor representation, which expresses the value function in terms of a successor matrix of expected future state occupancies. In this paper, we derive a probabilistic interpretation of the successor representation in terms of Bayesian filtering, and use it to design a novel active inference agent architecture that utilizes successor representations instead of model-based planning. We demonstrate that active inference successor representations have significant advantages over current active inference agents in terms of planning horizon and computational cost. Moreover, we demonstrate how the successor representation agent can generalize to changing reward functions, such as variants of the expected free energy.
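The successor-representation factorisation mentioned above can be sketched numerically. The sketch below is illustrative only: the three-state transition matrix, reward vectors, and discount factor are assumptions, not taken from the paper. It shows the standard SR identity V = Mr with M = (I - γP)⁻¹, and why recomputing values under a new reward reduces to a single matrix-vector product.

```python
import numpy as np

gamma = 0.9
# Hypothetical 3-state Markov chain induced by some fixed policy.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
r = np.array([0.0, 0.0, 1.0])  # per-state reward

# Successor matrix: M[s, s'] is the expected discounted future occupancy
# of s' starting from s.  In closed form, M = sum_t gamma^t P^t = (I - gamma P)^{-1}.
M = np.linalg.inv(np.eye(3) - gamma * P)
V_sr = M @ r

# Sanity check against the direct Bellman solution V = (I - gamma P)^{-1} r.
V_bellman = np.linalg.solve(np.eye(3) - gamma * P, r)
assert np.allclose(V_sr, V_bellman)

# A changed reward function only needs a new matrix-vector product with the
# cached M -- the property that lets an SR agent generalize to new rewards
# (e.g. variants of the expected free energy) without replanning.
r_new = np.array([1.0, 0.0, 0.0])
V_new = M @ r_new
```

Caching M amortises the expensive part of evaluation: the dynamics are summarised once, and each candidate reward is scored in O(|S|²) rather than by re-solving the Bellman equation.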