Intelligent agents must pursue their goals in complex environments with partial information and often limited computational capacity. Reinforcement learning methods have achieved great success by training agents to optimize engineered reward functions, but such agents often struggle to learn in sparse-reward environments, generally require many environmental interactions to perform well, and are typically computationally expensive. Active inference is a model-based approach that directs agents to explore uncertain states while adhering to a prior model of their goal behaviour. This paper introduces an active inference agent that minimizes the novel free energy of the expected future. Our model solves sparse-reward problems with very high sample efficiency because its objective function encourages directed exploration of uncertain states. Moreover, our model is computationally lightweight and can operate in a fully online manner while achieving performance comparable to offline RL methods. We showcase the capabilities of our model by solving the mountain car problem, where we demonstrate its superior exploration properties and its robustness to observation noise, which in fact improves performance. We also introduce a novel method for approximating the prior model from the reward function, which simplifies the expression of complex objectives and improves performance over previous active inference approaches.
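For concreteness, the following is a minimal sketch of how a free-energy-of-the-expected-future objective is commonly written in the active inference literature; the notation ($q$ for the agent's predictive beliefs under a policy $\pi$, $\tilde{p}$ for the biased generative model encoding the prior over goal behaviour, $o_\tau$ and $s_\tau$ for future observations and states) is an assumption for illustration and may differ from the paper's own definitions.

% Sketch of a free-energy-of-the-expected-future objective for a single future
% time step tau under policy pi (notation assumed, following common
% active-inference formulations; not taken verbatim from this paper).
\begin{align}
  \mathrm{FEEF}(\pi)
    &= \mathbb{E}_{q(o_\tau, s_\tau \mid \pi)}
       \left[ \ln \frac{q(o_\tau, s_\tau \mid \pi)}{\tilde{p}(o_\tau)\, q(s_\tau \mid o_\tau)} \right] \\
    &= \underbrace{D_{\mathrm{KL}}\!\left[ q(o_\tau \mid \pi) \,\Vert\, \tilde{p}(o_\tau) \right]}_{\text{extrinsic: match preferred outcomes}}
     \;-\;
       \underbrace{\mathbb{E}_{q(o_\tau \mid \pi)}
         D_{\mathrm{KL}}\!\left[ q(s_\tau \mid o_\tau, \pi) \,\Vert\, q(s_\tau \mid \pi) \right]}_{\text{intrinsic: expected information gain}}
\end{align}

Minimizing this quantity over policies pulls predicted observations toward the prior model of goal behaviour while rewarding visits to states whose observations are expected to be informative, which is the directed-exploration property the abstract refers to. A common construction in related work is to define the prior preference from the reward as $\tilde{p}(o_\tau) \propto \exp\!\big(r(o_\tau)\big)$, i.e. treating reward as a log-preference; whether the paper's prior-approximation method matches this construction is not stated in the abstract.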