Training with Reinforcement Learning requires a reward function that is used to guide the agent towards achieving its objective. However, designing smooth and well-behaved rewards is in general not trivial and requires significant human engineering effort. Generating rewards in a self-supervised way, by endowing the agent with an intrinsic desire to learn and explore the environment, might induce more general behaviours. In this work, we propose a curiosity-based bonus as an intrinsic reward for Reinforcement Learning, computed as the Bayesian surprise with respect to a latent state variable, learnt by reconstructing fixed random features. We extensively evaluate our model by measuring the agent's performance in terms of environment exploration, for continuous tasks, and by looking at the game scores achieved, for video games. Our model is computationally cheap and empirically shows state-of-the-art performance on several problems. Furthermore, in experiments on an environment with stochastic actions, our approach proved to be the most resilient to simple stochasticity. Further visualization is available on the project webpage (https://lbsexploration.github.io/).
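To make the idea of the intrinsic bonus concrete, the following is a minimal PyTorch sketch, not the authors' implementation: it assumes a hypothetical `LatentSurpriseBonus` module with a prior over a latent variable conditioned on the current state and action, a posterior additionally conditioned on the next state, and a decoder trained to reconstruct fixed (untrained) random features of the next observation. The intrinsic reward is the Bayesian surprise, i.e. the KL divergence between posterior and prior.

```python
import torch
import torch.nn as nn
import torch.distributions as td


class GaussianHead(nn.Module):
    """Maps an input vector to a diagonal Gaussian over the latent variable."""
    def __init__(self, in_dim, latent_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ELU(),
                                 nn.Linear(hidden, 2 * latent_dim))

    def forward(self, x):
        mean, log_std = self.net(x).chunk(2, dim=-1)
        return td.Independent(td.Normal(mean, log_std.exp()), 1)


class LatentSurpriseBonus(nn.Module):
    """Hypothetical sketch of a latent Bayesian-surprise intrinsic reward."""
    def __init__(self, obs_dim, act_dim, latent_dim=32, feat_dim=128):
        super().__init__()
        # Fixed random features of the next observation (never trained).
        self.random_feats = nn.Linear(obs_dim, feat_dim)
        for p in self.random_feats.parameters():
            p.requires_grad_(False)
        self.prior = GaussianHead(obs_dim + act_dim, latent_dim)
        self.posterior = GaussianHead(2 * obs_dim + act_dim, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ELU(),
                                     nn.Linear(256, feat_dim))

    def forward(self, obs, act, next_obs):
        prior = self.prior(torch.cat([obs, act], dim=-1))
        post = self.posterior(torch.cat([obs, act, next_obs], dim=-1))
        # Training loss: reconstruct the fixed random features from a posterior
        # sample, regularized towards the prior (ELBO-style objective).
        target = self.random_feats(next_obs)
        recon = self.decoder(post.rsample())
        loss = ((recon - target) ** 2).sum(-1).mean() \
            + td.kl_divergence(post, prior).mean()
        # Intrinsic reward: Bayesian surprise, KL(posterior || prior).
        bonus = td.kl_divergence(post, prior).detach()
        return bonus, loss
```

In use, the `bonus` returned per transition would be added (possibly scaled) to the environment reward before the RL update, while `loss` is minimized to train the latent model; the exact architecture, feature dimensionality, and scaling are assumptions for illustration only.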