The intrinsic human desire to pursue knowledge, also known as curiosity, is considered essential to skill acquisition. Artificial curiosity could equip current control techniques, such as Reinforcement Learning, with more natural exploration capabilities. A promising approach in this respect uses Bayesian surprise on model parameters, i.e. a measure of the difference between prior and posterior beliefs, to favour exploration. In this contribution, we propose applying Bayesian surprise in a latent space that represents the agent's current understanding of the system dynamics, which drastically reduces the computational cost. We evaluate our method extensively, measuring the agent's performance in terms of environment exploration for continuous tasks and in terms of game scores for video games. Our model is computationally cheap and compares favourably with current state-of-the-art methods on several problems. We also investigate the effect of stochasticity in the environment, which is often a failure case for curiosity-driven agents. In this regime, the results suggest that our approach is resilient to stochastic transitions.
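To make the notion of Bayesian surprise concrete, the following is a minimal sketch, not the method evaluated in this work. It assumes that the prior and posterior beliefs over a latent variable are diagonal Gaussians and that surprise is measured by the KL divergence from prior to posterior; the abstract specifies neither choice. The names `gaussian_kl`, `shaped_reward`, and the weight `eta` are hypothetical and introduced only for illustration.

```python
import math
from typing import Sequence


def gaussian_kl(
    mu_q: Sequence[float],
    sigma_q: Sequence[float],
    mu_p: Sequence[float],
    sigma_p: Sequence[float],
) -> float:
    """KL(q || p) between two diagonal Gaussians, summed over dimensions.

    Assumption: q is the posterior belief over the latent variable and
    p is the prior belief, so the result plays the role of Bayesian surprise.
    """
    kl = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        # Closed-form KL divergence for one-dimensional Gaussians.
        kl += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
    return kl


def shaped_reward(extrinsic: float, surprise: float, eta: float = 0.1) -> float:
    """Add a surprise bonus to the task reward.

    Assumption: the bonus is added linearly with a hypothetical weight eta.
    """
    return extrinsic + eta * surprise


# Hypothetical two-dimensional latent beliefs before and after a transition.
prior_mu, prior_sigma = [0.0, 0.0], [1.0, 1.0]
post_mu, post_sigma = [1.0, 0.0], [1.0, 1.0]

surprise = gaussian_kl(post_mu, post_sigma, prior_mu, prior_sigma)
reward = shaped_reward(extrinsic=1.0, surprise=surprise)
```

Under these assumptions, a transition that leaves the belief unchanged yields zero surprise and therefore no exploration bonus, while a transition that shifts the belief is rewarded in proportion to the size of the shift.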