Multi-goal Reinforcement Learning has recently attracted a large amount of research interest. By allowing experience to be shared between related training tasks, this setting favors generalization to new tasks at test time, provided some smoothness exists in the considered representation space of goals. However, in settings with discontinuities in the state or goal space (e.g. walls in a maze), a majority of goals are difficult to reach, due to the sparsity of rewards in the absence of expert knowledge. This entails hard exploration, for which a curriculum of goals must be discovered to help agents learn, by adapting training tasks to their current capabilities. Building on recent automatic curriculum learning techniques for goal-conditioned policies, we propose a novel approach: Stein Variational Goal Generation (SVGG), which preferentially samples new goals within the agent's zone of proximal development, by leveraging a learned model of its abilities and a goal distribution modeled as particles in the exploration space. Our approach relies on Stein Variational Gradient Descent to dynamically attract the goal sampling distribution toward areas of appropriate difficulty. We demonstrate the performance of the approach, in terms of success coverage of the goal space, against recent state-of-the-art RL methods for hard-exploration problems.
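To make the core mechanism concrete, the sketch below illustrates a Stein Variational Gradient Descent update over a set of goal particles. This is a minimal illustration, not the authors' implementation: it assumes PyTorch, a hypothetical learned success model `success_model(g)` predicting the agent's probability of reaching goal `g`, and one plausible instantiation of "appropriate difficulty" as a target density peaked where predicted success is near 0.5.

```python
import torch


def svgd_step(goals, log_prob_fn, step_size=0.1):
    """One SVGD update on goal particles (shape (n, d)), using an RBF kernel
    with the median bandwidth heuristic. log_prob_fn returns the unnormalized
    log density of the target goal distribution at each particle, shape (n,)."""
    X = goals.detach().clone().requires_grad_(True)
    grad_log_p = torch.autograd.grad(log_prob_fn(X).sum(), X)[0]   # score of the target, (n, d)

    with torch.no_grad():
        sq_dists = torch.cdist(X, X) ** 2                          # pairwise squared distances
        h = torch.sqrt(0.5 * torch.median(sq_dists)
                       / torch.log(torch.tensor(float(X.shape[0]) + 1.0)))
        K = torch.exp(-sq_dists / (2 * h ** 2))                    # kernel matrix (n, n)
        attract = K @ grad_log_p                                   # pulls particles toward high density
        repulse = (K.sum(dim=1, keepdim=True) * X - K @ X) / h ** 2  # keeps particles spread out
        phi = (attract + repulse) / X.shape[0]
    return (goals + step_size * phi).detach()


def log_target(goals, success_model, alpha=10.0):
    """Hypothetical target: high density for goals whose predicted success
    probability is intermediate (neither trivial nor currently out of reach)."""
    p_success = success_model(goals).squeeze(-1)
    return -alpha * (p_success - 0.5) ** 2


# Usage sketch: repeatedly move the goal particles toward the current
# zone of appropriate difficulty as the success model is updated.
# goals = svgd_step(goals, lambda g: log_target(g, success_model))
```

The attractive term draws particles toward goals the target density favors, while the kernel-gradient (repulsive) term maintains diversity among sampled goals; how the success model is trained and how the target density is defined in SVGG are described in the body of the paper.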