Inspired by the great success of unsupervised learning in Computer Vision and Natural Language Processing, the Reinforcement Learning community has recently started to focus more on unsupervised discovery of skills. Most current approaches, like DIAYN or DADS, optimize some form of mutual information objective. We propose a different approach that uses reward functions encoded by neural networks. These are trained iteratively to reward more complex behavior. In high-dimensional robotic environments, our approach learns a wide range of interesting skills, including front-flips for Half-Cheetah and one-legged running for Humanoid. In the pixel-based Montezuma's Revenge environment, our method also works with minimal changes and learns complex skills that involve interacting with items and visiting diverse locations. A web version of this paper, which shows animations for the different skills, is available at https://as.inf.ethz.ch/research/open_ended_RL/main.html