How can artificial agents learn to solve many diverse tasks in complex visual environments in the absence of any supervision? We decompose this question into two problems: discovering new goals and learning to reliably achieve them. We introduce Latent Explorer Achiever (LEXA), a unified solution to both that learns a world model from image inputs and uses it to train an explorer and an achiever policy from imagined rollouts. Unlike prior methods that explore by reaching previously visited states, the explorer plans through foresight to discover unseen, surprising states, which are then used as diverse targets for the achiever to practice. After the unsupervised phase, LEXA solves tasks specified as goal images zero-shot, without any additional learning. LEXA substantially outperforms previous approaches to unsupervised goal-reaching, both on prior benchmarks and on a new challenging benchmark with a total of 40 test tasks spanning four standard robotic manipulation and locomotion domains. LEXA further achieves goals that require interacting with multiple objects in sequence. Finally, to demonstrate the scalability and generality of LEXA, we train a single general agent across four distinct environments. Code and videos at https://orybkin.github.io/lexa/
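To make the explorer/achiever decomposition concrete, the following is a minimal toy sketch of the training loop described above. All class and function names are illustrative assumptions, not the authors' implementation: the "world model" is a trivial integer-state simulator, the explorer stands in for planning toward surprising (unvisited) states, and the achiever stands in for a learned goal-conditioned policy trained on imagined rollouts.

```python
import random

class ToyWorldModel:
    """Stand-in world model: states are integers; 'imagination' rolls a policy forward."""
    def __init__(self):
        self.visited = set()

    def encode(self, obs):
        # Identity encoding; the real model would embed images into a latent space.
        return obs

    def imagine(self, start, policy, horizon):
        state, rollout = start, []
        for _ in range(horizon):
            state = policy(state)
            rollout.append(state)
        return rollout

def explorer_policy(state):
    # Random steps stand in for planning toward high model uncertainty / surprise.
    return state + random.choice([-1, 1, 2])

def make_achiever_policy(goal):
    # Greedy steps toward the goal stand in for a learned goal-conditioned policy.
    def policy(state):
        return state + (1 if goal > state else -1 if goal < state else 0)
    return policy

def unsupervised_phase(model, episodes=20, horizon=5):
    """Alternate explorer rollouts; novel imagined states become practice goals."""
    state, goals = 0, []
    for _ in range(episodes):
        rollout = model.imagine(model.encode(state), explorer_policy, horizon)
        novel = [s for s in rollout if s not in model.visited]
        model.visited.update(rollout)
        if novel:
            goals.append(novel[-1])  # treat the newest unseen state as a target
        state = rollout[-1]
    return goals

def achieve(model, start, goal, horizon=500):
    """Zero-shot goal reaching: run the achiever toward a specified goal state."""
    policy = make_achiever_policy(model.encode(goal))
    state = model.encode(start)
    for _ in range(horizon):
        if state == goal:
            return True
        state = policy(state)
    return state == goal

random.seed(0)
model = ToyWorldModel()
goals = unsupervised_phase(model)
print(all(achieve(model, 0, g) for g in goals))  # True
```

The toy example preserves the key structural point: goals are never hand-specified; they are proposed by the explorer's imagined rollouts, and the achiever is evaluated zero-shot on reaching them.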