In the natural world, life has found innumerable ways to survive and often thrive. Between and even within species, each individual is in some manner unique, and this diversity lends adaptability and robustness to life. In this work, we aim to learn a space of diverse and high-reward policies in any given environment. To this end, we introduce a generative model of policies, which maps a low-dimensional latent space to an agent policy space. Our method enables learning an entire population of agent policies without requiring separate parameters for each policy. Just as real-world populations can adapt and evolve via natural selection, our method is able to adapt to changes in the environment solely by selecting for policies in latent space. We test our generative model's capabilities in a variety of environments, including an open-ended grid-world and a two-player soccer environment. Code, visualizations, and additional experiments can be found at https://kennyderek.github.io/adap/.
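As a rough illustration of the idea, and not the authors' implementation, the sketch below shows how a single set of shared parameters can represent a whole population of policies indexed by a latent vector, and how adaptation can be done purely by selecting a latent. Everything in it is an assumption made for illustration: the one-step bandit-style environment, the linear softmax policy, the concatenation of observation and latent, and the names `policy_probs`, `expected_reward`, and `select_latent`.

```python
import numpy as np


def policy_probs(shared_weights, obs, latent):
    """Action distribution of the policy indexed by `latent`.

    Hypothetical conditioning scheme: the observation and latent are
    concatenated and passed through one shared linear layer + softmax.
    """
    features = np.concatenate([obs, latent])
    logits = shared_weights @ features
    logits = logits - logits.max()  # subtract max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()


def expected_reward(shared_weights, obs, latent, action_rewards):
    """Expected one-step reward of the latent-indexed policy (toy environment)."""
    return float(policy_probs(shared_weights, obs, latent) @ action_rewards)


def select_latent(shared_weights, obs, action_rewards, candidates):
    """Adapt by selection only: score each candidate latent, keep the best.

    The shared weights are never updated here.
    """
    scores = [
        expected_reward(shared_weights, obs, z, action_rewards)
        for z in candidates
    ]
    best = int(np.argmax(scores))
    return candidates[best], scores


rng = np.random.default_rng(0)
n_actions, obs_dim, latent_dim = 4, 3, 2

# One parameter set shared by the entire population of policies.
# (Random here; in practice these would be trained.)
shared_weights = rng.normal(size=(n_actions, obs_dim + latent_dim))
obs = np.ones(obs_dim)

# A population is just a set of points in the low-dimensional latent space.
candidates = rng.uniform(-1.0, 1.0, size=(64, latent_dim))

# The environment "changes": a different action becomes rewarding.
rewards_before = np.array([1.0, 0.0, 0.0, 0.0])
rewards_after = np.array([0.0, 0.0, 0.0, 1.0])

z_before, scores_before = select_latent(shared_weights, obs, rewards_before, candidates)
z_after, scores_after = select_latent(shared_weights, obs, rewards_after, candidates)

print("best latent before change:", z_before)
print("best latent after change: ", z_after)
```

In this toy setting, the only thing that differs between the two selected policies is the latent vector; `shared_weights` is the same array in both cases. A real environment would replace `expected_reward` with returns estimated from rollouts.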