Recent progress in reinforcement learning (RL) has started producing generally capable agents that can solve a distribution of complex environments. These agents are typically tested on fixed, human-authored environments. On the other hand, quality diversity (QD) optimization has proven to be an effective component of environment generation algorithms, which can generate collections of high-quality environments that are diverse in the resulting agent behaviors. However, these algorithms require potentially expensive simulations of agents on newly generated environments. We propose Deep Surrogate Assisted Generation of Environments (DSAGE), a sample-efficient QD environment generation algorithm that maintains a deep surrogate model for predicting agent behaviors in new environments. Results in two benchmark domains show that DSAGE significantly outperforms existing QD environment generation algorithms in discovering collections of environments that elicit diverse behaviors of a state-of-the-art RL agent and a planning agent.
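The core idea above — a QD loop that consults a cheap surrogate before paying for a real agent simulation — can be sketched in miniature. The following is a hypothetical illustration, not the paper's implementation: `simulate` is a toy stand-in for an expensive agent rollout, and a k-nearest-neighbor regressor stands in for the deep surrogate network. All names and parameters here are assumptions for illustration.

```python
import random

def simulate(env):
    """Hypothetical expensive ground-truth evaluation: returns (objective, behavior measure).
    Toy stand-in: the measure is the mean cell value; the objective rewards balance."""
    m = sum(env) / len(env)
    return 1.0 - abs(m - 0.5), m

class KNNSurrogate:
    """Cheap stand-in for the deep surrogate: predicts objective and behavior
    measure for a new environment from its nearest already-simulated neighbors."""
    def __init__(self, k=3):
        self.data = []  # list of (env, objective, measure) from real simulations
        self.k = k

    def add(self, env, obj, meas):
        self.data.append((env, obj, meas))

    def predict(self, env):
        nbrs = sorted(self.data,
                      key=lambda d: sum((a - b) ** 2 for a, b in zip(d[0], env)))[:self.k]
        return (sum(o for _, o, _ in nbrs) / len(nbrs),
                sum(m for _, _, m in nbrs) / len(nbrs))

def surrogate_assisted_qd(iters=200, bins=10, seed=0):
    """MAP-Elites-style archive keyed by discretized behavior measure; real
    simulations are only triggered when the surrogate predicts an improvement."""
    rng = random.Random(seed)
    archive = {}  # behavior bin -> (objective, env)
    surrogate = KNNSurrogate()
    # Bootstrap the surrogate with a few real simulations.
    for _ in range(5):
        env = [rng.random() for _ in range(4)]
        obj, meas = simulate(env)
        surrogate.add(env, obj, meas)
        archive.setdefault(min(int(meas * bins), bins - 1), (obj, env))
    for _ in range(iters):
        parent = rng.choice(list(archive.values()))[1]
        child = [max(0.0, min(1.0, x + rng.gauss(0, 0.1))) for x in parent]
        pred_obj, pred_meas = surrogate.predict(child)
        b = min(int(pred_meas * bins), bins - 1)
        # Gate the expensive simulation on the surrogate's prediction.
        if b not in archive or pred_obj > archive[b][0]:
            obj, meas = simulate(child)
            surrogate.add(child, obj, meas)
            b = min(int(meas * bins), bins - 1)
            if b not in archive or obj > archive[b][0]:
                archive[b] = (obj, child)
    return archive
```

The sample-efficiency claim corresponds to the gating step: most candidate environments are scored only by the surrogate, and the simulator is invoked only for candidates predicted to improve the archive.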