A prevalent limitation of optimizing over a single objective is that it can be misguided, becoming trapped in a local optimum. Quality-Diversity (QD) algorithms address this by seeking a population of high-quality and diverse solutions to a problem. Most conventional QD approaches, such as MAP-Elites, explicitly maintain a behavioral archive in which solutions are mapped into predefined niches. In this work, we show that a diverse population of solutions can be found without the limitation of maintaining an archive or defining the range of behaviors in advance. Instead, we partition solutions into independently evolving species and use unsupervised skill discovery to learn diverse, high-performing solutions. We show that this can be done through gradient-based mutations that take an information-theoretic perspective, jointly maximizing mutual information and performance. We propose Diverse Quality Species (DQS) as an alternative to archive-based QD algorithms. We evaluate it on several simulated robotic environments and show that it learns a diverse set of solutions across species. Furthermore, our results show that DQS is more sample-efficient and performant than other QD algorithms. Relevant code and hyper-parameters are available at: https://github.com/rwickman/NEAT_RL.
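The central mechanism described above is a mutual-information objective over species, in the spirit of unsupervised skill discovery (e.g., DIAYN-style methods). Below is a minimal sketch, under that assumption, of how such an intrinsic reward could be computed: a discriminator learns to identify which species produced a state, and its log-probability serves as a diversity bonus added to the task reward before the gradient-based update. The names here (SpeciesDiscriminator, mi_reward) are hypothetical and are not taken from the authors' repository.

```python
# Hypothetical sketch of a DIAYN-style mutual-information bonus over
# species; NOT the authors' implementation.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpeciesDiscriminator(nn.Module):
    """Approximates q(species | state) with a small MLP."""
    def __init__(self, state_dim: int, n_species: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_species),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)  # logits over species IDs

def mi_reward(disc: SpeciesDiscriminator, state: torch.Tensor,
              species_id: torch.Tensor, n_species: int) -> torch.Tensor:
    """Intrinsic reward r = log q(z|s) - log p(z), with p(z) uniform.

    High reward means the state is easily attributable to its species,
    which pushes species toward mutually distinguishable behaviors.
    """
    log_q = F.log_softmax(disc(state), dim=-1)
    log_q_z = log_q.gather(-1, species_id.unsqueeze(-1)).squeeze(-1)
    return log_q_z + math.log(n_species)
```

In such a scheme, the discriminator would be trained with cross-entropy on (state, species_id) pairs collected during rollouts, and mi_reward would be added to the environment reward so that the gradient-based mutations jointly optimize performance and species distinguishability.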