Quality-Diversity algorithms, such as MAP-Elites, have emerged as powerful alternatives to performance-only optimisation approaches because they generate collections of diverse and high-performing solutions to an optimisation problem. However, they are often limited to low-dimensional search spaces and deterministic environments. The recently introduced Policy Gradient Assisted MAP-Elites (PGA-MAP-Elites) algorithm overcomes these limitations by pairing the traditional genetic operator of MAP-Elites with a gradient-based operator inspired by Deep Reinforcement Learning. This new operator guides mutations toward high-performing solutions using policy gradients. In this work, we present an in-depth study of PGA-MAP-Elites. We demonstrate the benefits of policy gradients for the performance of the algorithm and for the reproducibility of the generated solutions in uncertain domains. First, we show that PGA-MAP-Elites performs well in both deterministic and uncertain high-dimensional environments, decoupling the two challenges it tackles. Second, we show that, in addition to outperforming all the considered baselines, PGA-MAP-Elites generates collections of solutions that are highly reproducible in uncertain environments, approaching the reproducibility of solutions found by Quality-Diversity approaches built specifically for uncertain applications. Finally, we present an ablation and an in-depth analysis of the dynamics of the policy-gradient-based variation. We demonstrate that the policy-gradient variation operator is crucial to the performance of PGA-MAP-Elites but is only essential during the early stage of the process, when it finds high-performing regions of the search space.
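To make the pairing of the two variation operators concrete, the following is a minimal toy sketch, not the implementation studied in this work. It assumes a two-dimensional search space, a hand-written fitness function whose analytic gradient stands in for the policy gradient, plain Gaussian mutation standing in for the genetic operator, and a one-dimensional behaviour descriptor. All function names and parameters (`fitness`, `descriptor`, `gradient_fraction`, and so on) are hypothetical and chosen only for illustration.

```python
import random


def fitness(x):
    # Toy objective (assumption): negative squared distance to (0.5, 0.5).
    return -sum((xi - 0.5) ** 2 for xi in x)


def fitness_gradient(x):
    # Analytic gradient of the toy objective. It stands in for a policy
    # gradient, which in a reinforcement learning setting must be estimated.
    return [-2.0 * (xi - 0.5) for xi in x]


def descriptor(x):
    # Toy behaviour descriptor (assumption): first coordinate, clipped to [0, 1].
    return min(max(x[0], 0.0), 1.0)


def cell_index(x, n_cells):
    # Map the descriptor to one of n_cells equally sized archive cells.
    return min(int(descriptor(x) * n_cells), n_cells - 1)


def genetic_variation(x, rng, sigma=0.1):
    # Undirected variation: Gaussian mutation (a simple stand-in).
    return [xi + rng.gauss(0.0, sigma) for xi in x]


def gradient_variation(x, step=0.1):
    # Directed variation: one gradient-ascent step on the fitness.
    return [xi + step * gi for xi, gi in zip(x, fitness_gradient(x))]


def try_insert(archive, x, n_cells):
    # Keep x only if its cell is empty or x beats the current elite.
    idx = cell_index(x, n_cells)
    f = fitness(x)
    if idx not in archive or f > archive[idx][0]:
        archive[idx] = (f, x)


def map_elites(n_iterations=200, batch_size=20, n_cells=10,
               gradient_fraction=0.5, seed=0):
    rng = random.Random(seed)
    archive = {}  # cell index -> (fitness, solution)
    for _ in range(batch_size):
        try_insert(archive, [rng.random(), rng.random()], n_cells)
    n_gradient = int(batch_size * gradient_fraction)
    for _ in range(n_iterations):
        elites = list(archive.values())
        for i in range(batch_size):
            _, parent = rng.choice(elites)
            # Part of each batch uses the gradient operator, the rest
            # uses the genetic operator.
            if i < n_gradient:
                child = gradient_variation(parent)
            else:
                child = genetic_variation(parent, rng)
            try_insert(archive, child, n_cells)
    return archive
```

Calling `map_elites()` returns a dictionary mapping each filled cell to its best solution; setting `gradient_fraction=0.0` gives a mutation-only loop, a rough analogue of ablating the gradient-based operator.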