Neuroevolution is an alternative to gradient-based optimisation that can avoid local minima and is readily parallelised. Its main limiting factor is that it typically scales poorly with the dimensionality of the parameter space. Inspired by recent work examining neural network intrinsic dimension and loss landscapes, we hypothesise that there exists a low-dimensional manifold, embedded in the policy network parameter space, around which a high density of diverse and useful policies is concentrated. This paper proposes a novel method for diversity-based policy search via neuroevolution that leverages learned representations of the policy network parameters and performs policy search in this learned representation space. Our method relies on the Quality-Diversity (QD) framework, which provides a principled approach to policy search and maintains a collection of diverse policies that serves as a dataset for learning policy representations. Further, we use the Jacobian of the inverse-mapping function to guide the search in the representation space, ensuring that generated samples remain in high-density regions after being mapped back to the original parameter space. Finally, we evaluate our contributions on four continuous-control tasks in simulated environments and compare against diversity-based baselines.
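To make the Jacobian-guided step concrete, the following is a minimal sketch of one plausible reading of the abstract, not the paper's actual implementation: a decoder g maps latent codes to policy parameters, and latent-space mutations are rescaled by the column norms of the decoder's Jacobian so that directions which expand strongly in parameter space receive smaller steps. All names, shapes, and the scaling rule itself (decoder, latent_dim, jacobian_guided_mutation) are illustrative assumptions.

```python
# Hypothetical sketch of Jacobian-guided mutation in a learned latent space.
# Assumes a decoder g: R^d -> R^D (latent code -> policy parameters) trained
# on an archive of QD policies; the architecture and sizes are placeholders.
import torch

latent_dim, param_dim = 8, 4096  # illustrative sizes

decoder = torch.nn.Sequential(   # stand-in for a trained decoder g
    torch.nn.Linear(latent_dim, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, param_dim),
)

def jacobian_guided_mutation(z, sigma=0.1):
    """Perturb a latent code z, shrinking the step along latent directions
    where the decoder is locally steep, so the induced move in parameter
    space stays small and samples remain near the learned manifold."""
    # Jacobian of the latent-to-parameter map at z: shape (param_dim, latent_dim).
    J = torch.autograd.functional.jacobian(decoder, z)
    # Column norms measure how strongly each latent direction expands
    # in parameter space; divide by them to equalise the induced step.
    col_norms = J.norm(dim=0) + 1e-8
    eps = sigma * torch.randn(latent_dim) / col_norms
    return z + eps

z = torch.randn(latent_dim)          # latent code of an archive policy
z_new = jacobian_guided_mutation(z)  # mutated latent code
theta_new = decoder(z_new)           # candidate policy parameters
```

This column-norm rescaling is one simple way to keep decoded samples in high-density regions; the paper's actual use of the inverse-mapping Jacobian may differ.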