We introduce ES-ENAS, a simple neural architecture search (NAS) algorithm for reinforcement learning (RL) policy design that combines Evolution Strategies (ES) and Efficient NAS (ENAS) in a highly scalable and intuitive way. Our main insight is that ES is already a distributed blackbox algorithm, so we can simply insert an ENAS model controller into the central aggregator of ES and obtain its weight-sharing properties for free. This relatively simple marriage of two lines of research bridges the gap between NAS in supervised learning settings and the RL setting, and makes ours one of the first applications of controller-based NAS techniques to RL. We demonstrate the utility of our method by training combinatorial neural network architectures for continuous-control RL problems via edge pruning and weight sharing. We also incorporate a variety of popular techniques from the modern NAS literature, including multiobjective optimization and varying controller methods, to showcase their promise in the RL field, and discuss possible extensions. We achieve more than 90% network compression on multiple tasks, which may be of special interest for mobile robotics with limited storage and computational resources.
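To make the aggregator-level combination concrete, the following is a minimal sketch of an ES-ENAS-style loop: a controller samples one edge mask (architecture) per worker, each worker evaluates a Gaussian-perturbed copy of the shared weights under its mask, and the central aggregator applies an ES gradient step to the shared weights and a REINFORCE step to the controller. The toy `reward` function, the independent-Bernoulli edge controller, and all hyperparameters here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16                 # number of prunable edges / shared weight parameters
n_workers = 32         # number of distributed blackbox evaluations per step
sigma, lr_w, lr_c = 0.1, 0.05, 0.05

theta = np.zeros(D)    # shared weights held by the central aggregator
logits = np.zeros(D)   # controller: one Bernoulli logit per edge (assumption)

def reward(w, mask):
    """Toy stand-in for an RL rollout return: favors sparse masks whose
    active weights sit near a fixed target vector."""
    target = np.linspace(-1.0, 1.0, D)
    return -np.sum(mask * (w - target) ** 2) - 0.1 * mask.sum()

for step in range(300):
    # controller samples one architecture (edge mask) per worker
    probs = 1.0 / (1.0 + np.exp(-logits))
    masks = (rng.random((n_workers, D)) < probs).astype(float)
    # each worker evaluates a perturbed copy of the shared weights
    eps = rng.standard_normal((n_workers, D))
    rewards = np.array([reward(theta + sigma * e, m)
                        for e, m in zip(eps, masks)])
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # aggregator: ES gradient estimate on the shared weights
    theta += lr_w / (n_workers * sigma) * eps.T @ adv
    # aggregator: REINFORCE update on the controller's Bernoulli logits
    logits += lr_c / n_workers * (masks - probs).T @ adv

kept = int((1.0 / (1.0 + np.exp(-logits)) > 0.5).sum())
print(f"edges kept by controller: {kept} of {D}")
```

A real implementation would replace `reward` with distributed RL rollouts and could swap the Bernoulli edge controller for the policy-gradient or evolutionary controller variants the abstract alludes to; the point of the sketch is only that the controller update rides on the same worker rewards the ES aggregator already collects.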