Optimizing functions without access to gradients is the remit of black-box methods such as evolution strategies. While highly general, their learning dynamics are often heuristic and inflexible, which are exactly the limitations that meta-learning can address. Hence, we propose to discover effective update rules for evolution strategies via meta-learning. Concretely, our approach employs a search strategy parametrized by a self-attention-based architecture, which guarantees that the update rule is invariant to the ordering of the candidate solutions. We show that meta-evolving this system on a small set of representative low-dimensional analytic optimization problems is sufficient to discover new evolution strategies capable of generalizing to unseen optimization problems, population sizes, and optimization horizons. Furthermore, the same learned evolution strategy can outperform established neuroevolution baselines on supervised and continuous control tasks. As additional contributions, we ablate the individual neural network components of our method; reverse engineer the learned strategy into an explicit heuristic form, which remains highly competitive; and show that it is possible to self-referentially train an evolution strategy from scratch, with the learned update rule used to drive the outer meta-learning loop.
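To make the ordering-invariance claim concrete, the following is a minimal sketch of how self-attention over per-candidate fitness features can produce recombination weights for an evolution strategy. It is not the architecture from the paper: the feature choice (z-scored fitness and centered rank), the single attention head, and the names `attention_weights`, `es_mean_update`, `Wq`, `Wk`, and `Wv` are all assumptions made for illustration. The sketch also assumes minimization, at least two candidates, and distinct fitness values.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fitness_features(fitness):
    """Per-candidate features: z-scored fitness and centered rank.

    Assumes len(fitness) >= 2 and distinct values (hypothetical feature set).
    """
    n = len(fitness)
    mean = sum(fitness) / n
    std = math.sqrt(sum((f - mean) ** 2 for f in fitness) / n) + 1e-8
    order = sorted(range(n), key=lambda i: fitness[i])
    rank = [0.0] * n
    for r, i in enumerate(order):
        rank[i] = r / (n - 1) - 0.5  # best candidate gets -0.5, worst +0.5
    return [[(fitness[i] - mean) / std, rank[i]] for i in range(n)]


def project(features, weight):
    """Multiply each feature vector by a (d_in x d_out) weight matrix."""
    d_out = len(weight[0])
    return [
        [sum(f[k] * weight[k][j] for k in range(len(f))) for j in range(d_out)]
        for f in features
    ]


def attention_weights(fitness, params):
    """Single-head self-attention over the population -> recombination weights."""
    feats = fitness_features(fitness)
    q = project(feats, params["Wq"])
    k = project(feats, params["Wk"])
    v = project(feats, params["Wv"])  # one output column: a scalar per candidate
    scale = math.sqrt(len(q[0]))
    n = len(feats)
    out = []
    for i in range(n):
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / scale for j in range(n)]
        attn = softmax(scores)
        out.append(sum(attn[j] * v[j][0] for j in range(n)))
    # Normalize across the population so the weights sum to one.
    return softmax(out)


def es_mean_update(mean, candidates, fitness, params, lr=1.0):
    """Move the search mean toward the attention-weighted recombination."""
    w = attention_weights(fitness, params)
    dims = len(mean)
    recombined = [
        sum(w[i] * candidates[i][d] for i in range(len(candidates)))
        for d in range(dims)
    ]
    return [m + lr * (r - m) for m, r in zip(mean, recombined)]


# Usage: random (untrained) parameters on a 3-dimensional sphere function.
rng = random.Random(0)
params = {
    name: [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(2)]
    for name, cols in (("Wq", 4), ("Wk", 4), ("Wv", 1))
}
candidates = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(6)]
fitness = [sum(x * x for x in c) for c in candidates]
new_mean = es_mean_update([0.0, 0.0, 0.0], candidates, fitness, params)
```

Because attention treats the population as a set, permuting the candidates permutes the weights in the same way, so the weighted sum that updates the mean is unchanged up to floating-point rounding. In a meta-learning setup, parameters such as `Wq`, `Wk`, and `Wv` would be the quantities optimized in the outer loop; here they are random and serve only to exercise the code.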