Evolutionary Reinforcement Learning (ERL), which applies Evolutionary Algorithms (EAs) to optimize the weight parameters of Deep Neural Network (DNN) based policies, is widely regarded as an alternative to traditional reinforcement learning methods. However, evaluating the iteratively generated population usually requires a large amount of computational time and can be prohibitively expensive, which may restrict the applicability of ERL. Surrogate models are often used to reduce the computational burden of evaluation in EAs. Unfortunately, in ERL each policy individual typically encodes millions of DNN weight parameters, and this high-dimensional policy representation poses a great challenge to applying surrogates in ERL to speed up training. This paper proposes the PE-SAERL framework, which for the first time enables surrogate-assisted evolutionary reinforcement learning via policy embedding (PE). Empirical results on five Atari games show that the proposed method performs more efficiently than four state-of-the-art algorithms. Training is accelerated by up to 7x on the tested games, compared to the counterpart without the surrogate and PE.
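To make the core idea concrete, the following is a minimal sketch (not the paper's actual method) of surrogate-assisted evaluation with a policy embedding: policy weight vectors are projected into a low-dimensional space by a fixed random projection, and a simple nearest-neighbour surrogate over already-evaluated policies pre-screens candidates before expensive environment rollouts. All sizes, the projection choice, the toy fitness function, and the 1-NN surrogate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a "policy" is a flat weight vector of a DNN.
n_weights = 10_000   # high-dimensional policy parameters
n_embed = 32         # low-dimensional embedding used by the surrogate

# Policy embedding via a fixed random projection (one simple choice;
# the paper's PE component may be more sophisticated).
P = rng.normal(size=(n_embed, n_weights)) / np.sqrt(n_embed)

def embed(w):
    return P @ w

def true_fitness(w):
    # Stand-in for an expensive environment rollout (e.g. Atari episodes).
    return -float(np.linalg.norm(w - 0.1))

# Surrogate: 1-nearest-neighbour lookup over embedded, evaluated policies.
archive_z, archive_f = [], []

def surrogate_fitness(w):
    z = embed(w)
    dists = [np.linalg.norm(z - zi) for zi in archive_z]
    return archive_f[int(np.argmin(dists))]

# One generation: evaluate a few policies for real, pre-screen the rest
# cheaply in the low-dimensional embedding space.
pop = [rng.normal(scale=0.1, size=n_weights) for _ in range(20)]
for w in pop[:5]:                                   # expensive evaluations
    archive_z.append(embed(w))
    archive_f.append(true_fitness(w))
scores = [surrogate_fitness(w) for w in pop[5:]]    # cheap pre-screening
best = pop[5 + int(np.argmax(scores))]              # candidate to roll out
```

The speedup comes from replacing most rollouts with surrogate lookups; the embedding is what makes fitting a surrogate feasible at all, since modelling fitness directly over millions of raw weights is impractical.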