Advances in Reinforcement Learning (RL) have successfully tackled sample efficiency and overestimation bias. However, these methods often fall short of scalable performance. Genetic methods, on the other hand, provide scalability but are sensitive to the hyperparameters of their evolutionary operators. We present the Evolution-based Soft Actor-Critic (ESAC), a scalable RL algorithm. Our contributions are threefold: ESAC (1) abstracts exploration from exploitation by combining Evolution Strategies (ES) with Soft Actor-Critic (SAC), (2) provides dominant skill transfer between offspring by making use of soft winner selection and genetic crossover in hindsight, and (3) reduces hyperparameter sensitivity of evolution using Automatic Mutation Tuning (AMT). AMT gradually replaces the entropy framework of SAC, allowing the population to succeed at the task while acting as randomly as possible, without relying on backpropagation updates. On a range of challenging robot control tasks with high-dimensional action spaces and sparse rewards, ESAC demonstrates improved performance and sample efficiency in comparison to the Maximum Entropy framework. ESAC achieves scalability comparable to ES in terms of hardware resources and algorithmic overhead. A complete implementation of ESAC with notes on reproducibility and videos can be found at the project website https://karush17.github.io/esac-web/.
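To make the described loop concrete, below is a minimal, self-contained sketch of the evolutionary side of ESAC as summarized above: ES mutation of a population, softmax-weighted (soft) winner selection, recombination of the winners in hindsight, and an AMT-style adaptation of the mutation scale. The toy fitness function, hyperparameter names, and the specific AMT update rule are illustrative assumptions, not the authors' implementation; the SAC gradient learner that would normally be injected into the population is omitted for brevity.

```python
# Illustrative sketch of the ESAC outer loop (assumptions: toy fitness,
# hyperparameter names, and the AMT update rule below are placeholders).
import numpy as np

rng = np.random.default_rng(0)


def fitness(params, task):
    """Stand-in for an episode return; a quadratic toy objective."""
    return -np.sum((params - task) ** 2)


def esac_sketch(dim=8, pop_size=16, elite_frac=0.5, iters=200,
                sigma=0.1, amt_rate=1.01):
    task = rng.normal(size=dim)           # stands in for the RL task
    mean = np.zeros(dim)                   # ES search-distribution mean
    n_elite = max(1, int(elite_frac * pop_size))
    for _ in range(iters):
        # 1) ES exploration: perturb the mean to form a population of offspring.
        noise = rng.normal(size=(pop_size, dim))
        population = mean + sigma * noise
        scores = np.array([fitness(p, task) for p in population])
        # 2) Soft winner selection: sample elites with softmax(fitness) weights.
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        elite_idx = rng.choice(pop_size, size=n_elite, replace=False, p=probs)
        elites = population[elite_idx]
        # 3) Crossover in hindsight: recombine the selected winners into the
        #    next search mean (the SAC learner would be one population member).
        mean = elites.mean(axis=0)
        # 4) AMT (illustrative rule): enlarge the mutation scale when the best
        #    offspring fails to beat the current mean, shrink it otherwise, so
        #    the population keeps acting as randomly as the task allows.
        if scores.max() < fitness(mean, task):
            sigma *= amt_rate
        else:
            sigma /= amt_rate
    return mean, fitness(mean, task)


if __name__ == "__main__":
    params, score = esac_sketch()
    print("final fitness:", round(score, 4))
```

In this sketch, no backpropagation is used on the evolutionary path: only fitness evaluations drive selection, crossover, and the mutation-scale update, which mirrors the abstract's claim that AMT replaces SAC's entropy objective without gradient updates.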