Evolutionary algorithms have been used to evolve a population of actors that generate diverse experiences for training reinforcement learning agents, which helps tackle the temporal credit assignment problem and improves exploration efficiency. However, when this approach is adapted to constrained problems, it is difficult to balance the trade-off between reward and constraint violation. In this paper, we propose a novel evolutionary constrained reinforcement learning (ECRL) algorithm that adaptively balances reward and constraint violation via stochastic ranking and, at the same time, restricts the policy's behaviour by maintaining a set of Lagrange relaxation coefficients with a constraint buffer. Extensive experiments on robotic control benchmarks show that ECRL achieves outstanding performance compared with state-of-the-art algorithms. Ablation studies confirm the benefits of introducing stochastic ranking and the constraint buffer.
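The stochastic ranking mentioned above (in the style of Runarsson and Yao's classic procedure) can be sketched as a bubble-sort-like pass in which adjacent individuals are compared by reward with probability `pf` (or whenever both are feasible) and by constraint violation otherwise. This is a minimal illustrative sketch, not the paper's implementation; all names and the default `pf` are assumptions.

```python
import random

def stochastic_rank(reward, violation, pf=0.45, seed=None):
    """Return indices ranked by stochastically mixing two criteria.

    reward:    list of episode returns (higher is better)
    violation: list of constraint-violation amounts (0 means feasible)
    pf:        probability of comparing by reward even when a pair
               is not fully feasible (illustrative default)
    """
    rng = random.Random(seed)
    n = len(reward)
    idx = list(range(n))
    for _ in range(n):  # at most n sweeps, early-exit when sorted
        swapped = False
        for i in range(n - 1):
            a, b = idx[i], idx[i + 1]
            both_feasible = violation[a] == 0 and violation[b] == 0
            if both_feasible or rng.random() < pf:
                # compare by reward: higher reward ranks first
                if reward[a] < reward[b]:
                    idx[i], idx[i + 1] = b, a
                    swapped = True
            else:
                # compare by violation: smaller violation ranks first
                if violation[a] > violation[b]:
                    idx[i], idx[i + 1] = b, a
                    swapped = True
        if not swapped:
            break
    return idx
```

With `pf = 0` the ranking degenerates to a deterministic feasibility-first ordering; raising `pf` lets high-reward but slightly infeasible individuals climb the ranking, which is how the balance between reward and constraint violation is tuned.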
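The Lagrange relaxation coefficients referred to above are, in standard Lagrangian constrained RL, updated by dual ascent: each coefficient grows when its constraint's expected cost exceeds the allowed limit and shrinks (clipped at zero) otherwise. The following is a minimal sketch of that generic update rule under those standard assumptions; the function name, learning rate, and the idea of storing one coefficient per buffered constraint level are illustrative, not the paper's exact mechanism.

```python
def update_lagrange(lmbda, cost_return, limit, lr=0.01):
    """One dual-ascent step on a Lagrange multiplier.

    lmbda:       current multiplier (must stay non-negative)
    cost_return: measured expected constraint cost of the policy
    limit:       allowed constraint threshold
    lr:          dual-ascent step size (illustrative default)
    """
    # Increase lambda when the constraint is violated, decay it when
    # the policy is safely within the limit; project onto [0, inf).
    return max(0.0, lmbda + lr * (cost_return - limit))
```

A set of such coefficients, one per entry in a buffer of constraint thresholds, then weights the constraint penalty in each actor's objective, tightening or loosening the restriction on the policy's behaviour as training proceeds.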