Reinforcement learning (RL) is a machine learning approach that trains agents to maximize cumulative rewards through interaction with an environment. The integration of RL with deep learning has recently produced impressive achievements on a wide range of challenging tasks, including board games, arcade games, and robot control. Despite these successes, several crucial challenges remain: brittle convergence caused by sensitive hyperparameters; difficult temporal credit assignment under long horizons and sparse rewards; a lack of diverse exploration, especially in continuous search spaces; difficult credit assignment in multi-agent reinforcement learning; and conflicting reward objectives. Evolutionary computation (EC), which maintains a population of learning agents, has shown promise in addressing these limitations. This article presents a comprehensive survey of state-of-the-art methods that integrate EC into RL, referred to as evolutionary reinforcement learning (EvoRL). We categorize EvoRL methods according to key research fields in RL: hyperparameter optimization, policy search, exploration, reward shaping, meta-RL, and multi-objective RL. We then discuss future research directions in terms of efficient methods, benchmarks, and scalable platforms. This survey serves as a resource for researchers and practitioners interested in EvoRL, highlighting important challenges and opportunities for future research and supporting the development of more efficient methods and tailored benchmarks that can further advance this promising cross-disciplinary field.
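To make the population-based idea behind EvoRL concrete, the following is a minimal sketch (not drawn from any specific surveyed method) of evolutionary policy search: a population of linear policies is scored by episodic cumulative reward and evolved with truncation selection, elitism, and Gaussian mutation. The toy environment, all constants, and all function names are illustrative assumptions standing in for any episodic RL task.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM, ACT_DIM = 4, 2
POP_SIZE, N_ELITE, N_GENERATIONS = 32, 8, 50
MUTATION_STD = 0.05
EPISODE_LEN = 100

def episode_return(theta):
    """Roll out one episode of a synthetic stabilization task and return
    the cumulative reward (the 'fitness' of the flat parameter vector theta).
    This toy dynamics is a placeholder for a real RL environment."""
    W = theta.reshape(ACT_DIM, OBS_DIM)
    s = rng.normal(size=OBS_DIM)
    total = 0.0
    for _ in range(EPISODE_LEN):
        a = np.tanh(W @ s)  # deterministic linear policy with tanh squashing
        s = 0.9 * s + 0.1 * np.append(a, np.zeros(OBS_DIM - ACT_DIM))
        total += -float(s @ s)  # reward: stay close to the origin
    return total

# Initialize a population of flat policy parameter vectors.
pop = rng.normal(scale=0.1, size=(POP_SIZE, OBS_DIM * ACT_DIM))

for gen in range(N_GENERATIONS):
    fitness = np.array([episode_return(theta) for theta in pop])
    elite = pop[np.argsort(fitness)[-N_ELITE:]]  # truncation selection
    parents = elite[rng.integers(N_ELITE, size=POP_SIZE)]
    pop = parents + MUTATION_STD * rng.normal(size=parents.shape)
    pop[0] = elite[-1]  # elitism: carry the best policy over unmutated
    if gen % 10 == 0:
        print(f"gen {gen:3d}  best return {fitness.max():.2f}")
```

Because fitness is the whole-episode return, this kind of search sidesteps per-step temporal credit assignment and gradient sensitivity, while the population provides diverse exploration; these are precisely the RL challenges the survey argues EC can help address.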