Reinforcement learning (RL) is a machine learning approach that trains agents to maximize cumulative rewards through interactions with environments. The integration of RL with deep learning has recently resulted in impressive achievements in a wide range of challenging tasks, including board games, arcade games, and robot control. Despite these successes, several crucial challenges remain: brittle convergence caused by sensitive hyperparameters, difficult temporal credit assignment under long time horizons and sparse rewards, a lack of diverse exploration, especially in continuous search spaces, difficult credit assignment in multi-agent reinforcement learning, and conflicting reward objectives. Evolutionary computation (EC), which maintains a population of learning agents, has demonstrated promising performance in addressing these limitations. This article presents a comprehensive survey of state-of-the-art methods for integrating EC into RL, referred to as evolutionary reinforcement learning (EvoRL). We categorize EvoRL methods according to key research fields in RL, including hyperparameter optimization, policy search, exploration, reward shaping, meta-RL, and multi-objective RL. We then discuss future research directions in terms of efficient methods, benchmarks, and scalable platforms. This survey serves as a resource for researchers and practitioners interested in the field of EvoRL, highlighting the important challenges and opportunities for future research. With the help of this survey, researchers and practitioners can develop more efficient methods and tailored benchmarks for EvoRL, further advancing this promising cross-disciplinary research field.