Derivative-Free Reinforcement Learning: A Review

Reinforcement learning is about learning agent models that make the best sequential decisions in unknown environments. In an unknown environment, the agent needs to explore the environment while exploiting the collected information, which usually results in a sophisticated problem. Derivative-free optimization, meanwhile, is capable of solving sophisticated problems. It commonly uses a sampling-and-updating framework to iteratively improve the solution, in which exploration and exploitation must also be well balanced. Derivative-free optimization therefore deals with a core issue similar to that of reinforcement learning, and it has been introduced into reinforcement learning approaches under the names of learning classifier systems and neuroevolution/evolutionary reinforcement learning. Although such methods have been developed for decades, derivative-free reinforcement learning has recently been attracting increasing attention. However, a recent survey on this topic is still lacking. In this article, we summarize methods of derivative-free reinforcement learning to date and organize them by aspect, including parameter updating, model selection, exploration, and parallel/distributed methods. Moreover, we discuss some current limitations and possible future directions, hoping that this article brings more attention to this topic and serves as a catalyst for developing novel and efficient approaches.
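To make the sampling-and-updating framework concrete, the following is a minimal sketch of one representative derivative-free method, the cross-entropy method, used as a direct policy search. It assumes a hypothetical toy environment (`episode_return`) with a two-parameter linear policy; the environment, function names, and hyperparameters are illustrative assumptions and do not come from the review. The optimizer only queries episode returns, never gradients: sampling from a Gaussian search distribution provides exploration, and refitting that distribution to the best-scoring samples provides exploitation.

```python
import random
import statistics


def episode_return(theta, horizon=10):
    """Hypothetical toy environment: drive a scalar state x toward 0.

    The policy is linear, a = theta[0] * x + theta[1]. The optimizer
    only ever sees this scalar return, never a gradient.
    """
    x, total = 1.0, 0.0
    for _ in range(horizon):
        a = theta[0] * x + theta[1]
        x = x + a
        # Clip so that unstable parameter choices cannot overflow.
        x = max(-1e3, min(1e3, x))
        total -= x * x  # reward is -x^2 at every step
    return total


def cross_entropy_search(f, dim, iterations=50, pop_size=40, n_elite=8,
                         min_std=1e-2, seed=0):
    """Cross-entropy method: a sampling-and-updating loop that maximizes f."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [1.0] * dim  # wide initial search distribution (exploration)
    for _ in range(iterations):
        # Sample: draw candidate parameter vectors from the current Gaussian.
        population = [
            [rng.gauss(m, s) for m, s in zip(mean, std)]
            for _ in range(pop_size)
        ]
        # Evaluate: rank candidates by episode return (zeroth-order only).
        ranked = sorted(population, key=f, reverse=True)
        elites = ranked[:n_elite]
        # Update: refit the Gaussian to the elites (exploitation), keeping a
        # floor on the std so exploration never collapses entirely.
        mean = [statistics.fmean(col) for col in zip(*elites)]
        std = [max(statistics.pstdev(col), min_std) for col in zip(*elites)]
    return mean


best = cross_entropy_search(episode_return, dim=2)
print([round(v, 3) for v in best], round(episode_return(best), 4))
```

On this toy problem the optimal policy is theta = (-1, 0), which drives x to zero in a single step, so the returned mean should land close to it. The `min_std` floor is one simple way to keep some exploration alive; it trades a little final precision for robustness against the search distribution shrinking before the optimum is reached.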

翻译：强化学习是指在未知环境中做出最连续决定的学习代理模式;在未知环境中,代理机构需要探索环境,同时利用所收集的信息,而这些信息通常形成一个复杂的解决问题的问题;同时,无衍生工具优化能够解决复杂的问题;通常使用抽样和升级框架来迭接改进解决方案,其中勘探和开采也需要非常平衡;因此,无衍生工具优化处理与强化学习相似的核心问题,并在强化学习方法中采用,以学习分类系统和神经革命/革命强化学习为名;尽管这类方法已经发展了几十年,但最近出现了无衍生工具强化学习展览,引起越来越多的关注;然而,最近关于这一专题的调查仍然缺乏。在本篇文章中,我们总结了迄今为止无衍生工具强化学习的方法,并整理了包括参数更新、模式选择、探索以及平行/分配方法在内的各方面的方法。此外,我们讨论了当前的一些限制和可能的未来方向,希望这一文章能够引起对这一专题的更多关注,并成为制定新颖和高效方法的催化剂。