Exploration in Deep Reinforcement Learning: A Comprehensive Survey

Deep Reinforcement Learning (DRL) and Deep Multi-agent Reinforcement Learning (MARL) have achieved significant success across a wide range of domains, including game AI, autonomous vehicles, and robotics. However, DRL and deep MARL agents are widely known to be sample-inefficient: millions of interactions are usually needed even for relatively simple problem settings, which prevents wide application and deployment in real industrial scenarios. A key bottleneck behind this is the well-known exploration problem, i.e., how to efficiently explore the environment and collect informative experiences that benefit policy learning toward optimal policies. This problem becomes more challenging in complex environments with sparse rewards, noisy distractions, long horizons, and non-stationary co-learners. In this paper, we conduct a comprehensive survey of existing exploration methods for both single-agent and multi-agent RL. We start the survey by identifying several key challenges to efficient exploration. Then we provide a systematic survey of existing approaches by classifying them into two major categories: uncertainty-oriented exploration and intrinsic motivation-oriented exploration. Beyond these two main branches, we also include other notable exploration methods with different ideas and techniques. In addition to algorithmic analysis, we provide a comprehensive and unified empirical comparison of different exploration methods for DRL on a set of commonly used benchmarks. Based on our algorithmic and empirical investigation, we finally summarize the open problems of exploration in DRL and deep MARL and point out a few future directions.
