Although deep reinforcement learning has recently achieved great success, a number of challenges still remain in multiagent environments. Multiagent reinforcement learning (MARL) is commonly considered to suffer from non-stationary environments and an exponentially growing policy space. Learning effective policies becomes even more challenging when rewards are sparse and delayed over long trajectories. In this paper, we study Hierarchical Deep Multiagent Reinforcement Learning (hierarchical deep MARL) for cooperative multiagent problems with sparse and delayed rewards, where efficient multiagent learning methods are urgently needed. We decompose the original MARL problem into hierarchies and investigate how effective policies can be learned hierarchically within synchronous and asynchronous hierarchical MARL frameworks. Several hierarchical deep MARL architectures, i.e., Ind-hDQN, hCom, and hQmix, are introduced for different learning paradigms. Moreover, to alleviate the issues of sparse experiences in high-level learning and non-stationarity in multiagent settings, we propose a new experience replay mechanism, named Augmented Concurrent Experience Replay (ACER). We empirically demonstrate the effectiveness and efficiency of our approaches in several classic Multiagent Trash Collection tasks, as well as in an extremely challenging team sports game, i.e., Fever Basketball Defense.