Recent renewed interest in multi-agent reinforcement learning (MARL) has generated an impressive array of techniques that leverage deep reinforcement learning, primarily actor-critic architectures, but that apply to a limited range of settings in terms of observability and communication. However, a continuing limitation of much of this work is the curse of dimensionality in representations based on joint actions, which grow exponentially with the number of agents. In this paper, we focus squarely on this challenge of scalability. We apply the key insight of action anonymity, which leads to permutation invariance of joint actions, to two recently introduced deep MARL algorithms, MADDPG and IA2C, and compare these instantiations to another recent technique that leverages action anonymity, viz., mean-field MARL. We show that our instantiations can learn optimal behavior in a broader class of agent networks than the mean-field method, using a recently introduced pragmatic domain.
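To make concrete how action anonymity tames the joint-action explosion, the following minimal Python sketch (illustrative only, not from the paper) maps a joint action to a permutation-invariant count representation: under anonymity, a policy or critic needs only how many agents chose each action, not which agent chose which. The agent and action counts here are hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb

def joint_action_counts(actions):
    """Permutation-invariant representation of a joint action:
    a sorted tuple of (action, number of agents choosing it)."""
    return tuple(sorted(Counter(actions).items()))

# Hypothetical example: 5 agents, binary action set {0, 1}.
num_agents, num_actions = 5, 2
joint_actions = list(product(range(num_actions), repeat=num_agents))
count_reps = {joint_action_counts(a) for a in joint_actions}

print(len(joint_actions))  # |A|^N = 2^5 = 32 distinct joint actions
print(len(count_reps))     # C(N + |A| - 1, |A| - 1) = C(6, 1) = 6 count vectors
print(comb(num_agents + num_actions - 1, num_actions - 1))  # 6, matching above
```

The reduction from exponential (|A|^N) to polynomial (multiset counts) in the number of agents N is what the instantiations described in the abstract exploit.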