Team competition in multi-agent Markov games is an increasingly important setting for multi-agent reinforcement learning, owing to its broad applicability in modeling many real-life situations. Multi-agent actor-critic methods are particularly well suited to learning optimal policies in this setting because of their flexibility in learning agent-specific critic functions, which can also incorporate information from other agents. In many real-world team-competitive scenarios, agent roles emerge naturally to aid coordination and collaboration among team members. However, existing methods for learning emergent roles rely heavily on the Q-learning setup, which does not allow agent-specific Q-functions to be learned. In this paper, we propose RAC, a novel technique for learning emergent roles of agents within a team that are both diverse and dynamic. In the proposed method, agents also benefit from predicting the roles of the agents in the opposing team. RAC uses the actor-critic framework with a role encoder and an opponent role predictor to learn an optimal policy. Experiments on two games demonstrate that the policies learned by RAC achieve higher rewards than those learned using state-of-the-art baselines. Moreover, the experiments suggest that the agents in a team learn diverse and opponent-aware policies.
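To make the described architecture concrete, the following is a minimal sketch of how one agent's networks might be wired under RAC's stated structure: a role encoder producing a latent role, an opponent role predictor, and actor/critic heads conditioned on the observation and the roles. This is not the authors' implementation; all module names, layer sizes, and the exact conditioning scheme are illustrative assumptions.

```python
# Hedged sketch of the assumed RAC agent structure (not the paper's code).
import torch
import torch.nn as nn


class RoleEncoder(nn.Module):
    """Maps an agent's observation to a latent role vector (assumed design)."""
    def __init__(self, obs_dim, role_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, role_dim))

    def forward(self, obs):
        return self.net(obs)


class OpponentRolePredictor(nn.Module):
    """Predicts the latent roles of n_opp opponents from the agent's own observation."""
    def __init__(self, obs_dim, role_dim, n_opp, hidden=64):
        super().__init__()
        self.n_opp, self.role_dim = n_opp, role_dim
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_opp * role_dim))

    def forward(self, obs):
        return self.net(obs).view(-1, self.n_opp, self.role_dim)


class RACAgent(nn.Module):
    """Actor and agent-specific critic conditioned on own role and predicted opponent roles."""
    def __init__(self, obs_dim, act_dim, role_dim, n_opp, hidden=64):
        super().__init__()
        self.role_encoder = RoleEncoder(obs_dim, role_dim)
        self.opp_predictor = OpponentRolePredictor(obs_dim, role_dim, n_opp)
        in_dim = obs_dim + role_dim + n_opp * role_dim
        self.actor = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, act_dim))
        self.critic = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, 1))

    def forward(self, obs):
        role = self.role_encoder(obs)
        opp_roles = self.opp_predictor(obs).flatten(start_dim=1)
        x = torch.cat([obs, role, opp_roles], dim=-1)
        return torch.distributions.Categorical(logits=self.actor(x)), self.critic(x)


# Usage: sample actions and value estimates for a batch of observations.
agent = RACAgent(obs_dim=16, act_dim=5, role_dim=8, n_opp=3)
dist, value = agent(torch.randn(4, 16))
action = dist.sample()
```

In this sketch, conditioning both the actor and the critic on the predicted opponent roles is one plausible way to realize "opponent-aware" policies; the actual training losses for the role encoder and opponent role predictor are not specified here.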