Ad-hoc team cooperation is the problem of cooperating with other players that were not encountered during the learning process. Recently, this problem has been studied in the context of Hanabi, which requires cooperation without explicit communication between players. While self-play reinforcement learning (RL) has succeeded in learning cooperative strategies, the resulting agents often fail to cooperate with unseen agents once training is complete. In this paper, we categorize the outcomes of ad-hoc team cooperation into Failure, Success, and Synergy, and analyze the associated failures. First, we show that each RL agent converges to a single strategy, but not necessarily the same one: agents trained with identical hyperparameters can still deploy different strategies. Second, we show that the larger the behavioral difference between agents, the more pronounced the failure of ad-hoc team cooperation: hierarchical clustering separates the agents into distinctly different groups, and the Pearson correlation between behavioral difference and ad-hoc team performance is -0.978. Our results improve understanding of the key factors behind successful ad-hoc team cooperation in multi-player games.
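To make the analysis pipeline concrete, the following is a minimal sketch of how behavioral differences could be clustered hierarchically and correlated with ad-hoc team performance. It is not the paper's implementation: the behavioral-difference metric (mean total-variation distance between per-state action distributions) and all data here are illustrative assumptions; in the paper, the pairwise scores would come from evaluating ad-hoc teams in Hanabi.

```python
# Sketch only: cluster agents by behavioral difference and correlate that
# difference with ad-hoc team performance. Metric and data are assumptions.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_agents, n_states, n_actions = 6, 200, 5

# Hypothetical per-agent policies: action distributions over sampled states.
policies = rng.dirichlet(np.ones(n_actions), size=(n_agents, n_states))

# Pairwise behavioral difference: mean total-variation distance across states.
diff = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    for j in range(n_agents):
        diff[i, j] = 0.5 * np.abs(policies[i] - policies[j]).sum(axis=1).mean()

# Hierarchical clustering over the condensed pairwise-distance matrix.
Z = linkage(squareform(diff, checks=False), method="average")
clusters = fcluster(Z, t=2, criterion="maxclust")
print("cluster labels:", clusters)

# Hypothetical cross-play scores for each agent pair (placeholder values;
# real scores would be measured from ad-hoc Hanabi games, max score 25).
iu = np.triu_indices(n_agents, k=1)
scores = rng.uniform(0, 25, size=len(iu[0]))
r, p = pearsonr(diff[iu], scores)
print(f"Pearson r = {r:.3f} (p = {p:.3g})")
```

Under this setup, a strongly negative `r` would mirror the paper's finding that larger behavioral differences accompany worse ad-hoc team performance.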