Hanabi is a cooperative game that brings the problem of modeling other players to the forefront. In this game, coordinated groups of players can leverage pre-established conventions to great effect, but playing in an ad-hoc setting requires agents to adapt to their partners' strategies without prior coordination. Evaluating an agent in this setting requires a diverse population of potential partners, but so far the behavioral diversity of agents has not been considered in a systematic way. This paper proposes Quality Diversity algorithms as a promising class of algorithms for generating diverse populations for this purpose, and uses MAP-Elites to generate a population of diverse Hanabi agents. We also postulate that agents can benefit from a diverse population during training, and we implement a simple "meta-strategy" for adapting to a partner's perceived behavioral niche. We show that this meta-strategy can outperform generalist strategies, even outside the population it was trained with, if its partner's behavioral niche can be correctly inferred. In practice, however, a partner's behavior depends on and interferes with the meta-agent's own behavior, which suggests an avenue for future research: characterizing another agent's behavior during gameplay.
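To make the role of MAP-Elites concrete, the following is a minimal sketch of the generic algorithm on a toy problem, not the Hanabi setup used in this paper. The genome encoding, the fitness function, the two behavioral dimensions, and every name in the code (`map_elites`, `evaluate`, `mutate`, `BINS`) are assumptions introduced for illustration only. The sketch keeps one elite per behavioral niche and replaces it only when a candidate in the same niche scores higher.

```python
import random


def map_elites(evaluate, random_genome, mutate, bins, iterations, n_init, rng):
    """Generic MAP-Elites loop.

    Returns an archive mapping a niche index (tuple of ints) to the best
    (fitness, genome) pair found so far for that niche.
    """
    archive = {}
    for i in range(iterations):
        if i < n_init or not archive:
            # Bootstrap phase: sample random genomes.
            genome = random_genome(rng)
        else:
            # Pick a random elite and mutate it.
            _, parent = rng.choice(list(archive.values()))
            genome = mutate(parent, rng)

        fitness, descriptor = evaluate(genome)
        # Discretize each behavioral dimension (assumed to lie in [0, 1]).
        niche = tuple(min(int(d * bins), bins - 1) for d in descriptor)
        # Keep the candidate only if its niche is empty or it beats the elite.
        if niche not in archive or fitness > archive[niche][0]:
            archive[niche] = (fitness, genome)
    return archive


# --- Toy problem (hypothetical stand-in for agent evaluation) ---

def random_genome(rng):
    return [rng.random(), rng.random()]


def mutate(genome, rng):
    # Gaussian perturbation, clipped back into [0, 1].
    return [min(1.0, max(0.0, g + rng.gauss(0.0, 0.1))) for g in genome]


def evaluate(genome):
    # Fitness peaks at the center; the genome doubles as the behavior descriptor.
    fitness = -sum((g - 0.5) ** 2 for g in genome)
    return fitness, tuple(genome)


BINS = 5
archive = map_elites(evaluate, random_genome, mutate, bins=BINS,
                     iterations=2000, n_init=100, rng=random.Random(0))
print(f"{len(archive)} of {BINS * BINS} niches filled")
```

In a setting like the one described above, `evaluate` would presumably play games and return a score together with measured behavioral features, and the resulting archive would serve as the diverse partner population.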