Promoting behavioural diversity is critical for solving games with non-transitive dynamics, where strategic cycles exist and there is no consistent winner (e.g., Rock-Paper-Scissors). Yet there is a lack of rigorous treatment for defining diversity and constructing diversity-aware learning dynamics. In this work, we offer a geometric interpretation of behavioural diversity in games and introduce a novel diversity metric based on determinantal point processes (DPP). By incorporating the diversity metric into best-response dynamics, we develop diverse fictitious play and diverse policy-space response oracles for solving normal-form games and open-ended games. We prove the uniqueness of the diverse best response and the convergence of our algorithms on two-player games. Importantly, we show that maximising the DPP-based diversity metric is guaranteed to enlarge the gamescape, that is, the convex polytopes spanned by agents' mixtures of strategies. To validate our diversity-aware solvers, we test them on tens of games that show strong non-transitivity. Results suggest that our methods achieve exploitability at least as low as, and in most games lower than, that of PSRO solvers by finding effective and diverse strategies.
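As an illustration of how a DPP can score diversity, the following minimal sketch computes the expected cardinality of a DPP whose kernel is built from a population's payoff vectors. The abstract does not state the exact form of the metric, so the kernel `L = M @ M.T`, the function name `expected_cardinality`, and the two example populations are assumptions made here for illustration only. The sketch relies on the standard DPP identity that the expected sample size equals the sum of `lam / (1 + lam)` over the eigenvalues `lam` of `L`.

```python
import numpy as np

# Rock-Paper-Scissors payoffs for the row player (rows and columns: R, P, S).
RPS = np.array([
    [0.0, -1.0, 1.0],
    [1.0, 0.0, -1.0],
    [-1.0, 1.0, 0.0],
])


def expected_cardinality(payoffs: np.ndarray) -> float:
    """Expected sample size of a DPP with L-ensemble kernel L = M M^T.

    Each row of `payoffs` is one strategy's payoff vector. Using the Gram
    matrix of these rows as the kernel is an assumption for illustration.
    """
    kernel = payoffs @ payoffs.T
    eigvals = np.linalg.eigvalsh(kernel)
    # Clip tiny negative values caused by floating-point round-off.
    eigvals = np.clip(eigvals, 0.0, None)
    return float(np.sum(eigvals / (1.0 + eigvals)))


# Hypothetical populations: one covers the full cycle, one repeats Rock.
diverse = RPS[[0, 1, 2]]    # Rock, Paper, Scissors
redundant = RPS[[0, 0, 1]]  # Rock, Rock, Paper

print("diverse population:  ", expected_cardinality(diverse))
print("redundant population:", expected_cardinality(redundant))
```

Under these assumptions, the population that covers the whole Rock-Paper-Scissors cycle scores higher than the one that duplicates a strategy, which matches the intuition that duplicated payoff vectors add no new direction to the space the population spans.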