We consider the problem of computing mixed Nash equilibria of two-player zero-sum games with continuous sets of pure strategies and with first-order access to the payoff function. This problem arises, for example, in game-theory-inspired machine learning applications such as distributionally-robust learning. In those applications, the strategy sets are high-dimensional, and thus methods based on discretization cannot tractably return high-accuracy solutions. In this paper, we introduce and analyze a particle-based method that enjoys guaranteed local convergence for this problem. This method consists in parametrizing the mixed strategies as atomic measures and applying proximal point updates to both the atoms' weights and positions. It can be interpreted as a time-implicit discretization of the "interacting" Wasserstein-Fisher-Rao gradient flow. We prove that, under non-degeneracy assumptions, this method converges at an exponential rate to the exact mixed Nash equilibrium from any initialization satisfying a natural notion of closeness to optimality. We illustrate our results with numerical experiments and discuss applications to max-margin and distributionally-robust classification using two-layer neural networks, where our method has a natural interpretation as simultaneously training the network's weights and the adversarial distribution.
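To make the particle parametrization concrete, the following is a minimal sketch of an explicit (rather than proximal-point, i.e. time-implicit) discretization of the interacting Wasserstein-Fisher-Rao dynamics on a toy payoff. The payoff f, the finite-difference gradients, the step sizes, and the particle counts are illustrative assumptions and not the paper's exact setup or guarantees.

```python
import numpy as np

# Illustrative zero-sum payoff: player 1 maximizes f, player 2 minimizes f.
def f(x, y):
    return np.sin(x) * np.cos(y) + 0.1 * x * y

# Finite-difference partial derivatives (assumption: first-order access to f).
def grad_x(x, y, eps=1e-5):
    return (f(x + eps, y) - f(x - eps, y)) / (2 * eps)

def grad_y(x, y, eps=1e-5):
    return (f(x, y + eps) - f(x, y - eps)) / (2 * eps)

rng = np.random.default_rng(0)
m = 20                                   # number of atoms per player
x, y = rng.uniform(-2, 2, m), rng.uniform(-2, 2, m)   # atom positions
a, b = np.full(m, 1 / m), np.full(m, 1 / m)           # atom weights
eta_w, eta_pos = 0.5, 0.05               # Fisher-Rao / Wasserstein step sizes

for _ in range(500):
    F = f(x[:, None], y[None, :])        # payoff between every pair of atoms
    payoff_x = F @ b                     # expected payoff of each x-atom
    payoff_y = a @ F                     # expected payoff against each y-atom
    gx = grad_x(x[:, None], y[None, :]) @ b   # averaged gradient at each x-atom
    gy = a @ grad_y(x[:, None], y[None, :])   # averaged gradient at each y-atom

    # Fisher-Rao part: multiplicative (mirror) update of the weights.
    a = a * np.exp(eta_w * payoff_x); a /= a.sum()
    b = b * np.exp(-eta_w * payoff_y); b /= b.sum()
    # Wasserstein part: gradient ascent/descent on the atom positions.
    x = x + eta_pos * gx
    y = y - eta_pos * gy

print("approximate game value:", a @ f(x[:, None], y[None, :]) @ b)
```

In this sketch the mixed strategies are the atomic measures sum_i a[i] * delta_{x[i]} and sum_j b[j] * delta_{y[j]}; the weights follow a multiplicative update (the Fisher-Rao component) while the positions follow payoff gradients (the Wasserstein component), both taken simultaneously for the two players.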