Contemporary applications of machine learning in two-team e-sports and the superior expressivity of multi-agent generative adversarial networks raise important, yet overlooked, theoretical questions regarding optimization in two-team games. Formally, two-team zero-sum games are defined as multi-player games in which players are split into two competing sets of agents, each experiencing a utility identical to that of their teammates and opposite to that of the opposing team. We focus on the solution concept of Nash equilibria (NE). We first show that computing an NE for this class of games is $\textit{hard}$ for the complexity class $\mathrm{CLS}$. To further examine the capabilities of online learning algorithms in games with full-information feedback, we propose a benchmark consisting of a simple -- yet nontrivial -- family of such games. These games do not enjoy the properties used to prove convergence of the relevant algorithms. In particular, we take a dynamical-systems perspective to demonstrate that gradient descent-ascent, its optimistic variant, optimistic multiplicative weights update, and the extra-gradient method fail to converge (even locally) to a Nash equilibrium. On a brighter note, we propose a first-order method that leverages control-theoretic techniques and, under some conditions, enjoys last-iterate local convergence to a Nash equilibrium. We also believe our proposed method is of independent interest for general min-max optimization.
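To make the verbal definition above concrete, one common way to write it down (the notation here is introduced only for illustration and is not taken from the paper) is the following: with teams $A$ and $B$ and a joint strategy profile $x$, every member of a team receives that team's common payoff, and the two team payoffs are opposite,
\[
u_i(x) \;=\; U(x) \quad \text{for all } i \in A,
\qquad
u_j(x) \;=\; -\,U(x) \quad \text{for all } j \in B,
\]
so the game is zero-sum across the two teams, while each team internally faces a common-payoff (identical-interest) coordination problem.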