In this paper we study two-player bilinear zero-sum games with constrained strategy spaces. Such constraints arise naturally when mixed strategies are used, which corresponds to a probability simplex constraint. We propose and analyze the alternating mirror descent algorithm, in which the players take turns updating their strategies, each following the mirror descent algorithm for constrained optimization. We interpret alternating mirror descent as an alternating discretization of a skew-gradient flow in the dual space, and use tools from convex optimization together with a modified energy function to establish an $O(K^{-2/3})$ bound on its average regret after $K$ iterations. This quantifies the algorithm's advantage over the simultaneous version of mirror descent, which is known to diverge and yields an $O(K^{-1/2})$ average regret bound. In the special case of an unconstrained setting, our results recover the behavior of the alternating gradient descent algorithm for zero-sum games, which was studied in (Bailey et al., COLT 2020).
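To make the alternating update concrete, the following is a minimal sketch rather than the paper's exact formulation. It assumes the entropic mirror map on the probability simplex, so the mirror step reduces to a softmax of an accumulated dual vector. It also assumes the payoff convention that the first player minimizes $x^\top A y$ and the second maximizes it. The function names, the step size `eta`, the dual initializations `u0` and `v0`, and the choice to average the iterates after each full round are all hypothetical and introduced only for illustration.

```python
import math


def softmax(z):
    """Map a dual vector to the simplex (mirror step for the entropic map)."""
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]


def mat_vec(A, y):
    """Compute A y for a matrix stored as a list of rows."""
    return [sum(a * yj for a, yj in zip(row, y)) for row in A]


def mat_t_vec(A, x):
    """Compute A^T x for a matrix stored as a list of rows."""
    return [sum(A[i][j] * x[i] for i in range(len(A))) for j in range(len(A[0]))]


def alternating_mirror_descent(A, eta, num_iters, u0=None, v0=None):
    """Alternating updates: x moves first, then y responds to the NEW x.

    Returns the averages of the x and y iterates over all rounds.
    """
    m, n = len(A), len(A[0])
    u = list(u0) if u0 is not None else [0.0] * m  # dual variable of the x-player
    v = list(v0) if v0 is not None else [0.0] * n  # dual variable of the y-player
    y = softmax(v)
    x_sum, y_sum = [0.0] * m, [0.0] * n
    for _ in range(num_iters):
        # x-player (minimizer) takes a dual step against the current y.
        u = [ui - eta * gi for ui, gi in zip(u, mat_vec(A, y))]
        x = softmax(u)
        # y-player (maximizer) takes a dual step against the updated x.
        v = [vj + eta * hj for vj, hj in zip(v, mat_t_vec(A, x))]
        y = softmax(v)
        x_sum = [s + xi for s, xi in zip(x_sum, x)]
        y_sum = [s + yj for s, yj in zip(y_sum, y)]
    x_bar = [s / num_iters for s in x_sum]
    y_bar = [s / num_iters for s in y_sum]
    return x_bar, y_bar


def duality_gap(A, x, y):
    """max_y' x^T A y' - min_x' x'^T A y, with both extrema over the simplex."""
    return max(mat_t_vec(A, x)) - min(mat_vec(A, y))


if __name__ == "__main__":
    # Matching pennies; the unique equilibrium is uniform play for both players.
    A = [[1.0, -1.0], [-1.0, 1.0]]
    x_bar, y_bar = alternating_mirror_descent(A, eta=0.1, num_iters=2000, u0=[1.0, 0.0])
    print("average x:", x_bar)
    print("average y:", y_bar)
    print("duality gap of averages:", duality_gap(A, x_bar, y_bar))
```

The only difference from a simultaneous scheme in this sketch is that the second player's step uses the freshly updated `x` rather than the previous one. The duality gap of the averaged strategies is used here as a simple proxy for how close the averages are to equilibrium.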