We introduce a generalization of zero-sum network multiagent matrix games and prove that alternating gradient descent converges to the set of Nash equilibria at rate $O(1/T)$ for this class of games. Alternating gradient descent obtains this convergence guarantee while using fixed learning rates that are four times larger than those used by the optimistic variant of gradient descent. Experimentally, we show with 97.5% confidence that, on average, these larger learning rates result in time-averaged strategies that are 2.585 times closer to the set of Nash equilibria than those produced by optimistic gradient descent.
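To make the abstract's claim concrete, the following is a minimal sketch of alternating gradient descent-ascent with a fixed learning rate on a two-player bilinear zero-sum game $\min_x \max_y x^\top A y$, tracking the time-averaged strategies whose convergence the $O(1/T)$ rate refers to. The payoff matrix, learning rate, and horizon are illustrative assumptions, and the sketch omits the network structure and simplex constraints of the paper's actual setting.

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's experiments): alternating
# gradient descent-ascent on an unconstrained bilinear zero-sum game
# min_x max_y x^T A y, with a fixed learning rate and running time-averages.

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))   # hypothetical payoff matrix
eta = 0.05                        # fixed learning rate (assumed)
T = 10_000                        # horizon (assumed)

x = rng.standard_normal(3)        # minimizing player's strategy
y = rng.standard_normal(3)        # maximizing player's strategy
x_avg = np.zeros(3)
y_avg = np.zeros(3)

for t in range(1, T + 1):
    # Alternating updates: player x moves first, then player y responds to
    # the *updated* x. This is what distinguishes alternating play from
    # simultaneous (or optimistic) gradient descent.
    x = x - eta * (A @ y)         # gradient of x^T A y with respect to x
    y = y + eta * (A.T @ x)       # gradient with respect to y, using the new x
    x_avg += (x - x_avg) / t      # running time-averages; these are the
    y_avg += (y - y_avg) / t      # quantities with O(1/T) convergence guarantees

print("time-averaged strategies:", x_avg, y_avg)
```

In this unconstrained bilinear setting the individual iterates cycle around the equilibrium, while the time-averaged strategies drift toward it, which is why the convergence guarantee is stated for averages rather than last iterates.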