Recently, Optimistic Multiplicative Weights Update (OMWU) was proven to be the first constant step-size algorithm in the online no-regret framework to enjoy last-iterate convergence to Nash equilibria in the constrained zero-sum bimatrix case, where weights represent the probabilities of playing pure strategies. We introduce the second such algorithm, \textit{Consensus MWU} (CMWU), for which we prove local convergence and show empirically that it enjoys faster and more robust convergence than OMWU. Our algorithm highlights the importance of a new object, the \textit{simplex Hessian}, as well as of the interaction of the game with the (eigen)space of vectors summing to zero, which we believe future research can build on. Like OMWU, CMWU has convergence guarantees in the zero-sum case only, but Cheung and Piliouras (2020) recently showed that OMWU and MWU display opposite convergence properties depending on whether the game is zero-sum or cooperative. Inspired by this work and the recent literature on learning to optimize for single functions, we extend CMWU to non-zero-sum games by introducing a new framework for online learning in games, where the update rule's gradient and Hessian coefficients along a trajectory are learnt by a reinforcement learning policy that is conditioned on the nature of the game: \textit{the game signature}. We construct the latter using a new canonical decomposition of two-player games into eight components corresponding to commutative projection operators, generalizing and unifying recent game concepts studied in the literature. We show empirically that our new learning policy is able to exploit the game signature across a wide range of game types.
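For context, a minimal sketch of the standard MWU and OMWU updates (well known from the literature, not spelled out in this abstract) for the row player maximizing $x^\top A y$ in a bimatrix game with payoff matrix $A$, strategy $x^t$ on the simplex, opponent strategy $y^t$, and constant step size $\eta$:
\begin{align*}
\text{MWU:}\quad & x_i^{t+1} = \frac{x_i^t \exp\!\big(\eta\,(A y^t)_i\big)}{\sum_j x_j^t \exp\!\big(\eta\,(A y^t)_j\big)}, \\
\text{OMWU:}\quad & x_i^{t+1} = \frac{x_i^t \exp\!\big(\eta\,[\,2(A y^t)_i - (A y^{t-1})_i\,]\big)}{\sum_j x_j^t \exp\!\big(\eta\,[\,2(A y^t)_j - (A y^{t-1})_j\,]\big)},
\end{align*}
with the column player updating symmetrically; OMWU's optimism lies in the extrapolated gradient term $2(A y^t) - (A y^{t-1})$. The CMWU update itself, which additionally involves the simplex Hessian, is defined in the body of the paper and is not reproduced here.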