In this paper we establish efficient and \emph{uncoupled} learning dynamics so that, when employed by all players in a general-sum multiplayer game, the \emph{swap regret} of each player after $T$ repetitions of the game is bounded by $O(\log T)$, improving over the prior best bound of $O(\log^4 T)$. At the same time, we guarantee optimal $O(\sqrt{T})$ swap regret in the adversarial regime. To obtain these results, our primary contribution is to show that when all players follow our dynamics with a \emph{time-invariant} learning rate, the \emph{second-order path lengths} of the dynamics up to time $T$ are bounded by $O(\log T)$, a fundamental property that could have further implications beyond near-optimally bounding the (swap) regret. Our proposed learning dynamics combine \emph{optimistic} regularized learning with \emph{self-concordant barriers} in a novel way. Further, our analysis is remarkably simple, bypassing the cumbersome framework of higher-order smoothness recently developed by Daskalakis, Fishelson, and Golowich (NeurIPS'21).