Recently, Daskalakis, Fishelson, and Golowich (DFG) (NeurIPS'21) showed that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\textrm{polylog}(T))$ after $T$ repetitions of the game. We extend their result from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium at the rate of $\tilde{O}(T^{-1})$. This substantially improves over the prior best rate of convergence to correlated equilibria of $O(T^{-3/4})$ due to Chen and Peng (NeurIPS'20), and it is optimal -- within the no-regret framework -- up to polylogarithmic factors in $T$. To obtain these results, we develop new techniques for establishing higher-order smoothness for learning dynamics involving fixed-point operations. Specifically, we establish that the no-internal-regret learning dynamics of Stoltz and Lugosi (Mach Learn'05) are equivalently simulated by no-external-regret dynamics on a combinatorial space. This allows us to trade the computation of the stationary distribution on a polynomial-sized Markov chain for a (far better-behaved) linear transformation on an exponential-sized set, enabling us to leverage techniques similar to those of DFG to near-optimally bound the internal regret. Moreover, we establish an $O(\textrm{polylog}(T))$ no-swap-regret bound for the classic algorithm of Blum and Mansour (BM) (JMLR'07). We do so by introducing a technique based on the Cauchy Integral Formula that circumvents the more limited combinatorial arguments of DFG. In addition to shedding light on the near-optimal regret guarantees of BM, our arguments provide insights into the various ways in which the techniques of DFG can be extended and leveraged in the analysis of more involved learning algorithms.
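For concreteness, the two learning primitives referenced above admit compact descriptions; the notation $\eta$, $u_t$, $x_t$, $Q_t$ below is ours, introduced only for illustration. OMWU with learning rate $\eta > 0$ is the standard optimistic variant of Hedge,
\[
x_{t+1}(a) \;\propto\; x_t(a)\,\exp\!\bigl(\eta\,(2u_t(a) - u_{t-1}(a))\bigr),
\]
where $u_t$ denotes the utility vector observed at round $t$. The Blum--Mansour reduction maintains one external-regret subroutine per action $a$, collects their output distributions as the rows of a row-stochastic matrix $Q_t$, plays a stationary distribution $x_t$ of the induced Markov chain, $x_t^{\top} Q_t = x_t^{\top}$, and feeds subroutine $a$ the scaled utility $x_t(a)\,u_t$; this stationary-distribution step is the fixed-point operation referred to above.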