In this paper, we establish efficient and uncoupled learning dynamics so that, when employed by all players in multiplayer perfect-recall imperfect-information extensive-form games, the \emph{trigger regret} of each player grows as $O(\log T)$ after $T$ repetitions of play. This improves exponentially over the prior best known trigger-regret bound of $O(T^{1/4})$, and settles a recent open question by Bai et al. (2022). As an immediate consequence, we guarantee convergence to the set of \emph{extensive-form correlated equilibria} and \emph{coarse correlated equilibria} at a near-optimal rate of $\frac{\log T}{T}$. Building on prior work, at the heart of our construction lies a more general result regarding fixed points deriving from rational functions with \emph{polynomial degree}, a property that we establish for the fixed points of \emph{(coarse) trigger deviation functions}. Moreover, our construction leverages a refined \emph{regret circuit} for the convex hull, which, unlike prior guarantees, preserves the \emph{RVU property} introduced by Syrgkanis et al. (NIPS, 2015); this observation is of independent interest for establishing near-optimal regret under learning dynamics based on a CFR-type decomposition of the regret.