Faster No-Regret Learning Dynamics for Extensive-Form Correlated and Coarse Correlated Equilibria

From arXiv. Preliminary parts of this paper will appear at the AAAI-22 Workshop on Reinforcement Learning in Games. This version also contains results from an earlier preprint published by a subset of the authors (arXiv:2109.08138).

A recent trend in the literature on learning in games concerns faster learning dynamics for correlated and coarse correlated equilibria in normal-form games. Much less is known about the significantly more challenging setting of extensive-form games, which can capture both sequential and simultaneous moves, as well as imperfect information. In this paper we establish faster no-regret learning dynamics for \textit{extensive-form correlated equilibria (EFCE)} in multiplayer general-sum imperfect-information extensive-form games. When all players follow our accelerated dynamics, the correlated distribution of play is an $O(T^{-3/4})$-approximate EFCE, where the $O(\cdot)$ notation suppresses parameters polynomial in the description of the game. This significantly improves over the best prior rate of $O(T^{-1/2})$. To achieve this, we develop a framework for performing accelerated \emph{Phi-regret minimization} via predictions. One of our key technical contributions, which enables us to employ our generic template, is to characterize the stability of fixed points associated with \emph{trigger deviation functions} through a refined perturbation analysis of a structured Markov chain. Furthermore, for the simpler solution concept of extensive-form \emph{coarse} correlated equilibrium (EFCCE) we give a new succinct closed-form characterization of the associated fixed points, bypassing the expensive computation of stationary distributions required for EFCE. Our results place EFCCE closer to \emph{normal-form coarse correlated equilibria} in terms of per-iteration complexity, although the former prescribes a much more compelling notion of correlation. Finally, experiments conducted on standard benchmarks corroborate our theoretical findings.
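To get a feel for the gap between the two rates quoted above, the following minimal sketch compares how many iterations $T$ each rate would need to reach a target approximation $\epsilon$. It assumes, purely for illustration, that the game-dependent polynomial factors hidden by the $O(\cdot)$ notation are equal to 1; the function name `iterations_needed` is hypothetical and does not come from the paper.

```python
import math


def iterations_needed(epsilon: float, exponent: float) -> int:
    """Smallest integer T (up to floating-point rounding) with T**(-exponent) <= epsilon.

    Assumption: all constants hidden by the O(.) notation are taken to be 1,
    so the result only illustrates scaling in epsilon, not real iteration counts.
    """
    if not (0.0 < epsilon < 1.0):
        raise ValueError("epsilon must lie in (0, 1)")
    if exponent <= 0.0:
        raise ValueError("exponent must be positive")
    # Solve T**(-exponent) = epsilon for T, then round up.
    return math.ceil(epsilon ** (-1.0 / exponent))


if __name__ == "__main__":
    eps = 0.01
    prior = iterations_needed(eps, 0.5)         # rate T^{-1/2}
    accelerated = iterations_needed(eps, 0.75)  # rate T^{-3/4}
    print(f"epsilon = {eps}")
    print(f"T for the T^(-1/2) rate: {prior}")
    print(f"T for the T^(-3/4) rate: {accelerated}")
```

Under this simplification the required horizon scales as $\epsilon^{-2}$ for the prior rate and as $\epsilon^{-4/3}$ for the accelerated one, so the advantage grows as the target accuracy tightens.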
