Simple Uncoupled No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium

From arXiv. Extended version of our NeurIPS 2020 paper. Compared to the conference version, this preprint gives finer, high-probability regret bounds. We also better connect our work to the phi-regret minimization framework.

The existence of simple uncoupled no-regret learning dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form games generalize normal-form games by modeling both sequential and simultaneous moves, as well as imperfect information. Because of the sequential nature of the game and the presence of private information, correlation in extensive-form games has significantly different properties from its normal-form counterpart, and many of those properties are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to the classical notion of correlated equilibrium in normal-form games. Compared to the latter, the constraints that define the set of EFCEs are significantly more complex, as the correlation device must take into account the evolution of each player's beliefs as they make observations throughout the game. Because of that added complexity, the existence of uncoupled learning dynamics leading to an EFCE has long remained a challenging open research question. In this article, we settle that question by giving the first uncoupled no-regret dynamics that converge to the set of EFCEs in n-player general-sum extensive-form games with perfect recall. We show that each iterate can be computed in time polynomial in the size of the game tree, and that, when all players play repeatedly according to our learning dynamics, the empirical frequency of play is an O(T^-0.5)-approximate EFCE with high probability after T game repetitions, and an EFCE almost surely in the limit.
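The classical normal-form result mentioned above can be illustrated with a small sketch. This is not the EFCE dynamics of the paper; it is a minimal regret-matching-style internal-regret minimizer for a two-player, two-action game, assuming the payoffs of the game of Chicken rescaled to [0, 1]. The payoff table `U` and all function names are illustrative choices, not taken from the paper. To keep the run deterministic, the sketch averages the product of the players' mixed strategies instead of sampling actions, so `joint` stands in for the empirical frequency of play.

```python
# Sketch only: internal-regret minimization in a 2x2 normal-form game.
# Assumption: Chicken payoffs divided by 7 so that all payoffs lie in [0, 1].
# U[p][a0][a1] = payoff of player p when player 0 plays a0 and player 1 plays a1.
U = [
    [[6 / 7, 2 / 7], [1.0, 0.0]],  # player 0
    [[6 / 7, 1.0], [2 / 7, 0.0]],  # player 1
]


def strategy_from_regrets(R):
    """Mixed strategy that is stationary for the positive internal regrets.

    R[j][k] is the cumulative regret for not having played k whenever j was
    played. With two actions, stationarity reduces to the flow balance
    x[0] * R+[0][1] == x[1] * R+[1][0].
    """
    a = max(R[0][1], 0.0)
    b = max(R[1][0], 0.0)
    if a + b == 0.0:
        return [0.5, 0.5]  # no positive regret yet: any strategy is stationary
    return [b / (a + b), a / (a + b)]


def action_values(player, opp_strategy):
    """Expected payoff of each action of `player` against the opponent's mix."""
    if player == 0:
        return [sum(U[0][a][b] * opp_strategy[b] for b in range(2)) for a in range(2)]
    return [sum(U[1][a][b] * opp_strategy[a] for a in range(2)) for b in range(2)]


def run(T):
    """Let both players update independently for T rounds; return the average joint play."""
    R = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]  # R[p][j][k]
    joint = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(T):
        x = [strategy_from_regrets(R[p]) for p in range(2)]
        for a in range(2):
            for b in range(2):
                joint[a][b] += x[0][a] * x[1][b] / T
        # Uncoupled update: each player only uses its own payoffs.
        for p in range(2):
            v = action_values(p, x[1 - p])
            for j in range(2):
                for k in range(2):
                    R[p][j][k] += x[p][j] * (v[k] - v[j])
    return joint


def ce_gap(joint):
    """Largest gain any player gets by swapping one recommended action for another."""
    gap = 0.0
    for j in range(2):
        for k in range(2):
            g0 = sum(joint[j][b] * (U[0][k][b] - U[0][j][b]) for b in range(2))
            g1 = sum(joint[a][j] * (U[1][a][k] - U[1][a][j]) for a in range(2))
            gap = max(gap, g0, g1)
    return gap


joint = run(10_000)
print("max correlated-equilibrium violation:", ce_gap(joint))
```

Here `ce_gap` equals the largest average internal regret of either player, so driving internal regret to zero is the same as driving the averaged joint play toward the set of correlated equilibria. Under the assumptions above, a standard Blackwell-style argument suggests the violation should shrink on the order of T^-0.5, the same rate the abstract states for the extensive-form setting.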
