广泛形式粗粗的与平衡有关的平衡的高效分散化学习动态:不需要对固定分发量进行昂贵的计算 (Efficient Decentralized Learning Dynamics for Extensive-Form Coarse Correlated Equilibrium: No Expensive Computation of Stationary Distributions Required)

2021 年 9 月 16 日

Efficient Decentralized Learning Dynamics for Extensive-Form Coarse Correlated Equilibrium: No Expensive Computation of Stationary Distributions Required

翻译：广泛形式粗粗的与平衡有关的平衡的高效分散化学习动态:不需要对固定分发量进行昂贵的计算

Gabriele Farina,Andrea Celli,Tuomas Sandholm

While in two-player zero-sum games the Nash equilibrium is a well-established prescriptive notion of optimal play, its applicability as a prescriptive tool beyond that setting is limited. Consequently, the study of decentralized learning dynamics that guarantee convergence to correlated solution concepts in multiplayer, general-sum extensive-form (i.e., tree-form) games has become an important topic of active research. The per-iteration complexity of the currently known learning dynamics depends on the specific correlated solution concept considered. For example, in the case of extensive-form correlated equilibrium (EFCE), all known dynamics require, as an intermediate step at each iteration, to compute the stationary distribution of multiple Markov chains, an expensive operation in practice. Oppositely, in the case of normal-form coarse correlated equilibrium (NFCCE), simple no-external-regret learning dynamics that amount to a linear-time traversal of the tree-form decision space of each agent suffice to guarantee convergence. This paper focuses on extensive-form coarse correlated equilibrium (EFCCE), an intermediate solution concept that is a subset of NFCCE and a superset of EFCE. Being a superset of EFCE, any learning dynamics for EFCE automatically guarantees convergence to EFCCE. However, since EFCCE is a simpler solution concept, this begs the question: do learning dynamics for EFCCE that avoid the expensive computation of stationary distributions exist? This paper answers the previous question in the positive. Our learning dynamics only require the orchestration of no-external-regret minimizers, thus showing that EFCCE is more akin to NFCCE than to EFCE from a learning perspective. Our dynamics guarantees that the empirical frequency of play after $T$ iteration is a $O(1/\sqrt{T})$-approximate EFCCE with high probability, and an EFCCE almost surely in the limit.

翻译：虽然在双玩者零和游戏中,纳什平衡是一个成熟的规范性最佳游戏概念,但作为超越该环境的规范性工具,它的适用性是有限的。因此,研究分散学习动态,以保证在多玩者、一般和广泛形式(即树形)游戏中与相关解决方案概念趋同。目前已知的学习动态的渗透复杂性取决于所考虑的具体相关解决方案概念。例如,在广泛形式相关平衡(EFCE)中,所有已知动态都需要,作为每次循环的中间步骤,对多马可夫链的固定动态分布进行计算,这是一项昂贵的实际操作。相反,在普通形式相近的混合组合(NFCE)游戏中,简单的无外部学习动态动态动态(即树形(树形),相当于每个代理人的直线时间性游戏决定空间的演进量足以保证趋同。本文侧重于广泛形式的离差关系平衡(EFCECE),这是NFCE的中间解决方案的一个子,这是NCEFCE的正性动态流流流流流学,这是自EFCEFCE公司的更简单的循环。