From Learning with Partial Information to Bandits: Only Strict Nash Equilibria are Stable

In this paper, we examine the Nash equilibrium convergence properties of no-regret learning in general $N$-player games. Despite the importance and widespread applications of no-regret algorithms, their long-run behavior in multi-agent environments is still far from understood, and most of the literature has focused by necessity on specific classes of games (typically zero-sum or congestion games). Rather than restricting attention to a fixed class of games, we take a structural approach and examine different classes of equilibria in generic games. For concreteness, we focus on the archetypal "follow the regularized leader" (FTRL) class of algorithms, and we consider the full spectrum of information uncertainty that the players may encounter, from noisy, oracle-based feedback to bandit, payoff-based information. In this general context, we establish a comprehensive equivalence between the stability of a Nash equilibrium and its support: a Nash equilibrium is stable and attracting with arbitrarily high probability if and only if it is strict (i.e., each equilibrium strategy has a unique best response). This result extends existing continuous-time versions of the "folk theorem" of evolutionary game theory to a bona fide discrete-time learning setting, and provides an important link between the literature on multi-armed bandits and the equilibrium refinement literature.
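
To make the setting concrete, the following is a minimal sketch of FTRL in a two-player game, not the algorithm as specified in the paper. It assumes an entropic regularizer (so the choice map is the logit/softmax map), a constant step size, and, in the bandit case, an importance-weighted payoff estimator with uniform exploration. The names `logit_choice` and `ftrl_play`, the parameter values, and the example coordination game are all illustrative assumptions.

```python
import numpy as np


def logit_choice(scores):
    """FTRL choice map for the entropic regularizer (assumed here): softmax."""
    z = scores - np.max(scores)  # shift scores for numerical stability
    w = np.exp(z)
    return w / w.sum()


def ftrl_play(payoffs, steps=500, step_size=0.1, bandit=False,
              explore=0.05, seed=0):
    """Run two-player FTRL and return the final mixed strategies.

    payoffs: pair (A, B) with A[a1, a2] and B[a1, a2] the payoffs of
    players 1 and 2. All parameter defaults are illustrative assumptions.
    """
    A, B = (np.asarray(m, dtype=float) for m in payoffs)
    rng = np.random.default_rng(seed)
    # Cumulative payoff scores, one vector per player.
    y = [np.zeros(A.shape[0]), np.zeros(A.shape[1])]
    for _ in range(steps):
        x = [logit_choice(s) for s in y]
        if bandit:
            # Payoff-based feedback: each player only observes the payoff
            # of the action actually played. Mix in uniform exploration so
            # that the importance weights below stay bounded.
            p = [(1 - explore) * xi + explore / xi.size for xi in x]
            a1 = rng.choice(p[0].size, p=p[0])
            a2 = rng.choice(p[1].size, p=p[1])
            v = [np.zeros_like(p[0]), np.zeros_like(p[1])]
            v[0][a1] = A[a1, a2] / p[0][a1]  # importance-weighted estimate
            v[1][a2] = B[a1, a2] / p[1][a2]
        else:
            # Oracle feedback without noise: exact mixed payoff vectors.
            v = [A @ x[1], B.T @ x[0]]
        for s, vi in zip(y, v):
            s += step_size * vi  # aggregate payoff signals in place
    return [logit_choice(s) for s in y]


# Coordination game with two strict Nash equilibria: (0, 0) and (1, 1).
coordination = (np.array([[2.0, 0.0], [0.0, 1.0]]),) * 2
x1, x2 = ftrl_play(coordination)
```

With exact payoff vectors, the iterates in this example concentrate on the strict equilibrium in which both players choose action 0. Setting `bandit=True` switches to payoff-based feedback, where the trajectory depends on the random seed and the sketch makes no claim about which equilibrium is reached.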
