In this paper, we examine the Nash equilibrium convergence properties of no-regret learning in general N-player games. For concreteness, we focus on the archetypal follow-the-regularized-leader (FTRL) family of algorithms, and we consider the full spectrum of uncertainty that the players may encounter, from noisy, oracle-based feedback to bandit, payoff-based information. In this general context, we establish a comprehensive equivalence between the stability of a Nash equilibrium and its support: a Nash equilibrium is stable and attracting with arbitrarily high probability if and only if it is strict (i.e., each player's equilibrium strategy is the unique best response to their opponents' play). This equivalence extends existing continuous-time versions of the folk theorem of evolutionary game theory to a bona fide algorithmic learning setting, and it provides a clear refinement criterion for predicting the day-to-day behavior of no-regret learning in games.
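To make the FTRL family concrete, the following is a minimal sketch, not the paper's implementation, of its best-known instance: exponential weights, i.e., FTRL with an entropic regularizer, run with noisy oracle feedback in a 2x2 game. The payoff matrices, step size, and noise level are illustrative assumptions chosen so that (0, 0) is a strict Nash equilibrium.

```python
import numpy as np

# Illustrative 2x2 game (assumed for this sketch): action 0 is strictly
# dominant for both players, so (0, 0) is a strict Nash equilibrium.
A = np.array([[3.0, 1.0],   # payoffs of player 1
              [2.0, 0.0]])
B = np.array([[3.0, 2.0],   # payoffs of player 2
              [1.0, 0.0]])

def softmax(z):
    z = z - z.max()           # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
eta = 0.1                     # step size (illustrative choice)
S1 = np.zeros(2)              # cumulative payoff scores, player 1
S2 = np.zeros(2)              # cumulative payoff scores, player 2

for t in range(2000):
    # FTRL with entropic regularizer = exponential weights:
    # play the softmax of the cumulative scores.
    x = softmax(eta * S1)
    y = softmax(eta * S2)
    # Noisy oracle feedback: the payoff vector of each pure action,
    # perturbed by zero-mean Gaussian noise.
    v1 = A @ y + rng.normal(0.0, 0.5, size=2)
    v2 = B.T @ x + rng.normal(0.0, 0.5, size=2)
    S1 += v1
    S2 += v2

print(x, y)  # both mixed strategies concentrate near the strict equilibrium (0, 0)
```

In the bandit, payoff-based regime discussed in the abstract, the players would not observe the full payoff vectors `v1` and `v2`; they would instead build estimates of them from the realized payoff of the single action played (e.g., via importance-weighted estimators), and the same stability-versus-strictness dichotomy is the paper's subject in that setting as well.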