On the Impossibility of Convergence of Mixed Strategies with No Regret Learning

We study the convergence properties of the mixed strategies that result from a general class of optimal no-regret learning strategies in a repeated game setting where the stage game is any 2×2 competitive game (i.e., a game all of whose Nash equilibria (NE) are completely mixed). We consider the class of strategies whose information set at each step is the empirical average of the opponent's realized play (and the step number), which we call mean-based strategies. We first show that there is no optimal no-regret, mean-based strategy for player 1 under which her mixed strategies converge (in probability) against an opponent who plays his NE mixed strategy at each step. Next, we show that this last-iterate divergence necessarily occurs if player 2 uses any adaptive strategy with a minimal randomness property. This property is satisfied, for example, by any fixed sequence of mixed strategies for player 2 that converges to NE. We conjecture that this property holds when both players use optimal no-regret learning strategies against each other, leading to the divergence of the mixed strategies with positive probability. Finally, we show that variants of mean-based strategies that use recency bias, which have yielded last-iterate convergence in deterministic min-max optimization, continue to lead to this last-iterate divergence. This demonstrates a crucial difference in outcomes between updating on the opponent's mixed strategies and updating on the opponent's realized actions.
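The first result can be illustrated numerically with a minimal sketch. It assumes matching pennies as the 2×2 competitive game and uses Hedge (multiplicative weights) with step size 1/sqrt(t) as player 1's rule. Both choices are illustrative assumptions and not necessarily the constructions analysed in the paper, and the function names `hedge_prob_heads` and `simulate` are hypothetical. The rule is mean-based in the sense above: it reads only the empirical average of player 2's realized play and the step number.

```python
import math
import random


def hedge_prob_heads(opp_mean, t):
    """Player 1's probability of Heads after t observed rounds.

    Assumed setup: matching pennies, player 1 gets +1 on a match and -1
    otherwise. opp_mean is the empirical frequency of Heads in player 2's
    realized play, so the rule uses only (opp_mean, t).
    """
    # Cumulative payoff of always-Heads minus always-Tails over t rounds.
    gap = 2.0 * t * (2.0 * opp_mean - 1.0)
    eta = 1.0 / math.sqrt(t)  # assumed step size, a standard no-regret choice
    return 1.0 / (1.0 + math.exp(-eta * gap))


def simulate(horizon, seed):
    """Player 1's mixed strategies when player 2 plays the NE (1/2, 1/2) i.i.d."""
    rng = random.Random(seed)
    heads = 0
    probs = []
    for t in range(1, horizon + 1):
        heads += rng.random() < 0.5  # player 2's realized action this round
        probs.append(hedge_prob_heads(heads / t, t))
    return probs


if __name__ == "__main__":
    for seed in range(5):
        final = simulate(10_000, seed)[-1]
        print(f"seed={seed}  P1's final probability of Heads = {final:.3f}")
```

In this sketch the exponent equals twice the opponent's Heads-minus-Tails count divided by sqrt(t), which stays of order one however large t becomes, so player 1's mixed strategy keeps fluctuating around 1/2 instead of settling there. Replacing the realized frequency with player 2's actual mixture (exactly 1/2) would pin the output at 1/2, which is the contrast drawn in the last sentence of the abstract.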
