Regret has been established as a foundational concept in online learning, and likewise has important applications in the analysis of learning dynamics in games. Regret quantifies the gap between a learner's realized performance and the performance of a baseline chosen in hindsight. It is well known that regret-minimizing algorithms converge to certain classes of equilibria in games; however, traditional forms of regret used in game theory predominantly consider baselines that permit deviations only to deterministic actions or strategies. In this paper, we revisit our understanding of regret from the perspective of deviations over partitions of the full \emph{mixed} strategy space (i.e., probability distributions over pure strategies), through the lens of the previously established $\Phi$-regret framework, which provides a continuum of stronger regret measures. Importantly, $\Phi$-regret enables learning agents to consider deviations from and to mixed strategies, generalizing several existing notions of regret such as external, internal, and swap regret, and thus broadening the insights gained from regret-based analysis of learning algorithms. We prove here that the well-studied evolutionary learning algorithm of replicator dynamics (RD) seamlessly minimizes the strongest possible form of $\Phi$-regret in generic $2 \times 2$ games, without any modification of the underlying algorithm itself. We then validate our theoretical results experimentally in a suite of 144 $2 \times 2$ games in which RD exhibits a diverse set of behaviors. We conclude with empirical evidence of $\Phi$-regret minimization by RD in some larger games, hinting at further opportunities for $\Phi$-regret-based study of such algorithms from both theoretical and empirical perspectives.
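To make these notions concrete, the following is a minimal sketch of standard formulations from the $\Phi$-regret and evolutionary game theory literature; the notation ($\mathcal{X}$ for the learner's mixed strategy space, $A$ for the row player's payoff matrix, $u$ for expected utility) is illustrative rather than drawn from the body of this paper. Given a set $\Phi$ of transformations $\phi : \mathcal{X} \to \mathcal{X}$, the learner's $\Phi$-regret after $T$ rounds of play $(x_t, y_t)$ is
\[
  \mathrm{Reg}_{\Phi}(T) \;=\; \max_{\phi \in \Phi} \; \sum_{t=1}^{T} \Bigl( u\bigl(\phi(x_t), y_t\bigr) - u\bigl(x_t, y_t\bigr) \Bigr),
\]
which recovers external regret when $\Phi$ contains only the constant maps, and swap regret when $\Phi$ contains all maps induced by swaps of pure strategies. Replicator dynamics for the row player then evolves each pure-strategy weight in proportion to its payoff advantage over the current mixed strategy's average payoff:
\[
  \dot{x}_i \;=\; x_i \Bigl( (A y)_i - x^{\top} A y \Bigr).
\]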