In this study, we consider a variant of the Follow the Regularized Leader (FTRL) dynamics in two-player zero-sum games. The time-averaged strategies of FTRL are guaranteed to converge to a Nash equilibrium, but many of its variants exhibit limit cycles, i.e., they lack a last-iterate convergence guarantee. To address this, we propose mutant FTRL (M-FTRL), an algorithm that introduces mutation to perturb action probabilities. We then investigate the continuous-time dynamics of M-FTRL and provide strong convergence guarantees toward stationary points that approximate Nash equilibria under full-information feedback. Furthermore, our simulations demonstrate that M-FTRL can converge faster than FTRL and optimistic FTRL under full-information feedback and, surprisingly, exhibits clear convergence under bandit feedback.
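To make the idea of mutation-perturbed FTRL concrete, the following is a minimal sketch and not the paper's implementation. It assumes an entropy regularizer (so each strategy is a softmax of cumulative scores), a simple Euler discretization with step size `eta`, and a replicator-mutator style mutation term `mu * (ref[a] / p[a] - 1)` that pulls each strategy toward a uniform reference strategy `ref`. The abstract specifies none of these choices; the names `mu`, `eta`, `ref`, and `run`, and the use of rock-paper-scissors as the test game, are illustrative assumptions.

```python
import math

# Rock-paper-scissors payoffs for the row player (zero-sum game).
A = [[0.0, -1.0, 1.0],
     [1.0, 0.0, -1.0],
     [-1.0, 1.0, 0.0]]


def softmax(z):
    """Entropy-regularized FTRL maps cumulative scores to a softmax strategy."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def payoffs(x, y):
    """Per-action expected payoffs for the row (q1) and column (q2) player."""
    n, m = len(A), len(A[0])
    q1 = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
    q2 = [-sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
    return q1, q2


def exploitability(x, y):
    """Sum of best-response gains; zero exactly at a Nash equilibrium."""
    q1, q2 = payoffs(x, y)
    return max(q1) + max(q2)


def run(mu, eta=0.1, steps=3000):
    """Discretized dynamics; mu = 0 recovers plain entropy-regularized FTRL."""
    z1, z2 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]  # arbitrary non-equilibrium start
    ref = [1.0 / 3.0] * 3  # assumed uniform reference strategy for mutation
    for _ in range(steps):
        x, y = softmax(z1), softmax(z2)
        q1, q2 = payoffs(x, y)
        for z, q, p in ((z1, q1, x), (z2, q2, y)):
            for a in range(len(z)):
                # Assumed mutation term: pushes p[a] toward ref[a].
                mutation = mu * (ref[a] / p[a] - 1.0) if mu > 0 else 0.0
                z[a] += eta * (q[a] + mutation)
    return softmax(z1), softmax(z2)


if __name__ == "__main__":
    for mu in (0.0, 0.1):
        x, y = run(mu)
        print(f"mu={mu}: last-iterate exploitability = {exploitability(x, y):.6f}")
```

In this toy game the equilibrium is uniform and coincides with the reference strategy, so the mutated dynamics can settle exactly on it. In general the abstract only promises stationary points that approximate a Nash equilibrium.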