A seminal result in game theory is von Neumann's minmax theorem, which states that zero-sum games admit an essentially unique equilibrium solution. Classical learning results build on this theorem to show that online no-regret dynamics converge to an equilibrium in a time-average sense in zero-sum games. In the past several years, a key research direction has focused on characterizing the day-to-day behavior of such dynamics. General results in this direction show that broad classes of online learning dynamics are cyclic, and formally Poincaré recurrent, in zero-sum games. We analyze the robustness of these online learning behaviors in the case of periodic zero-sum games with a time-invariant equilibrium. This model generalizes the usual repeated game formulation while also being a realistic and natural model of a repeated competition between players that depends on exogenous environmental variations such as time-of-day effects, week-to-week trends, and seasonality. Interestingly, time-average convergence may fail even in the simplest such settings, in spite of the equilibrium being fixed. In contrast, using novel analysis methods, we show that Poincaré recurrence provably generalizes despite the complex, non-autonomous nature of these dynamical systems.
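To make the setting concrete, the following is a minimal sketch of one possible periodic zero-sum game with a time-invariant equilibrium. It is an illustrative assumption, not a construction taken from the paper: Matching Pennies is rescaled by a positive periodic factor, and the names `BASE`, `PERIOD`, `payoff_matrix`, `row_payoff`, and `is_equilibrium` are all hypothetical. The check confirms that the uniform strategy profile is an equilibrium of every stage game even though the payoffs change from round to round.

```python
import math

# Matching Pennies payoffs for the row player (illustrative choice).
# The game is zero-sum, so the column player receives the negative.
BASE = [[1.0, -1.0], [-1.0, 1.0]]
PERIOD = 4  # hypothetical period length, in rounds


def payoff_matrix(t):
    """Stage game at round t: Matching Pennies times a periodic positive scale."""
    scale = 2.0 + math.cos(2.0 * math.pi * t / PERIOD)  # stays within [1, 3]
    return [[scale * a for a in row] for row in BASE]


def row_payoff(A, x, y):
    """Expected payoff x^T A y to the row player under mixed strategies x, y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(2) for j in range(2))


def is_equilibrium(A, x, y, tol=1e-9):
    """Check that no pure deviation improves either player's payoff.

    Payoffs are linear in each player's own strategy, so ruling out
    pure deviations also rules out mixed ones.
    """
    value = row_payoff(A, x, y)
    pure = [[1.0, 0.0], [0.0, 1.0]]
    row_ok = all(row_payoff(A, e, y) <= value + tol for e in pure)  # row maximizes
    col_ok = all(row_payoff(A, x, e) >= value - tol for e in pure)  # column minimizes
    return row_ok and col_ok


uniform = [0.5, 0.5]
# The same profile is checked against every stage game over three full periods.
fixed_everywhere = all(
    is_equilibrium(payoff_matrix(t), uniform, uniform) for t in range(3 * PERIOD)
)
```

This sketch only illustrates the model class (time-varying payoffs, fixed equilibrium). It does not simulate any learning dynamics, so it says nothing by itself about time-average convergence or recurrence.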