We develop a unified stochastic approximation framework for analyzing the long-run behavior of multi-agent online learning in games. Our framework is based on a "primal-dual", mirrored Robbins-Monro (MRM) template which encompasses a wide array of popular game-theoretic learning algorithms (gradient methods, their optimistic variants, the EXP3 algorithm for learning with payoff-based feedback in finite games, etc.). In addition to providing an integrated view of these algorithms, the proposed MRM blueprint allows us to obtain a broad range of new convergence results, both asymptotic and in finite time, in both continuous and finite games.
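To make the template concrete, the following is a minimal, hypothetical sketch (not taken from the paper) of how a primal-dual, mirrored Robbins-Monro update might look when instantiated as EXP3 on a single finite-action problem: dual scores are aggregated Robbins-Monro style from importance-weighted payoff estimates, and a mirror map (here the logit/softmax choice map) sends the scores back to a mixed strategy. Function names, the step-size schedule gamma_n = 1/n, and the single-agent setting are all illustrative assumptions.

```python
import numpy as np

def logit_map(y):
    """Mirror map Q: dual scores -> mixed strategy on the simplex (softmax)."""
    z = np.exp(y - y.max())  # shift for numerical stability
    return z / z.sum()

def mrm_exp3(payoffs, T, rng=None):
    """Illustrative sketch of the mirrored Robbins-Monro (MRM) template,
    instantiated as EXP3 with payoff-based (bandit) feedback.
    `payoffs[a]` is the bounded payoff of action a; details are assumptions."""
    rng = np.random.default_rng() if rng is None else rng
    n_actions = len(payoffs)
    y = np.zeros(n_actions)              # dual (score) variable
    for n in range(1, T + 1):
        x = logit_map(y)                 # primal step: mirror scores to a strategy
        a = rng.choice(n_actions, p=x)   # sample an action, observe only its payoff
        v_hat = np.zeros(n_actions)      # importance-weighted estimate of the payoff vector
        v_hat[a] = payoffs[a] / x[a]
        y += (1.0 / n) * v_hat           # dual step: Robbins-Monro aggregation, gamma_n = 1/n
    return logit_map(y)
```

Under this reading, vanilla gradient methods correspond to the identity mirror map on an unconstrained primal space, while optimistic variants modify only the dual update with an extrapolation term, which is what lets one blueprint cover all three families.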