In this paper, we consider multi-agent learning via online gradient descent (OGD) in a class of games called $\lambda$-cocoercive games, a fairly broad class of games that admits many Nash equilibria and that properly includes unconstrained strongly monotone games. We characterize the finite-time last-iterate convergence rate of joint OGD learning in $\lambda$-cocoercive games; building on this result, we then develop a fully adaptive OGD learning algorithm that requires no knowledge of problem parameters (e.g., the cocoercivity constant $\lambda$) and show, via a novel double stopping time technique, that this adaptive algorithm achieves the same finite-time last-iterate convergence rate as its non-adaptive counterpart. Subsequently, we extend OGD learning to the noisy gradient feedback case and establish last-iterate convergence results -- first qualitative almost sure convergence, then quantitative finite-time convergence rates -- all under non-decreasing step-sizes. To our knowledge, we provide the first set of results that fill several gaps in the existing multi-agent online learning literature, where three aspects -- finite-time convergence rates, non-decreasing step-sizes, and fully adaptive algorithms -- had not been explored before.
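For concreteness, here is a minimal sketch of the setting and update rule the abstract refers to, in illustrative notation that is not fixed by the text above (losses $\ell_i$, joint action $x_t$, step-size $\gamma_t$). Writing $v(x) = (\nabla_{x_1}\ell_1(x), \dots, \nabla_{x_N}\ell_N(x))$ for the profile of individual gradients, the game is $\lambda$-cocoercive (under a loss-minimization convention) when
\[
\langle v(x) - v(x'),\, x - x' \rangle \;\ge\; \lambda\, \| v(x) - v(x') \|^2 \qquad \text{for all } x, x',
\]
and joint OGD learning has each player $i$ update
\[
x_{i,t+1} \;=\; x_{i,t} \;-\; \gamma_t\, \hat{v}_{i,t},
\]
where $\hat{v}_{i,t}$ is either the exact gradient $\nabla_{x_i}\ell_i(x_t)$ or, in the noisy feedback case, a noisy estimate of it.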