Online Optimization in Games via Control Theory: Connecting Regret, Passivity and Poincaré Recurrence

We present a novel control-theoretic understanding of online optimization and learning in games, via the notion of passivity. Passivity is a fundamental concept in control theory, which abstracts energy conservation and dissipation in physical systems. It has become a standard tool in the analysis of general feedback systems, to which game dynamics belong. Our starting point is to show that all continuous-time Follow-the-Regularized-Leader (FTRL) dynamics, which include the well-known Replicator Dynamic, are lossless, i.e. they are passive with no energy dissipation. Interestingly, we prove that passivity implies bounded regret, connecting two fundamental primitives of control theory and online optimization. The observation of energy conservation in FTRL inspires us to present a family of lossless learning dynamics, each of which has an underlying energy function with a simple gradient structure. This family is closed under convex combination; as an immediate corollary, any convex combination of FTRL dynamics is lossless and thus has bounded regret. This allows us to extend the framework of Fox and Shamma [Games, 2013] to prove not just global asymptotic stability results for game dynamics, but Poincaré recurrence results as well. Intuitively, when a lossless game (e.g. a graphical constant-sum game) is coupled with lossless learning dynamics, their feedback interconnection is also lossless, which results in a pendulum-like energy-preserving recurrent behavior, generalizing the results of Piliouras and Shamma [SODA, 2014] and Mertikopoulos, Papadimitriou and Piliouras [SODA, 2018].
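The energy-conservation claim can be checked numerically on a small example. The sketch below is illustrative only and is not taken from the paper: it assumes the two-player zero-sum game Matching Pennies, uses the entropic regularizer (for which continuous-time FTRL reduces to the Replicator Dynamic), and integrates the dynamics with a hand-written fourth-order Runge-Kutta step. All names (`vector_field`, `energy`, `simulate`), the initial conditions, and the step size are assumptions chosen for the illustration. The tracked energy is the sum over players of `logsumexp(y_i) - <x_i*, y_i>` with `x_i*` the uniform equilibrium strategy, which equals the KL divergence from the equilibrium to the current strategy up to an additive constant.

```python
import math

# Matching Pennies payoff matrix for player 1 (assumed example game);
# player 2 receives the negative, so the game is zero-sum.
A = [[1.0, -1.0], [-1.0, 1.0]]

def softmax(y):
    """Entropic-regularizer choice map: cumulative payoffs -> mixed strategy."""
    m = max(y)
    e = [math.exp(v - m) for v in y]
    s = sum(e)
    return [v / s for v in e]

def logsumexp(y):
    m = max(y)
    return m + math.log(sum(math.exp(v - m) for v in y))

def vector_field(state):
    """Time derivative of (y1, y2, payoff earned so far by player 1)."""
    y1, y2 = state[0:2], state[2:4]
    x1, x2 = softmax(y1), softmax(y2)
    u1 = [sum(A[i][j] * x2[j] for j in range(2)) for i in range(2)]   # A x2
    u2 = [-sum(A[i][j] * x1[i] for i in range(2)) for j in range(2)]  # -A^T x1
    earned = sum(x1[i] * u1[i] for i in range(2))
    return u1 + u2 + [earned]

def rk4_step(state, dt):
    """One classical Runge-Kutta step for the autonomous system above."""
    def shifted(k, h):
        return [s + h * d for s, d in zip(state, k)]
    k1 = vector_field(state)
    k2 = vector_field(shifted(k1, dt / 2))
    k3 = vector_field(shifted(k2, dt / 2))
    k4 = vector_field(shifted(k3, dt))
    return [s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def energy(state):
    """Sum over players of logsumexp(y_i) - <x_i*, y_i>, with x_i* uniform."""
    y1, y2 = state[0:2], state[2:4]
    return sum(logsumexp(y) - 0.5 * sum(y) for y in (y1, y2))

def simulate(y1_0=(1.0, 0.0), y2_0=(0.0, 0.5), dt=0.01, steps=5000):
    """Integrate the coupled dynamics; record energy and player 1's regret."""
    state = list(y1_0) + list(y2_0) + [0.0]
    energies, regrets = [energy(state)], [0.0]
    for _ in range(steps):
        state = rk4_step(state, dt)
        energies.append(energy(state))
        # Payoff of the best fixed action in hindsight minus payoff earned.
        best_fixed = max(state[i] - y1_0[i] for i in range(2))
        regrets.append(best_fixed - state[4])
    return energies, regrets

energies, regrets = simulate()
energy_drift = max(energies) - min(energies)
max_regret = max(regrets)
print("energy drift over the run:", energy_drift)
print("largest regret of player 1:", max_regret)
```

Under these assumptions the energy should stay constant up to integration error, so the orbit remains on a level set of the energy and cycles around the equilibrium instead of converging to it, while player 1's regret stays below a constant that depends only on the initial condition (here `logsumexp(y1_0) - min(y1_0)`) regardless of the time horizon.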


