We consider the problem of online control of systems with time-varying linear dynamics. This general formulation is motivated by the use of local linearization in the control of nonlinear dynamical systems. To state meaningful guarantees over changing environments, we introduce the metric of {\it adaptive regret} to the field of control. This metric, originally studied in online learning, measures performance as regret against the best policy in hindsight on {\it any interval in time}, and thus captures how well the controller adapts to changing dynamics. Our main contribution is a novel, efficient meta-algorithm that converts a controller with sublinear regret bounds into one with sublinear {\it adaptive regret} bounds in the setting of time-varying linear dynamical systems. The main technical innovation is the first adaptive regret bound for the more general framework of online convex optimization with memory. Furthermore, we give a lower bound showing that the adaptive regret bound we attain is nearly tight for this general framework.
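To make the metric concrete, the sketch below computes adaptive regret by brute force: the worst regret over every contiguous interval [r, s], where each interval is compared against the best fixed comparator on that interval alone. It is a minimal illustration under assumptions that are not from the abstract. The losses are memoryless (in online convex optimization with memory, each loss would also depend on recent past decisions), and a small finite comparator set stands in for the policy class. The function `adaptive_regret` and the toy data are hypothetical, and the quadratic-time enumeration of intervals is for exposition only; it is not the efficient meta-algorithm described above.

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")  # type of a decision / comparator (assumed generic)


def adaptive_regret(
    losses: Sequence[Callable[[P], float]],
    played: Sequence[P],
    comparators: Sequence[P],
) -> float:
    """Worst regret over all contiguous intervals [r, s] of the horizon,
    each measured against the best fixed comparator on that interval.
    Assumes memoryless losses and a finite comparator set (illustrative)."""
    T = len(losses)
    # Prefix sums of the learner's realized losses.
    learner_prefix = [0.0]
    for f, x in zip(losses, played):
        learner_prefix.append(learner_prefix[-1] + f(x))
    # Prefix sums of each comparator's losses, so interval sums are O(1).
    comp_prefix = []
    for c in comparators:
        pre = [0.0]
        for f in losses:
            pre.append(pre[-1] + f(c))
        comp_prefix.append(pre)
    worst = float("-inf")
    for r in range(T):
        for s in range(r, T):
            learner = learner_prefix[s + 1] - learner_prefix[r]
            best = min(pre[s + 1] - pre[r] for pre in comp_prefix)
            worst = max(worst, learner - best)
    return worst


# Hypothetical example: the target jumps from 0 to 1 halfway through,
# and a learner that never adapts keeps playing 0.
targets = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
losses = [lambda x, a=a: (x - a) ** 2 for a in targets]
played = [0.0] * len(targets)
comparators = [0.0, 1.0]

# Standard (static) regret over the whole horizon is 0: neither fixed
# comparator beats the learner in total, since all three incur loss 3.
static = sum(f(x) for f, x in zip(losses, played)) - min(
    sum(f(c) for f in losses) for c in comparators
)
print(static)                                         # 0.0
print(adaptive_regret(losses, played, comparators))  # 3.0
```

In this toy run, the static regret is zero even though the learner never reacts to the change, while the adaptive regret is 3, realized on the second half of the horizon where the comparator 1 incurs no loss. This gap is the kind of failure to adapt that an interval-based metric is meant to expose.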