Recent progress in online control has popularized online learning with memory, a variant of the standard online learning problem with loss functions dependent on the prediction history. In this paper, we propose the first strongly adaptive algorithm for this problem: on any interval $\mathcal{I}\subset[1:T]$, the proposed algorithm achieves $\tilde O\left(\sqrt{|\mathcal{I}|}\right)$ policy regret against the best fixed comparator for that interval. Combined with online control techniques, our algorithm results in a strongly adaptive regret bound for the control of linear time-varying systems.