Projection operations are a typical computational bottleneck in online learning. In this paper, we enable projection-free online learning within the framework of Online Convex Optimization with Memory (OCO-M) -- OCO-M captures how the history of decisions affects the current outcome by allowing the online learning loss functions to depend on both current and past decisions. In particular, we introduce the first projection-free meta-base learning algorithm with memory that minimizes dynamic regret, i.e., the suboptimality against any sequence of time-varying decisions. We are motivated by artificial intelligence applications where autonomous agents need to adapt to time-varying environments in real time, accounting for how past decisions affect the present. Examples of such applications include online control of dynamical systems, statistical arbitrage, and time series prediction. Our algorithm builds on the Online Frank-Wolfe (OFW) and Hedge algorithms. We demonstrate how the algorithm can be applied to the online control of linear time-varying systems in the presence of unpredictable process noise. To this end, we develop the first controller with memory and bounded dynamic regret against any optimal time-varying linear feedback control policy. We validate our algorithm in simulated scenarios of online control of linear time-invariant systems.
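As one illustration of the projection-free idea underlying OFW (not the paper's full meta-base algorithm with memory), the sketch below shows a single Online Frank-Wolfe update in which a linear minimization oracle over the feasible set replaces the projection step. The function names (ofw_step, ball_lmo), the unit-ball feasible set, and the decaying step-size schedule are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def ofw_step(x, grad, lmo, t):
    """One illustrative projection-free Online Frank-Wolfe update.

    Instead of projecting a gradient step back onto the feasible set,
    the update calls a linear minimization oracle (lmo) over the set
    and moves toward its output with a decaying step size, so the
    iterate stays feasible as a convex combination of feasible points.
    """
    v = lmo(grad)                  # v = argmin_{v in K} <grad, v>
    gamma = 1.0 / np.sqrt(t + 1)   # assumed step-size schedule (illustrative)
    return x + gamma * (v - x)

def ball_lmo(g):
    """Linear minimization oracle for the Euclidean unit ball: v = -g / ||g||."""
    norm = np.linalg.norm(g)
    return -g / norm if norm > 0 else np.zeros_like(g)

# Toy usage: online quadratic losses centered at a fixed point.
x = np.zeros(3)
target = np.array([0.5, -0.2, 0.1])
for t in range(10):
    g = 2.0 * (x - target)         # gradient of the round-t loss ||x - target||^2
    x = ofw_step(x, g, ball_lmo, t)
```

The key design point is that each round costs only one call to a linear minimization oracle, which is typically much cheaper than a projection onto the same feasible set.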