Projection operations are a typical computational bottleneck in online learning. In this paper, we enable projection-free online learning within the framework of Online Convex Optimization with Memory (OCO-M) -- OCO-M captures how the history of decisions affects current outcomes by allowing the online loss functions to depend on both current and past decisions. In particular, we introduce the first projection-free meta-base learning algorithm with memory that minimizes dynamic regret, i.e., the suboptimality against any sequence of time-varying decisions. We are motivated by artificial intelligence applications where autonomous agents must adapt to time-varying environments in real time, accounting for how past decisions affect the present. Examples of such applications include online control of dynamical systems, statistical arbitrage, and time series prediction. Our algorithm builds on the Online Frank-Wolfe (OFW) and Hedge algorithms. We demonstrate how our algorithm can be applied to the online control of linear time-varying systems in the presence of unpredictable process noise. To this end, we develop a controller with memory whose dynamic regret is bounded against any sequence of time-varying linear feedback control policies. We validate our algorithm in simulated scenarios of online control of linear time-invariant systems.
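To illustrate the two building blocks named above, here is a minimal, hypothetical sketch (not the paper's actual algorithm): an Online Frank-Wolfe update over the probability simplex, which replaces the projection step with a single linear minimization oracle (LMO) call, together with a Hedge-style multiplicative-weights update of the kind a meta-learner might use to weight base learners. The step-size schedule, loss functions, and decision set are illustrative choices.

```python
import numpy as np

def lmo_simplex(g):
    # Linear minimization oracle over the probability simplex:
    # argmin_{v in simplex} <g, v> is the vertex (standard basis
    # vector) at the smallest coordinate of g -- no projection needed.
    v = np.zeros_like(g)
    v[np.argmin(g)] = 1.0
    return v

def online_frank_wolfe(grads, x0):
    # One common OFW variant: x_{t+1} = x_t + sigma_t * (v_t - x_t)
    # with illustrative step size sigma_t = 1 / (t + 2). Each round
    # costs one LMO call instead of a projection onto the feasible set.
    x = x0.copy()
    iterates = [x.copy()]
    for t, grad in enumerate(grads):
        v = lmo_simplex(grad(x))
        sigma = 1.0 / (t + 2)
        x = x + sigma * (v - x)   # convex combination stays feasible
        iterates.append(x.copy())
    return iterates

def hedge_update(w, losses, eta):
    # Hedge (multiplicative weights): exponentially down-weight
    # base learners by their observed losses, then renormalize.
    w = w * np.exp(-eta * losses)
    return w / w.sum()
```

A toy run: for losses f_t(x) = ||x - e_1||^2, the gradient is 2(x - e_1), and the OFW iterates drift toward the vertex e_1 while remaining on the simplex throughout, with no projection ever computed.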