Projection operations are a common computational bottleneck in online learning. In this paper, we enable projection-free online learning within the framework of Online Convex Optimization with Memory (OCO-M), a framework that captures how the history of decisions affects current outcomes by allowing the online learning loss functions to depend on both current and past decisions. In particular, we introduce the first projection-free meta-base learning algorithm with memory that minimizes dynamic regret, i.e., the suboptimality against any sequence of time-varying decisions. We are motivated by artificial intelligence applications where autonomous agents need to adapt to time-varying environments in real time, accounting for how past decisions affect the present. Examples of such applications include online control of dynamical systems, statistical arbitrage, and time-series prediction. The algorithm builds on the Online Frank-Wolfe (OFW) and Hedge algorithms. We demonstrate how our algorithm can be applied to the online control of linear time-varying systems in the presence of unpredictable process noise. To this end, we develop the first controller with memory and bounded dynamic regret against any optimal time-varying linear feedback control policy. We validate our algorithm in simulated scenarios of online control of linear time-invariant systems.
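For concreteness, in the usual OCO-M notation (notation assumed here, not taken from the paper), with memory length $m$, the dynamic regret being minimized has the form

$$\mathrm{Regret}_T^{D} \;=\; \sum_{t=m}^{T} f_t(x_{t-m}, \ldots, x_t) \;-\; \sum_{t=m}^{T} f_t(y_{t-m}, \ldots, y_t),$$

where $x_t$ are the learner's decisions and $(y_t)_{t=1}^{T}$ is any time-varying comparator sequence; the exact indexing and normalization may differ in the paper.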
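The abstract states that the algorithm combines OFW base learners with a Hedge meta-learner. Below is a minimal Python sketch of that generic meta-base pattern, not the paper's algorithm: it assumes an l1-ball feasible set (so the linear minimization oracle `lmo_l1_ball` has a closed form), a hypothetical grid of fixed OFW step sizes, and a toy memory-one quadratic loss; the paper's actual step-size schedules, memory handling, and regret analysis are omitted.

```python
import numpy as np

def lmo_l1_ball(grad, radius=1.0):
    # Linear minimization oracle over the l1 ball: argmin_{||v||_1 <= r} <grad, v>.
    # A closed-form LMO like this replaces the projection step entirely.
    v = np.zeros_like(grad)
    i = np.argmax(np.abs(grad))
    v[i] = -radius * np.sign(grad[i])
    return v

class OFWBase:
    # One Online Frank-Wolfe base learner with a fixed step size sigma.
    def __init__(self, dim, sigma, radius=1.0):
        self.x, self.sigma, self.radius = np.zeros(dim), sigma, radius

    def update(self, grad):
        v = lmo_l1_ball(grad, self.radius)
        self.x = (1.0 - self.sigma) * self.x + self.sigma * v  # convex combination stays feasible

class HedgeMeta:
    # Hedge meta-learner: multiplicative weights over the base learners.
    def __init__(self, n, eta):
        self.w, self.eta = np.ones(n) / n, eta

    def combine(self, decisions):
        return self.w @ decisions  # weighted average of base decisions

    def update(self, losses):
        self.w *= np.exp(-self.eta * (losses - losses.min()))  # shift for numerical stability
        self.w /= self.w.sum()

dim, T = 5, 200
bases = [OFWBase(dim, sigma=1.0 / (k + 2)) for k in range(4)]  # illustrative step-size grid
meta = HedgeMeta(len(bases), eta=np.sqrt(np.log(len(bases)) / T))
x_prev = np.zeros(dim)

for t in range(T):
    target = np.sin(0.05 * t) * np.ones(dim)  # slowly drifting comparator
    # Toy memory-1 loss f_t(x_t, x_{t-1}): tracking cost plus a switching cost.
    loss = lambda x: np.sum((x - target) ** 2) + 0.1 * np.sum((x - x_prev) ** 2)
    grad = lambda x: 2.0 * (x - target) + 0.2 * (x - x_prev)

    decisions = np.array([b.x for b in bases])
    x_t = meta.combine(decisions)                       # played decision
    meta.update(np.array([loss(d) for d in decisions]))
    for b in bases:
        b.update(grad(b.x))                             # projection-free OFW step
    x_prev = x_t
```

The point of the LMO is the source of the claimed speedup: for feasible sets such as l1 or nuclear-norm balls, a single linear minimization is far cheaper than a Euclidean projection, which is the bottleneck the abstract refers to.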