适应性和非平稳性：特定问题动态遗憾的在线凸优化 (Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization)

from arxiv, V2 changes: Updated to improve the presentation (emphasizing the importance of gradient-variation bounds and the contribution of Sword++ algorithm)

We investigate online convex optimization in non-stationary environments and choose the dynamic regret as the performance measure, defined as the difference between cumulative loss incurred by the online algorithm and that of any feasible comparator sequence. Let $T$ be the time horizon and $P_T$ be the path length that essentially reflects the non-stationarity of environments, the state-of-the-art dynamic regret is $\mathcal{O}(\sqrt{T(1+P_T)})$. Although this bound is proved to be minimax optimal for convex functions, in this paper, we demonstrate that it is possible to further enhance the guarantee for some easy problem instances, particularly when online functions are smooth. Specifically, we introduce novel online algorithms that can exploit smoothness and replace the dependence on $T$ in dynamic regret with problem-dependent quantities: the variation in gradients of loss functions, the cumulative loss of the comparator sequence, and the minimum of these two terms. These quantities are at most $\mathcal{O}(T)$ while could be much smaller in benign environments. Therefore, our results are adaptive to the intrinsic difficulty of the problem, since the bounds are tighter than existing results for easy problems and meanwhile guarantee the same rate in the worst case. Notably, our proposed algorithms can achieve favorable dynamic regret with only one gradient per iteration, sharing the same gradient query complexity as the static regret minimization methods. To accomplish this, we introduce the framework of collaborative online ensemble. The proposed framework employs a two-layer online ensemble to handle non-stationarity, and uses optimistic online learning and further introduces crucial correction terms to facilitate effective collaboration within the meta-base two layers, thereby attaining adaptivity. We believe that the framework can be useful for broader problems.

翻译：我们研究了非平稳环境下的在线凸优化，并选择动态遗憾作为性能度量，定义为在线算法和任何可行比较器序列之间的累积损失差异。设 $T$ 为时间范围，$P_T$ 反映环境的非平稳性质量，现有技术的动态遗憾是 $\mathcal{O}(\sqrt{T(1+P_T)})$。虽然这个界限在凸函数方面被证明是极小化的，但本文中，我们证明了对于某些简单的问题实例，特别是当在线函数是光滑的时候，可以进一步提升保证。具体而言，我们引入了能够利用平滑性并将问题动态遗憾中的 $T$ 依赖替换成问题相关量的新型在线算法：损失函数梯度变化、比较器序列累计损失以及这两个参数的最小值。这些参数最多为 $\mathcal{O}(T)$，但在良性环境下可能会小得多。因此，我们的结果适应于问题的内在难度，因为界限比现有结果在简单问题上更紧，同时保证了最坏情况下的相同速率。值得注意的是，我们提出的算法只需要每次迭代一个梯度，就能实现有利的动态遗憾，与静态遗憾最小化方法共享相同的梯度查询复杂度。为此，我们引入了协作在线集合的框架。该提议框架采用了两层在线集合来处理非平稳性，并使用乐观在线学习并进一步引入关键修正项，以促进元基二层内的有效协作，从而实现适应性。我们相信这个框架可以用于更广泛的问题。