In this paper, we revisit the problem of smoothed online learning, in which the online learner suffers both a hitting cost and a switching cost, and target two performance metrics: competitive ratio and dynamic regret with switching cost. To bound the competitive ratio, we assume the hitting cost is known to the learner in each round, and investigate the greedy algorithm, which simply minimizes the weighted sum of the hitting cost and the switching cost. Our theoretical analysis shows that the greedy algorithm, although straightforward, is $(1+\frac{2}{\alpha})$-competitive for $\alpha$-polyhedral functions, $(1+O(\frac{1}{\lambda}))$-competitive for $\lambda$-quadratic growth functions, and $(1+\frac{2}{\sqrt{\lambda}})$-competitive for convex and $\lambda$-quadratic growth functions. To bound the dynamic regret with switching cost, we follow the standard setting of online convex optimization, in which the hitting cost is convex but hidden from the learner before making predictions. We slightly modify Ader, an existing algorithm designed for dynamic regret, so that the switching cost is taken into account when measuring the performance. The proposed algorithm, named Smoothed Ader, attains an optimal $O(\sqrt{T(1+P_T)})$ bound for dynamic regret with switching cost, where $P_T$ is the path-length of the comparator sequence. Furthermore, if the hitting cost is accessible at the beginning of each round, we obtain a similar guarantee without the bounded gradient condition.
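To make the greedy rule concrete, the following is a minimal sketch of the per-round update, assuming the switching cost is the Euclidean distance to the previous decision and writing $\theta > 0$ for a hypothetical trade-off weight (the exact weighting used in the analysis may differ):
\[
x_t = \operatorname*{argmin}_{x \in \mathcal{X}} \; f_t(x) + \theta \, \|x - x_{t-1}\|_2,
\]
where $f_t$ denotes the hitting cost revealed at the start of round $t$ and $\mathcal{X}$ is the feasible domain; the learner balances minimizing the current hitting cost against staying close to its previous decision, which is exactly the tension quantified by the competitive-ratio bounds above. For the regret results, $P_T$ refers to the standard path-length $P_T = \sum_{t=2}^{T} \|u_t - u_{t-1}\|_2$ of the comparator sequence $u_1, \ldots, u_T$.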