In this paper, we revisit the problem of smoothed online learning, in which the online learner suffers both a hitting cost and a switching cost, and target two performance metrics: competitive ratio and dynamic regret with switching cost. To bound the competitive ratio, we assume the hitting cost is known to the learner in each round, and investigate the greedy algorithm, which simply minimizes the weighted sum of the hitting cost and the switching cost. Our theoretical analysis shows that the greedy algorithm, although straightforward, is $1+O(\frac{1}{\lambda})$-competitive for $\lambda$-quadratic growth functions, $1 + \frac{2}{\sqrt{\lambda}}$-competitive for convex and $\lambda$-quadratic growth functions, and $1+ \frac{2}{\alpha}$-competitive for $\alpha$-polyhedral functions. To bound the dynamic regret with switching cost, we follow the standard setting of online convex optimization, in which the hitting cost is convex but hidden from the learner before making predictions. We slightly modify Ader, an existing algorithm designed for dynamic regret, to take the switching cost into account when measuring the performance. The proposed algorithm, named Smoothed Ader, attains an optimal $O(\sqrt{T(1+P_T)})$ bound for dynamic regret with switching cost, where $P_T$ is the path-length of the comparator sequence. Furthermore, if the hitting cost is accessible at the beginning of each round, we obtain a similar guarantee without the bounded gradient condition.
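For concreteness, the greedy step described above admits the following sketch; the squared Euclidean switching cost shown here is one illustrative instantiation, since the abstract itself does not fix the norm or the weighting:
\[
x_t = \operatorname*{argmin}_{x \in \mathcal{X}} \; f_t(x) + \frac{1}{2}\|x - x_{t-1}\|_2^2,
\]
where $f_t$ is the hitting cost revealed at the start of round $t$, $\mathcal{X}$ is the feasible domain, and $x_{t-1}$ is the previous decision.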
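As a sketch of the meta-expert structure suggested by the name Smoothed Ader, based on the standard Ader framework rather than on details given in this abstract (the precise surrogate losses below are assumptions): each expert $i$ runs online gradient descent with its own step size $\eta_i$ drawn from a geometric grid,
\[
x_{t+1}^{i} = \Pi_{\mathcal{X}}\big(x_t^{i} - \eta_i \nabla f_t(x_t)\big),
\]
the learner plays the weighted average $x_t = \sum_i w_t^{i} x_t^{i}$, and the meta-algorithm reweights the experts by an exponential update that charges each expert its linearized hitting cost plus its own movement,
\[
w_{t+1}^{i} \propto w_t^{i} \exp\!\Big(-\beta\big(\langle \nabla f_t(x_t), x_t^{i}\rangle + \|x_t^{i} - x_{t-1}^{i}\|\big)\Big),
\]
so that the switching cost enters the performance measure the meta-algorithm optimizes, which is the modification to Ader described above.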