In this paper, we investigate an online prediction strategy named Discounted-Normal-Predictor (Kapralov and Panigrahy, 2010) for smoothed online convex optimization (SOCO), in which the learner needs to minimize not only the hitting cost but also the switching cost. In the setting of learning with expert advice, Daniely and Mansour (2019) demonstrate that Discounted-Normal-Predictor can be utilized to yield nearly optimal regret bounds over any interval, even in the presence of switching costs. Inspired by their results, we develop a simple algorithm for SOCO that combines online gradient descent (OGD) with different step sizes sequentially via Discounted-Normal-Predictor. Despite its simplicity, we prove that it minimizes the adaptive regret with switching cost, i.e., it attains nearly optimal regret with switching cost on every interval. By exploiting the theoretical guarantee of OGD for dynamic regret, we further show that the proposed algorithm can also minimize the dynamic regret with switching cost on every interval.
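To make the high-level recipe concrete, the following is a minimal structural sketch of the idea of running two OGD copies with different step sizes and mixing them through a discounted combiner. The names (`ogd_step`, `DiscountedCombiner`) and the clipped-linear weight map are illustrative assumptions, not the authors' algorithm; the actual Discounted-Normal-Predictor of Kapralov and Panigrahy (2010) derives its betting weight from a normal-distribution potential, which is abstracted away here.

```python
import numpy as np

def ogd_step(x, grad, eta, radius=1.0):
    """One projected online-gradient-descent step onto an L2 ball (illustrative)."""
    y = x - eta * grad
    norm = np.linalg.norm(y)
    return y if norm <= radius else y * (radius / norm)

class DiscountedCombiner:
    """Structural stand-in for a Discounted-Normal-Predictor style combiner.

    Keeps a discounted sum of the gain of expert B over expert A and maps it to a
    mixing weight in [0, 1]. The clipped-linear map below is a placeholder for the
    potential-based weight of Kapralov and Panigrahy (2010).
    """

    def __init__(self, discount=0.99, scale=10.0):
        self.discount = discount
        self.scale = scale
        self.x = 0.0  # discounted cumulative gain difference

    def weight(self):
        # Mixing weight assigned to expert B (placeholder map).
        return float(np.clip(0.5 + self.x / self.scale, 0.0, 1.0))

    def update(self, gain_diff):
        self.x = self.discount * self.x + gain_diff

# Hypothetical usage: combine two OGD copies with different step sizes.
d = 5
xa, xb = np.zeros(d), np.zeros(d)
combiner = DiscountedCombiner()
rng = np.random.default_rng(0)
for t in range(1, 101):
    w = combiner.weight()
    decision = (1 - w) * xa + w * xb       # point actually played this round
    grad = rng.normal(size=d)              # gradient of the hitting cost (synthetic)
    loss_a, loss_b = grad @ xa, grad @ xb  # linearized losses of the two experts
    combiner.update(loss_a - loss_b)       # positive when expert B is doing better
    xa = ogd_step(xa, grad, eta=1.0 / np.sqrt(t))
    xb = ogd_step(xb, grad, eta=0.1 / np.sqrt(t))
```

Because the combiner's weight changes slowly (through the discounted statistic) rather than jumping between experts, the played point moves gradually, which is the mechanism that keeps the switching cost under control in the analysis.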