In online convex optimization, the player aims to minimize her regret against a fixed comparator over the entire repeated game. Algorithms that minimize standard regret may converge to a fixed decision, which is undesirable in changing or dynamic environments. This motivates the stronger metric of adaptive regret, defined as the maximum regret over any contiguous sub-interval of time. Existing adaptive regret algorithms suffer from a computational penalty: typically a multiplicative factor that grows logarithmically in the number of game iterations. In this paper we show how to reduce this computational penalty to be doubly logarithmic in the number of game iterations, with minimal degradation of the optimal attainable adaptive regret bounds.
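For concreteness, a standard formalization of adaptive regret from the online learning literature (the notation below is illustrative and not quoted from this paper) is the worst-case regret over all contiguous intervals of rounds:
\[
\mathrm{A\text{-}Regret}_T \;=\; \max_{1 \le r \le s \le T} \left( \sum_{t=r}^{s} f_t(x_t) \;-\; \min_{x^\star \in \mathcal{K}} \sum_{t=r}^{s} f_t(x^\star) \right),
\]
where $f_t$ denotes the convex loss at round $t$, $x_t \in \mathcal{K}$ the player's decision, and $\mathcal{K}$ the convex decision set.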