Online structured prediction, which includes online classification as a special case, is the task of sequentially predicting labels from input features. In this setting, the surrogate regret, defined as the cumulative excess of the target loss (e.g., the 0-1 loss) over the surrogate loss (e.g., the logistic loss) of the best fixed estimator, has gained attention, particularly because it often admits a finite bound independent of the time horizon $T$. However, such guarantees break down in non-stationary environments, where every fixed estimator may incur a surrogate loss that grows linearly with $T$. We address this by proving a bound of the form $F_T + C(1 + P_T)$ on the cumulative target loss, where $F_T$ is the cumulative surrogate loss of an arbitrary comparator sequence, $P_T$ is its path length, and $C > 0$ is a constant. This bound depends on $T$ only through $F_T$ and $P_T$, and hence often yields much stronger guarantees in non-stationary environments. Our core idea is to combine the dynamic regret bound of online gradient descent (OGD) with the technique of exploiting the surrogate gap. Our analysis also leads to a new Polyak-style learning rate for OGD, which systematically offers target-loss guarantees and exhibits promising empirical performance. We further extend our approach to a broader class of problems via the convolutional Fenchel--Young loss. Finally, we prove a lower bound showing that the dependence on $F_T$ and $P_T$ is tight.
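For intuition, the following minimal sketch runs OGD with the classical Polyak stepsize $\eta_t = f_t(w_t) / \|\nabla f_t(w_t)\|^2$ (using $0$ as the loss floor, since the logistic loss is nonnegative) on a drifting binary-classification stream. It is an illustration only: the data model and all names are assumptions, and the paper's Polyak-style rate, which exploits the surrogate gap, differs in its details.

```python
import numpy as np

# Illustrative sketch only: classical Polyak-stepsize OGD on a drifting
# logistic-regression stream. The paper's Polyak-style rate (which exploits
# the surrogate gap) is related in spirit but not reproduced here.
rng = np.random.default_rng(0)

def logistic_loss(w, x, y):
    # Surrogate loss; labels y are in {-1, +1}.
    return np.log1p(np.exp(-y * (w @ x)))

def logistic_grad(w, x, y):
    # Gradient of the logistic loss with respect to w.
    return (-y / (1.0 + np.exp(y * (w @ x)))) * x

d, T = 5, 200
w = np.zeros(d)
mistakes = 0  # cumulative target (0-1) loss
for t in range(T):
    x = rng.normal(size=d)
    # A slowly rotating ground truth makes the environment non-stationary,
    # so any fixed estimator eventually accumulates linear surrogate loss.
    u = np.array([np.cos(t / 50.0), np.sin(t / 50.0), 0.0, 0.0, 0.0])
    y = 1.0 if u @ x >= 0 else -1.0
    mistakes += int(np.sign(w @ x) != y)
    g = logistic_grad(w, x, y)
    gnorm2 = float(g @ g)
    if gnorm2 > 0.0:
        # Polyak step with loss floor 0.
        w -= (logistic_loss(w, x, y) / gnorm2) * g
print(f"cumulative 0-1 loss over {T} rounds: {mistakes}")
```

On such drifting streams, any fixed estimator's surrogate loss grows linearly in $T$, which is exactly the regime that the comparator-sequence bound $F_T + C(1 + P_T)$ targets.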