A sequence of works in unconstrained online convex optimisation has investigated the possibility of adapting simultaneously to the norm $U$ of the comparator and the maximum norm $G$ of the gradients. In full generality, matching upper and lower bounds are known, which show that this comes at the unavoidable cost of an additive $G U^3$ term that is not needed when either $G$ or $U$ is known in advance. Surprisingly, recent results by Kempka et al. (2019) show that no such price for adaptivity is needed in the specific case of $1$-Lipschitz losses like the hinge loss. We follow up on this observation by showing that there is in fact never a price to pay for adaptivity if we specialise to any of the other common supervised online learning losses: our results cover log loss, (linear and non-parametric) logistic regression, square loss prediction, and (linear and non-parametric) least-squares regression. We also fill in several gaps in the literature by providing matching lower bounds with an explicit dependence on $U$. In all cases we obtain scale-free algorithms, which are suitably invariant under rescaling of the data. Our general goal is to establish achievable rates without concern for computational efficiency, but for linear logistic regression we also provide an adaptive method that is as efficient as the recent non-adaptive algorithm by Agarwal et al. (2021).
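Schematically, and purely as an illustration with indicative logarithmic factors (the precise statements in the literature differ), the trade-off above contrasts the rate achievable when $G$ or $U$ is known in advance with the rate when both must be adapted to: for the regret $R_T(u)$ against a comparator $u$ with $\|u\| \le U$ and gradients bounded in norm by $G$,
\[
  R_T(u) \;=\; \tilde{O}\bigl(G U \sqrt{T}\bigr)
  \qquad\text{versus}\qquad
  R_T(u) \;=\; \tilde{O}\bigl(G U \sqrt{T} + G U^3\bigr),
\]
where the additive $G U^3$ term is the unavoidable price for adapting to both parameters simultaneously in the general convex Lipschitz setting.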