We design learning rate schedules that minimize regret for SGD-based online learning in the presence of a changing data distribution. We fully characterize the optimal learning rate schedule for online linear regression via a novel analysis with stochastic differential equations. For general convex loss functions, we propose new learning rate schedules that are robust to distribution shift, and we give upper and lower bounds on the regret that differ only by constant factors. For non-convex loss functions, we define a notion of regret based on the gradient norm of the estimated models and propose a learning rate schedule that minimizes an upper bound on the total expected regret. Intuitively, one expects changing loss landscapes to require more exploration, and we confirm that optimal learning rate schedules typically increase in the presence of distribution shift. Finally, we provide experiments for high-dimensional regression models and neural networks to illustrate these learning rate schedules and their cumulative regret.
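To make the setting concrete, the following is a minimal Python sketch (an illustration of the problem setup, not the paper's method) of online SGD for linear regression under a drifting target, tracking cumulative regret against the moving optimum. The schedule `lr`, the random-walk drift model, and all constants are hypothetical choices made for this example.

```python
import numpy as np

# Illustrative sketch: online SGD on squared loss while the ground-truth
# parameter vector drifts, accumulating excess loss ("regret") over time.
rng = np.random.default_rng(0)
d, T, noise, drift = 10, 2000, 0.1, 0.01

def lr(t):
    # A generic 1/sqrt(t) decay; shift-aware schedules (as studied in the
    # paper) typically remain larger when the distribution keeps moving.
    return 0.5 / np.sqrt(t + 1)

w_star = rng.normal(size=d)   # time-varying ground truth
w = np.zeros(d)               # online iterate
regret = 0.0

for t in range(T):
    w_star += drift * rng.normal(size=d)   # distribution shift step
    x = rng.normal(size=d)
    y = x @ w_star + noise * rng.normal()
    err = x @ w - y
    loss = 0.5 * err**2
    opt_loss = 0.5 * noise**2              # expected loss of w_star (baseline)
    regret += loss - opt_loss
    w -= lr(t) * err * x                   # SGD step on 0.5*(x@w - y)^2

print(f"cumulative regret after {T} rounds: {regret:.2f}")
```

In this sketch, a learning rate that decays to zero eventually stops tracking the drifting `w_star`, so regret grows linearly; this matches the intuition stated above that distribution shift calls for larger learning rates.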