We propose a computationally friendly adaptive learning rate schedule, "AdaLoss", which directly uses information from the loss function to adjust the stepsize in gradient descent methods. We prove that this schedule enjoys linear convergence for linear regression. Moreover, we provide a linear convergence guarantee in the non-convex regime, in the context of two-layer over-parameterized neural networks. If the width of the first hidden layer in the two-layer network is sufficiently large (polynomially), then AdaLoss converges robustly \emph{to the global minimum} in polynomial time. We numerically verify the theoretical results and extend the scope of the numerical experiments by considering applications in LSTM models for text classification and policy gradients for control problems.
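As a minimal sketch of what such a loss-based schedule can look like (the precise AdaLoss update is specified in the body of the paper, not in this abstract; the accumulator below is an assumption made for illustration only), gradient descent on a loss $L(w)$ could accumulate loss values in the stepsize denominator:
\begin{equation*}
b_{t+1}^2 = b_t^2 + L(w_t), \qquad w_{t+1} = w_t - \frac{\eta}{b_{t+1}}\,\nabla L(w_t),
\end{equation*}
so that the effective stepsize $\eta/b_{t+1}$ shrinks only while the loss remains large and stabilizes once the loss becomes small.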