When balancing the practical tradeoffs of iterative methods for large-scale optimization, the learning rate schedule remains notoriously difficult to understand and expensive to tune. We demonstrate the presence of these subtleties even in the innocuous case when the objective is a convex quadratic. We reinterpret an iterative algorithm from the numerical analysis literature as what we call the Chebyshev learning rate schedule for accelerating vanilla gradient descent, and show that the problem of mitigating instability leads to a fractal ordering of step sizes. We provide some experiments and discussion to challenge current understandings of the "edge of stability" in deep learning: even in simple settings, provable acceleration can be obtained by making negative local progress on the objective.
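As a concrete illustration of the schedule the abstract refers to, the sketch below (under stated assumptions, not the authors' implementation) runs T steps of vanilla gradient descent on a convex quadratic whose Hessian eigenvalues lie in a known interval [mu, L], with step sizes equal to the reciprocals of the T Chebyshev nodes on [mu, L]; after all T steps this matches the error polynomial of the classical Chebyshev semi-iterative method, while the *order* of the steps governs how badly the intermediate iterates overshoot. The interleaved permutation used here is the classical Lebedev–Finogenov-style ordering from the numerical analysis literature, standing in for the paper's fractal ordering; the helper names `chebyshev_steps`, `interleaved_order`, and `gd` are illustrative, not from the paper.

```python
# A minimal sketch (not the authors' code): plain gradient descent on the convex
# quadratic f(x) = 0.5*x'Ax - b'x, using as step sizes the reciprocals of the
# Chebyshev nodes on [mu, L]. The interleaving below is the classical
# Lebedev–Finogenov-style permutation; the paper's fractal ordering is in this
# spirit but may differ in detail.
import numpy as np

def chebyshev_steps(mu, L, T):
    """Reciprocals of the T Chebyshev nodes on [mu, L], in their natural order."""
    i = np.arange(T)
    nodes = (L + mu) / 2 + (L - mu) / 2 * np.cos((2 * i + 1) * np.pi / (2 * T))
    return 1.0 / nodes

def interleaved_order(T):
    """Recursive interleaving of indices 0..T-1 (T must be a power of two)."""
    perm = [0]
    while len(perm) < T:
        n = len(perm)
        perm = [q for p in perm for q in (p, 2 * n - 1 - p)]
    return np.array(perm)

def gd(A, b, steps, x_star):
    """Gradient descent with a prescribed step-size sequence.

    Returns the final error and the worst intermediate error: both orderings use
    the same set of steps (so the exact-arithmetic endpoint is identical), but
    they visit very different intermediate iterates."""
    x = np.zeros_like(b)
    worst = np.linalg.norm(x - x_star)
    for eta in steps:
        x = x - eta * (A @ x - b)          # gradient of f at x is Ax - b
        worst = max(worst, np.linalg.norm(x - x_star))
    return np.linalg.norm(x - x_star), worst

# Toy problem: random symmetric A with eigenvalues spread over [mu, L].
rng = np.random.default_rng(0)
d, T, mu, L = 50, 32, 0.01, 1.0
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
A = Q @ np.diag(np.linspace(mu, L, d)) @ Q.T
b = rng.standard_normal(d)
x_star = np.linalg.solve(A, b)

steps = chebyshev_steps(mu, L, T)
print(gd(A, b, steps, x_star))                        # sorted (unpermuted) order
print(gd(A, b, steps[interleaved_order(T)], x_star))  # interleaved "fractal" order
```

Note that several of the step sizes exceed 2/L, so individual steps can increase the objective; the schedule accelerates only through the combined effect of the whole sequence, which is the sense in which "negative local progress" appears in the abstract.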