Optimization in Deep Learning is mainly guided by vague intuitions and strong assumptions, with a limited understanding of how and why these work in practice. To shed more light on this, our work provides a deeper understanding of how SGD behaves by empirically analyzing the trajectory taken by SGD from a line search perspective. Specifically, we perform a costly quantitative analysis of the full-batch loss along SGD trajectories of commonly used models trained on a subset of CIFAR-10. Our core results include that the full-batch loss along lines in update step direction is highly parabolic. Furthermore, we show that there exists a learning rate with which SGD always performs almost exact line searches on the full-batch loss. Finally, we provide a different perspective on why increasing the batch size has almost the same effect as decreasing the learning rate by the same factor.
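To make the measurement concrete, the following is a minimal sketch, not the authors' code, of the kind of analysis the abstract describes: sample the full-batch loss at points along the line defined by one SGD update direction, fit a parabola to the samples, and read off the learning rate that would turn the SGD step into an exact line search. The toy model (logistic regression on synthetic data), the batch size of 32, and the step grid are assumptions chosen for illustration; the paper itself uses commonly used models on a CIFAR-10 subset.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(512, 20))                       # full "dataset" (assumption)
logits = X @ rng.normal(size=20) + rng.normal(size=512)
y = (logits > 0).astype(float)                       # noisy synthetic labels
w = np.zeros(20)                                     # model parameters

def full_batch_loss(w):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def minibatch_grad(w, batch):
    Xb, yb = X[batch], y[batch]
    p = 1.0 / (1.0 + np.exp(-Xb @ w))
    return Xb.T @ (p - yb) / len(batch)

batch = rng.choice(len(X), size=32, replace=False)   # one mini-batch
g = minibatch_grad(w, batch)
d = -g / np.linalg.norm(g)                           # unit update step direction

# Full-batch loss sampled along the line w + s * d.
s = np.linspace(0.0, 2.0, 50)
losses = np.array([full_batch_loss(w + si * d) for si in s])

# Parabolic fit l(s) ~ a*s^2 + b*s + c and its minimizer.
a, b, c = np.polyfit(s, losses, deg=2)
s_min = -b / (2 * a)
residual = np.max(np.abs(np.polyval([a, b, c], s) - losses))
print(f"parabola minimum at s* = {s_min:.3f}, max fit residual = {residual:.2e}")

# An SGD step with learning rate lam moves s = lam * ||g|| along d, so
# lam = s_min / ||g|| would make this step an exact line search on the
# fitted parabola -- the relationship the abstract refers to.
lam_exact = s_min / np.linalg.norm(g)
print(f"learning rate giving an exact line search on this line: {lam_exact:.3f}")
```

A small fit residual together with a stable `lam_exact` across update steps is what the abstract's "almost exact line searches" claim would look like in such a measurement.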