Newton methods have fallen out of favor for modern optimization problems (e.g., deep learning) because of concerns about per-iteration computational complexity; in this setting, highly subsampled first order methods are preferred. In this work we motivate the extension of Newton methods to the highly stochastic regime and argue for the use of the scalable low rank saddle free Newton (LRSFN) method. In this regime, iterative updates are dominated by stochastic noise, and the stability of the method is key. Through stability analysis, we demonstrate that stochastic errors in Newton methods can be greatly amplified by ill-conditioned matrix operators. The LRSFN algorithm mitigates this issue through Levenberg-Marquardt damping, but in general second order methods with stochastic Hessian and gradient information may need to take small steps, unlike in deterministic problems. Numerical results show that even under restrictive step-length conditions, LRSFN can outperform popular first order methods on nontrivial deep learning tasks in terms of generalizability for equivalent computational work.
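To make the update concrete, the following is a minimal sketch (not the authors' implementation) of one damped low-rank saddle-free Newton step: a randomized low-rank eigendecomposition of the Hessian is formed from Hessian-vector products, eigenvalues are replaced by their absolute values (the saddle-free modification), Levenberg-Marquardt damping is added, and the resulting system is solved via the Woodbury identity. The function name `lrsfn_step`, the Hessian-vector-product oracle `H_matvec`, and the default `rank` and `damping` values are illustrative assumptions only.

```python
import numpy as np

def lrsfn_step(H_matvec, grad, dim, rank=20, damping=1e-3, rng=None):
    """Return a damped low-rank saddle-free Newton update direction (sketch)."""
    rng = np.random.default_rng() if rng is None else rng

    # Randomized range finder: probe the (stochastic) Hessian with Gaussian vectors.
    omega = rng.standard_normal((dim, rank))
    Y = np.column_stack([H_matvec(omega[:, j]) for j in range(rank)])
    Q, _ = np.linalg.qr(Y)

    # Project the Hessian onto the captured subspace and eigendecompose the small matrix.
    T = np.column_stack([H_matvec(Q[:, j]) for j in range(rank)])
    B = Q.T @ T
    lam, V = np.linalg.eigh(0.5 * (B + B.T))
    U = Q @ V                        # approximate Hessian eigenvectors

    # Saddle-free modification: use absolute eigenvalues, then apply LM damping.
    abs_lam = np.abs(lam)

    # Woodbury-style solve of (U |Lambda| U^T + damping * I) p = -grad.
    coeff = (U.T @ grad) * (abs_lam / (abs_lam + damping))
    return -(grad - U @ coeff) / damping
```

In this sketch the damping parameter plays the stabilizing role discussed above: it bounds the effective condition number of the (approximate) Hessian operator, so that stochastic errors in the gradient and Hessian-vector products are not amplified by near-singular directions.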