In modern deep learning, highly subsampled stochastic approximation (SA) methods are preferred to sample average approximation (SAA) methods because of large data sets as well as their generalization properties. Additionally, due to the perceived costs of forming and factorizing Hessians, second order methods are rarely used for these problems. In this work we motivate the extension of Newton methods to the SA regime and argue for the use of the scalable low rank saddle free Newton (LRSFN) method, which avoids forming the Hessian in favor of a low rank approximation. LRSFN can also facilitate fast escape from indefinite regions, leading to better optimization solutions. In the SA setting, iterative updates are dominated by stochastic noise, and the stability of the method is key. We introduce a continuous time stability analysis framework and use it to demonstrate that the stochastic errors of Newton methods can be greatly amplified by ill-conditioned Hessians. LRSFN mitigates this stability issue via Levenberg-Marquardt damping. More generally, however, the analysis shows that second order methods with stochastic Hessian and gradient information may need to take small steps, unlike in deterministic problems. Numerical results show that LRSFN can escape indefinite regions that cause other methods difficulty, and that even under restrictive step length conditions, LRSFN can outperform popular first order methods on large scale deep learning tasks in terms of generalizability for equivalent computational work.
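The core idea behind a saddle-free Newton step with a low rank Hessian approximation and Levenberg-Marquardt damping can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a small dense Hessian and uses a full eigendecomposition truncated to rank r, whereas a scalable method would form the low rank approximation matrix-free (e.g., via randomized or Lanczos methods). The function name `lrsfn_step` and its signature are hypothetical.

```python
import numpy as np

def lrsfn_step(grad, hess, rank, damping):
    """Illustrative saddle-free Newton step with rank-r Hessian
    approximation and Levenberg-Marquardt damping (hypothetical sketch)."""
    # Eigendecomposition of the Hessian (dense here for illustration only;
    # a scalable method would use matrix-free low rank approximation).
    eigvals, eigvecs = np.linalg.eigh(hess)
    # Keep the r eigenpairs of largest magnitude (dominant curvature).
    idx = np.argsort(-np.abs(eigvals))[:rank]
    lam, U = eigvals[idx], eigvecs[:, idx]
    # Saddle-free modification: replace eigenvalues by their absolute
    # values, so negative-curvature directions become escape directions
    # instead of attracting the iterate toward the saddle.
    # Levenberg-Marquardt damping regularizes small or zero eigenvalues.
    d = np.abs(lam) + damping
    # Solve (U diag(|lam|) U^T + damping*I) step = grad via the
    # Sherman-Morrison-Woodbury structure: act with 1/d on the low rank
    # subspace and with 1/damping on its orthogonal complement.
    coef = U.T @ grad
    step = U @ (coef / d) + (grad - U @ coef) / damping
    return -step
```

Note how the damping parameter controls the effective step length: directions outside the captured low rank subspace are scaled by 1/damping, so a larger damping value enforces the kind of conservative steps that the stability analysis suggests are necessary in the stochastic setting.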