In this work we describe an Adaptive Regularization using Cubics (ARC) method for large-scale nonconvex unconstrained optimization using Limited-memory Quasi-Newton (LQN) matrices. ARC methods are a relatively new family of optimization strategies that use a cubic-regularization (CR) term in place of trust regions and line searches. LQN methods offer a large-scale alternative to explicit second-order information by taking the same inputs as popular first-order methods such as stochastic gradient descent (SGD). Solving the CR subproblem exactly requires Newton's method, yet by exploiting the internal structure of LQN matrices we are able to find exact solutions to the CR subproblem in a matrix-free manner, providing large speedups and scaling to modern size requirements. Additionally, we expand upon previous ARC work and explicitly incorporate first-order updates into our algorithm. We provide experimental results for the SR1 update, which show substantial speedups and competitive performance compared to Adam and other second-order optimizers on deep neural networks (DNNs). We find that our new approach, ARCLQN, is competitive with modern optimizers while requiring minimal tuning, a common pain point for second-order methods.
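For reference, a minimal sketch of the CR subproblem in its standard ARC form, with notation assumed rather than taken from this work: at iterate $x_k$ with gradient $g_k$, quasi-Newton Hessian approximation $B_k$ (here an LQN matrix), and adaptive regularization weight $\sigma_k > 0$, the step $s_k$ approximately minimizes
$$
m_k(s) \;=\; f(x_k) \;+\; g_k^{\top} s \;+\; \tfrac{1}{2}\, s^{\top} B_k s \;+\; \tfrac{\sigma_k}{3}\, \|s\|^3 .
$$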