In this paper, we introduce Apollo, a quasi-Newton method for nonconvex stochastic optimization, which dynamically incorporates the curvature of the loss function by approximating the Hessian with a diagonal matrix. Importantly, the update and storage of the diagonal Hessian approximation are as efficient as those of adaptive first-order optimization methods, with linear complexity in both time and memory. To handle nonconvexity, we replace the Hessian with its rectified absolute value, which is guaranteed to be positive definite. Experiments on three vision and language tasks show that Apollo achieves significant improvements over other stochastic optimization methods, including SGD and variants of Adam, in terms of both convergence speed and generalization performance. The implementation of the algorithm is available at https://github.com/XuezheMax/apollo.
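For concreteness, the sketch below illustrates the kind of update the abstract describes: a diagonal Hessian approximation maintained under a weak secant condition, rectified via its absolute value with a positive floor, and used as a preconditioner. This is a minimal sketch inferred from the abstract, not the paper's exact algorithm; the function name apollo_step, the rectification floor sigma, the state dictionary, and the toy quadratic are illustrative assumptions, and details such as bias correction and step-size coupling follow the official implementation only loosely.

```python
import numpy as np

def apollo_step(x, grad, state, lr=0.1, beta=0.9, sigma=1.0):
    """One diagonal quasi-Newton step in the spirit of Apollo.

    A simplified sketch based on the abstract; the rectification floor
    `sigma` and other details differ from the official implementation
    at https://github.com/XuezheMax/apollo.
    """
    t = state["t"] + 1

    # Bias-corrected exponential moving average of the stochastic gradient.
    m_raw = beta * state["m_raw"] + (1.0 - beta) * grad
    m = m_raw / (1.0 - beta ** t)

    # Weak secant condition, restricted to a diagonal matrix: correct B so
    # that B * dx better matches the observed gradient change y.
    B, dx = state["B"], state["dx"]
    y = m - state["m"]
    coeff = (np.dot(dx, B * dx) - np.dot(dx, y)) / (np.sum(dx ** 4) + 1e-16)
    B = B - coeff * dx ** 2

    # Rectified absolute value of the Hessian approximation: flooring |B|
    # at sigma makes the preconditioner positive definite, so the step
    # remains a descent direction even on nonconvex losses.
    D = np.maximum(np.abs(B), sigma)

    # Diagonal quasi-Newton update.
    dx = -lr * m / D
    state.update(t=t, m_raw=m_raw, m=m, B=B, dx=dx)
    return x + dx


# Toy usage: minimize the separable quadratic f(x) = 0.5 * sum(c * x**2),
# whose exact Hessian diagonal is c.
c = np.array([1.0, 10.0])
x = np.array([5.0, 5.0])
state = {"t": 0, "m_raw": np.zeros(2), "m": np.zeros(2),
         "B": np.zeros(2), "dx": np.zeros(2)}
for _ in range(500):
    x = apollo_step(x, c * x, state)
print(x)  # approaches the minimizer at the origin
```

Because the Hessian approximation is restricted to a diagonal, each step costs linear time and memory in the number of parameters, which is the efficiency the abstract attributes to the method relative to adaptive first-order optimizers.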