Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization

In this paper, we introduce Apollo, a quasi-Newton method for nonconvex stochastic optimization, which dynamically incorporates the curvature of the loss function by approximating the Hessian via a diagonal matrix. Importantly, the update and storage of the diagonal approximation of the Hessian are as efficient as those of adaptive first-order optimization methods, with linear complexity in both time and memory. To handle nonconvexity, we replace the Hessian with its rectified absolute value, which is guaranteed to be positive-definite. Experiments on three vision and language tasks show that Apollo achieves significant improvements over other stochastic optimization methods, including SGD and variants of Adam, in terms of both convergence speed and generalization performance. The implementation of the algorithm is available at https://github.com/XuezheMax/apollo.
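To make the idea of a rectified diagonal Hessian approximation concrete, here is a minimal NumPy sketch. It is not the authors' implementation (see the repository linked above for that). The function names, the weak-secant update rule, and the threshold `sigma` are assumptions chosen for illustration; the abstract only states that the Hessian is approximated by a diagonal matrix and replaced with its rectified absolute value.

```python
import numpy as np

def rectified_abs(b, sigma=0.01):
    # Elementwise max(|b|, sigma): every entry is >= sigma > 0, so the
    # diagonal matrix diag(result) is positive-definite.
    # ASSUMPTION: sigma is a hypothetical threshold, not taken from the abstract.
    return np.maximum(np.abs(b), sigma)

def weak_secant_update(b, s, y):
    # ASSUMPTION: correct the diagonal estimate b along diag(s**2) so that
    # s^T diag(b) s == s^T y (a weak secant condition).
    # b, s, y are vectors of the same length, so this costs O(n) time and memory.
    denom = np.sum(s ** 4)
    if denom == 0.0:
        return b  # no movement, nothing to learn about curvature
    return b + (s @ y - s @ (b * s)) / denom * s ** 2

def preconditioned_step(grad, b, lr=0.1, sigma=0.01):
    # Scale the gradient by the inverse of the rectified diagonal estimate.
    return -lr * grad / rectified_abs(b, sigma)
```

Here `s` stands for the previous parameter step and `y` for the change in gradient across that step. Because the estimate is stored as a vector rather than a full matrix, each call touches every parameter once, which matches the linear cost the abstract describes.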

Related content

Quasi-Newton methods

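Quasi-Newton methods are among the most effective methods for solving nonlinear optimization problems. They were proposed in the 1950s by W. C. Davidon, a physicist at Argonne National Laboratory in the United States. Davidon's algorithm was regarded at the time as one of the most creative inventions in nonlinear optimization. Soon afterward, R. Fletcher and M. J. D. Powell demonstrated that the new algorithm was much faster and more reliable than the other methods then available, and the field of nonlinear optimization advanced dramatically almost overnight.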
