Second-order methods can accelerate optimization by exploiting much richer curvature information than first-order methods. However, most are impractical for deep learning, where the number of training parameters is very large. In Goldfarb et al. (2020), practical quasi-Newton methods were proposed that approximate the Hessian of a multilayer perceptron (MLP) model by a layer-wise block-diagonal matrix, where each layer's block is further approximated by a Kronecker product corresponding to the structure of the Hessian restricted to that layer. Here, we extend these methods to convolutional neural networks (CNNs) by analyzing the Kronecker-factored structure of the Hessian matrix of convolutional layers. We also propose several improvements to the methods in Goldfarb et al. (2020) that apply to both MLPs and CNNs. These new methods have memory requirements comparable to those of first-order methods and much lower per-iteration time complexity than those in Goldfarb et al. (2020). Moreover, convergence results are proved for a variant under relatively mild conditions. Finally, we compare the performance of our new methods against several state-of-the-art (SOTA) methods on MLP autoencoder and CNN problems, and find that they outperform the first-order SOTA methods and perform comparably to the second-order SOTA methods.
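To make the Kronecker-factored idea concrete, here is a minimal NumPy sketch, not the paper's algorithm, illustrating why such a layer block is cheap to invert. It assumes a hypothetical fully connected layer whose curvature block is approximated as A ⊗ G, with A built from input statistics and G from output-gradient statistics; the dimensions, the factors, and the helper `spd` are illustrative choices, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 6                      # layer output / input dimensions (illustrative)

def spd(k):
    """Random symmetric positive-definite k x k matrix (stand-in for a curvature factor)."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

A, G = spd(n), spd(m)            # hypothetical Kronecker factors of the layer block
D = rng.standard_normal((m, n))  # stand-in gradient of the loss w.r.t. the weights

# Naive route: form the full (m*n) x (m*n) block and solve with it.
H = np.kron(A, G)
step_naive = np.linalg.solve(H, D.flatten(order="F"))

# Factored route: solve with the two small factors only, using
# (A ⊗ G)^{-1} vec(D) = vec(G^{-1} D A^{-1}) for column-major vec and symmetric A.
step_fast = (np.linalg.solve(G, D) @ np.linalg.inv(A)).flatten(order="F")

assert np.allclose(step_naive, step_fast)
print("Kronecker-factored solve matches the full solve.")
```

The factored route stores and inverts only an n × n and an m × m matrix instead of an (mn) × (mn) block, which is the structural reason such methods can keep memory requirements comparable to first-order methods.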