Second-order methods can accelerate optimization by exploiting much richer curvature information than first-order methods. However, most are impractical for deep learning, where the number of training parameters is very large. In Goldfarb et al. (2020), practical quasi-Newton methods were proposed that approximate the Hessian of a multilayer perceptron (MLP) model by a layer-wise block-diagonal matrix, where each layer's block is further approximated by a Kronecker product corresponding to the structure of the Hessian restricted to that layer. Here, we extend these methods to convolutional neural networks (CNNs) by analyzing the Kronecker-factored structure of the Hessian matrix of convolutional layers. We also propose several improvements to the methods in Goldfarb et al. (2020) that apply to both MLPs and CNNs. These new methods have memory requirements comparable to those of first-order methods and much lower per-iteration time complexity than those in Goldfarb et al. (2020). Moreover, convergence results are proved for a variant under relatively mild conditions. Finally, we compare the performance of our new methods against several state-of-the-art (SOTA) methods on MLP autoencoder and CNN problems, and find that they outperform the first-order SOTA methods and perform comparably to the second-order SOTA methods.
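To make the Kronecker-factored idea concrete, here is a minimal NumPy sketch, not the paper's algorithm, illustrating why such a layer block is cheap to invert. It assumes a hypothetical fully connected layer whose curvature block is approximated as A ⊗ G, with A built from input statistics and G from output-gradient statistics; the dimensions, the factors, and the helper `spd` are illustrative choices, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 6                      # layer output / input dimensions (illustrative)

def spd(k):
    """Random symmetric positive-definite k x k matrix (stand-in for a curvature factor)."""
    M = rng.standard_normal((k, k))
    return M @ M.T + k * np.eye(k)

A, G = spd(n), spd(m)            # hypothetical Kronecker factors of the layer block
D = rng.standard_normal((m, n))  # stand-in gradient of the loss w.r.t. the weights

# Naive route: form the full (m*n) x (m*n) block and solve with it.
H = np.kron(A, G)
step_naive = np.linalg.solve(H, D.flatten(order="F"))

# Factored route: solve with the two small factors only, using
# (A ⊗ G)^{-1} vec(D) = vec(G^{-1} D A^{-1}) for column-major vec and symmetric A.
step_fast = (np.linalg.solve(G, D) @ np.linalg.inv(A)).flatten(order="F")

assert np.allclose(step_naive, step_fast)
print("Kronecker-factored solve matches the full solve.")
```

The factored route stores and inverts only an n × n and an m × m matrix instead of an (mn) × (mn) block, which is the structural reason such methods can keep memory requirements comparable to first-order methods.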