Second-order optimization uses curvature information about the objective function, which can enable faster convergence. However, such methods typically require expensive computation of the Hessian matrix, which prevents their use at scale. Because efficient ways of computing the Hessian have been lacking, the most widely used methods rely on first-order approximations that do not capture curvature information. In this paper, we develop HesScale, a scalable approach to approximating the diagonal of the Hessian matrix, to incorporate second-order information in a computationally efficient manner. We show that HesScale has the same computational complexity as backpropagation. Our results on supervised classification show that HesScale achieves high approximation accuracy, allowing for scalable and efficient second-order optimization.
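To make the complexity claim concrete, the sketch below shows a generic backpropagation-style recursion for a diagonal Hessian approximation in a fully connected network with elementwise activations. This is a Becker–LeCun-style rule from the same family of approximations that HesScale belongs to, not the paper's exact update; the function name `diag_hessian_backward` and its arguments are illustrative. Each layer adds only one extra matrix-vector product (with the squared weights) on top of the gradient pass, which is why the overall cost stays on the order of backpropagation.

```python
import numpy as np

# Minimal sketch of a diagonal-Hessian backward recursion
# (Becker–LeCun style; illustrative, not HesScale's exact rule).

def diag_hessian_backward(weights, acts, pre_acts,
                          grad_out, hess_out, sigma_p, sigma_pp):
    """Propagate diagonal-Hessian estimates from the output layer back.

    weights:  list of weight matrices W[l], each of shape (n_out, n_in)
    acts:     list of layer inputs z[l] (post-activation of layer l-1)
    pre_acts: list of pre-activations a[l] = W[l] @ z[l]
    grad_out: dE/da at the last layer
    hess_out: diagonal of d2E/da2 at the last layer
    sigma_p, sigma_pp: first and second derivatives of the activation
    """
    g, h = grad_out, hess_out
    diag_h_weights = []
    for l in reversed(range(len(weights))):
        # Diagonal Hessian w.r.t. weights: d2E/dW_ij^2 ≈ h_i * z_j^2
        diag_h_weights.append(np.outer(h, acts[l] ** 2))
        if l > 0:
            a_prev = pre_acts[l - 1]
            # Back through the linear map, dropping off-diagonal
            # terms of the Hessian: one matvec with squared weights.
            h_in = (weights[l] ** 2).T @ h
            g_in = weights[l].T @ g
            # Chain rule through the elementwise activation.
            h = sigma_p(a_prev) ** 2 * h_in + sigma_pp(a_prev) * g_in
            g = sigma_p(a_prev) * g_in
    return list(reversed(diag_h_weights))
```

In an optimizer, the returned per-weight diagonal estimates could then scale the gradient step, for example dividing each gradient entry by the magnitude of the corresponding estimate plus a small damping constant.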