In this paper, we propose a Dimension-Reduced Second-Order Method (DRSOM) for convex and nonconvex (unconstrained) optimization. Under a trust-region-like framework, our method preserves the convergence properties of second-order methods while using only curvature information along a few directions. Consequently, the computational overhead of our method remains comparable to that of first-order methods such as gradient descent. Theoretically, we show that the method achieves local quadratic convergence and a global convergence rate of $O(\epsilon^{-3/2})$ to reach an approximate first- and second-order stationary point, provided the subspace satisfies a commonly adopted approximate Hessian assumption. We further show that this assumption can be removed if we perform one \emph{corrector step} (using a Krylov method, for example) periodically at the final stage of the algorithm. The applicability and performance of DRSOM are demonstrated by various computational experiments, particularly in machine learning and deep learning. For neural networks, our preliminary implementation appears to gain computational advantages over state-of-the-art first-order methods such as SGD and ADAM in terms of training accuracy and iteration complexity.
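To make the dimension-reduced step concrete, the following is an illustrative sketch rather than the paper's exact formulation; the particular subspace choice and the symbols $V_k$, $\alpha$, $c_k$, $Q_k$, and $\Delta_k$ are assumptions introduced here for exposition. One may restrict the trust-region subproblem at iterate $x_k$ to a two-dimensional subspace, for instance the span of the negative gradient $-g_k$ and the previous step $d_{k-1} = x_k - x_{k-1}$:
\[
\min_{\alpha \in \mathbb{R}^2}\; m_k(\alpha) := c_k^\top \alpha + \tfrac{1}{2}\,\alpha^\top Q_k \alpha
\quad \text{s.t.} \quad \|\alpha\| \le \Delta_k,
\]
where $V_k = [\,-g_k,\; d_{k-1}\,]$, $c_k = V_k^\top g_k$, and $Q_k = V_k^\top \nabla^2 f(x_k)\, V_k$ is a $2 \times 2$ matrix obtainable from two Hessian-vector products (or approximated by finite differences or interpolation). The update is then $x_{k+1} = x_k + V_k \alpha_k$, so only curvature along the two chosen directions is needed per iteration, which is why the per-iteration cost stays close to that of a first-order method.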