Second-order optimization methods are among the most widely used approaches for convex optimization problems and have recently been applied to non-convex problems such as the training of deep learning models. Widely used second-order methods such as quasi-Newton methods generally provide curvature information by approximating the Hessian via the secant equation. However, because the secant equation relies only on first-order derivatives, it yields a weak approximation of the Newton step. In this study, we propose an approximate Newton sketch-based stochastic optimization algorithm for large-scale empirical risk minimization. Specifically, we compute a partial column Hessian of size $d\times m$ with $m\ll d$ randomly selected variables, and then use the \emph{Nystr\"om method} to approximate the full Hessian matrix. To further reduce the computational complexity per iteration, we directly compute the update step $\Delta\boldsymbol{w}$ without computing or storing the full Hessian or its inverse. We then integrate our approximate Hessian with stochastic gradient descent and stochastic variance-reduced gradient methods. Numerical experiments on both convex and non-convex functions show that the proposed approach obtains a better approximation of Newton's method and performs competitively with state-of-the-art first-order and stochastic quasi-Newton methods. Furthermore, we provide a theoretical convergence analysis for convex functions.
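To make the idea concrete, below is a minimal, illustrative sketch (not the paper's reference implementation) of a Nystr\"om-style Hessian approximation and the resulting approximate Newton step for a ridge-regularized least-squares loss. The sampled index set, the damping parameter \texttt{rho}, and the Woodbury-style solve are assumptions made for this example only; the point is that the step is computed from a $d\times m$ column block without ever forming the full Hessian or its inverse.

\begin{verbatim}
# Illustrative sketch only: Nystrom-type Hessian approximation and the
# corresponding approximate Newton step for f(w) = ||Xw - y||^2/(2n)
# + (lam/2)||w||^2.  Index sampling, `rho`, and the Woodbury solve are
# assumptions for this example, not the authors' exact procedure.
import numpy as np

rng = np.random.default_rng(0)

def partial_hessian_columns(X, idx, lam):
    """d x m block of Hessian columns H[:, idx] for the ridge loss."""
    n, d = X.shape
    C = (X.T @ X[:, idx]) / n              # columns of X^T X / n
    C[idx, np.arange(len(idx))] += lam     # ridge term on the sampled diagonal entries
    return C

def nystrom_newton_step(X, grad, idx, lam=1e-2, rho=1e-3):
    """Approximate Delta_w = -(H_nys + rho I)^{-1} grad without forming H.

    H_nys = C W^+ C^T with C = H[:, idx] (d x m) and W = C[idx, :] (m x m);
    the inverse is applied through the Woodbury identity (assuming W is
    invertible), so the per-step cost is O(d m^2 + m^3) rather than O(d^3).
    """
    C = partial_hessian_columns(X, idx, lam)       # d x m partial column Hessian
    W = C[idx, :]                                  # m x m intersection block
    # Woodbury: (rho I + C W^+ C^T)^{-1} g
    #         = g/rho - C (W + C^T C / rho)^{-1} C^T g / rho^2
    inner = W + (C.T @ C) / rho
    correction = C @ np.linalg.solve(inner, C.T @ grad)
    return -(grad / rho - correction / rho**2)

# Usage: one approximate Newton step on a synthetic problem.
n, d, m, lam = 200, 50, 10, 1e-2
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
w = np.zeros(d)
grad = X.T @ (X @ w - y) / n + lam * w
idx = rng.choice(d, size=m, replace=False)         # m randomly selected variables
w = w + nystrom_newton_step(X, grad, idx, lam)
\end{verbatim}

In a stochastic setting, the same step computation would simply be applied to mini-batch gradients (and mini-batch Hessian columns), which is how it can be paired with SGD- or SVRG-style updates.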