Quasi-Newton methods generally provide curvature information by approximating the Hessian through the secant equation. However, because the secant equation relies only on first-order derivatives, it is limited in how well it can approximate the Newton step. In this study, we propose an approximate Newton-step-based stochastic optimization algorithm for large-scale empirical risk minimization of convex functions with linear convergence rates. Specifically, we compute a partial column Hessian of size ($d\times k$), with $k\ll d$ randomly selected variables, and then use the \textit{Nystr\"om method} to better approximate the full Hessian matrix. To further reduce the computational complexity per iteration, we compute the update step ($\Delta\boldsymbol{w}$) directly, without computing or storing the full Hessian or its inverse. Furthermore, to address large-scale scenarios in which even computing a partial Hessian may require significant time, we use distribution-preserving (DP) sub-sampling to compute the partial Hessian. DP sub-sampling generates $p$ sub-samples with similar first- and second-order distribution statistics and selects a single sub-sample at each epoch in a round-robin manner to compute the partial Hessian. We integrate the approximated Hessian with stochastic gradient descent and stochastic variance-reduced gradients to solve the logistic regression problem. Numerical experiments show that the proposed approach obtains a better approximation of Newton's method, with performance competitive with state-of-the-art first-order and stochastic quasi-Newton methods.
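As a minimal sketch of the Nystr\"om-based Newton step described above: given $k$ randomly selected coordinates $S$, the partial column Hessian $C = H_{:,S}$ ($d\times k$) and its principal sub-block $M = H_{S,S}$ ($k\times k$) yield the approximation $H \approx C M^{-1} C^{\top}$, and the update $\Delta\boldsymbol{w}$ can be computed with the Woodbury identity using only $k\times k$ solves, so the full Hessian is never formed or stored. The helper \texttt{hessian\_columns\_fn} and the ridge term \texttt{rho} below are illustrative assumptions, not interfaces from the paper.

\begin{verbatim}
import numpy as np

def nystrom_newton_step(hessian_columns_fn, grad, d, k, rho=1e-3, rng=None):
    """Approximate Newton step from k Hessian columns (illustrative sketch).

    hessian_columns_fn(S) is assumed to return H[:, S] of shape (d, k);
    rho > 0 is a regularizer added for invertibility (an assumption here).
    """
    rng = np.random.default_rng(rng)
    S = rng.choice(d, size=k, replace=False)      # randomly selected variables
    C = hessian_columns_fn(S)                     # partial column Hessian, (d, k)
    M = C[S, :]                                   # k x k sub-block H[S, S]

    # Nystrom approximation H ~ C M^{-1} C^T; solve (rho*I + C M^{-1} C^T) dw = -grad
    # via Woodbury: inv = (1/rho) I - (1/rho^2) C (M + C^T C / rho)^{-1} C^T
    inner = M + (C.T @ C) / rho                   # only a k x k system is solved
    tmp = np.linalg.solve(inner, C.T @ grad)
    dw = -(grad / rho - (C @ tmp) / rho**2)       # approximate Newton step
    return dw

# Toy usage with a quadratic f(w) = 0.5 w^T A w - b^T w (Hessian = A):
# dw = nystrom_newton_step(lambda S: A[:, S], A @ w - b, d=A.shape[0], k=20)
\end{verbatim}

Because only the $k$ selected columns are ever requested, the per-iteration cost is dominated by the $d\times k$ products and one $k\times k$ solve rather than any $d\times d$ operation.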
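The DP sub-sampling step can likewise be sketched under an assumed construction: sort examples by a summary score and deal them round-robin into $p$ bins, which tends to give each bin similar first- and second-order statistics. The scoring and binning choices below are illustrative and may differ from the paper's exact scheme.

\begin{verbatim}
import numpy as np

def dp_subsamples(X, p, rng=None):
    """Split X (n x d) into p sub-samples with matched distribution statistics.

    Illustrative scheme (an assumption): sort by per-example feature norm and
    deal indices round-robin into p bins, so bin means/variances stay similar.
    """
    rng = np.random.default_rng(rng)
    score = np.linalg.norm(X, axis=1)          # per-example summary statistic
    order = np.argsort(score)                  # neighbors have similar scores
    bins = [order[i::p] for i in range(p)]     # deal round-robin into p bins
    return [rng.permutation(b) for b in bins]  # shuffle within each sub-sample

# At epoch t, the partial Hessian would be computed on bins[t % p],
# i.e., the sub-samples are used in a round-robin manner across epochs.
\end{verbatim}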