In this paper, we consider stochastic second-order methods for minimizing a finite sum of nonconvex functions. A key ingredient is a cheap yet effective scheme for incorporating local curvature information. Since the true Hessian matrix is often a combination of a cheap part and an expensive part, we propose a structured stochastic quasi-Newton method that uses the partial Hessian information as much as possible. By further exploiting either the low-rank structure or the Kronecker-product properties of the quasi-Newton approximations, the computation of the quasi-Newton direction is affordable. Global convergence to stationary points and a local superlinear convergence rate are established under mild assumptions. Numerical results on logistic regression, deep autoencoder networks, and deep convolutional neural networks show that the proposed method is competitive with state-of-the-art methods.
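To illustrate why a low-rank quasi-Newton correction makes the direction computation affordable, the following is a minimal sketch, not the paper's actual algorithm, of solving the Newton-type system when the Hessian approximation splits into a cheap exact part C plus a rank-k correction U Uᵀ; the Sherman-Morrison-Woodbury identity reduces the work to solves with C and a small k-by-k system. The function name and the particular splitting are hypothetical, assumed only for illustration.

```python
import numpy as np

def structured_qn_direction(grad, C, U):
    """Hypothetical sketch: solve (C + U @ U.T) d = -grad cheaply.

    C is the cheap, exactly available part of the Hessian; U @ U.T is a
    rank-k quasi-Newton approximation of the expensive part, with k small.
    """
    # Apply C^{-1} to the gradient and to the columns of U.
    Cinv_g = np.linalg.solve(C, grad)
    Cinv_U = np.linalg.solve(C, U)
    k = U.shape[1]
    # Small k x k "capacitance" system: I + U^T C^{-1} U.
    S = np.eye(k) + U.T @ Cinv_U
    # Woodbury: (C + U U^T)^{-1} g = C^{-1} g - C^{-1} U S^{-1} U^T C^{-1} g
    correction = Cinv_U @ np.linalg.solve(S, U.T @ Cinv_g)
    return -(Cinv_g - correction)

# Toy usage with an assumed well-conditioned SPD cheap part.
rng = np.random.default_rng(0)
n, k = 100, 5
A = rng.standard_normal((n, n))
C = A @ A.T + n * np.eye(n)        # cheap SPD Hessian part (assumed)
U = rng.standard_normal((n, k))    # low-rank quasi-Newton factor (assumed)
g = rng.standard_normal(n)
d = structured_qn_direction(g, C, U)
assert np.allclose((C + U @ U.T) @ d, -g)  # direction satisfies the system
```

With this structure, only solves against C and one k-by-k system are needed per iteration, rather than forming or factoring the full n-by-n quasi-Newton matrix.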