We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point using the optimal $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector product computations. Our algorithm uses Hessian-vector products to "correct" a bias term in the momentum of SGD with momentum. This leads to better gradient estimates in a manner analogous to variance reduction methods. In contrast to prior work, we do not require excessively large batch sizes, and we are able to provide an adaptive algorithm whose convergence rate automatically improves with decreasing variance in the gradient estimates. We validate our results on a variety of large-scale deep learning architectures and benchmark tasks.
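To make the bias-correction idea concrete, below is a minimal illustrative sketch (not the paper's exact algorithm): stale momentum $m_{t-1}$ estimates the gradient at the previous iterate, and a Hessian-vector product transports it to the current iterate via a first-order Taylor step, $m_t = (1-a)\,(m_{t-1} + \nabla^2 f(x_t)(x_t - x_{t-1})) + a\,g_t$. The toy quadratic objective, step size, and momentum parameter here are all illustrative assumptions.

```python
import numpy as np

# Toy quadratic f(x) = 0.5 x^T A x, so grad(x) = A x and the
# Hessian-vector product is simply A v (exact for a quadratic).
rng = np.random.default_rng(0)
A = np.diag([1.0, 10.0])  # ill-conditioned toy problem (assumption)

def grad(x, noise=0.0):
    # Stochastic gradient: true gradient plus Gaussian noise.
    return A @ x + noise * rng.standard_normal(x.shape)

def hvp(x, v):
    # Stochastic Hessian-vector product (noiseless here for clarity).
    return A @ v

a, lr = 0.1, 0.05         # illustrative momentum weight and step size
x = np.array([5.0, -3.0])
m = grad(x)               # initialize momentum with one gradient sample
for _ in range(200):
    x_new = x - lr * m
    g = grad(x_new, noise=0.1)
    # HVP-corrected momentum: transport the stale estimate m (valid at x)
    # to x_new before averaging it with the fresh stochastic gradient g.
    m = (1 - a) * (m + hvp(x_new, x_new - x)) + a * g
    x = x_new

print(np.linalg.norm(x))  # converges near the minimizer at the origin
```

Without the `hvp` correction term, the update reduces to plain SGD with momentum, whose momentum buffer is a biased estimate of the current gradient; the correction removes the leading-order term of that bias.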