We prove that various stochastic gradient descent methods, including stochastic gradient descent (SGD), the stochastic heavy-ball (SHB) method, and the stochastic Nesterov's accelerated gradient (SNAG) method, almost surely avoid any strict saddle manifold. To the best of our knowledge, this is the first time such results have been obtained for the SHB and SNAG methods. Moreover, our analysis expands upon previous studies of SGD by removing the requirements of bounded gradients of the objective function and uniformly bounded noise. Instead, we introduce a more practical local boundedness assumption on the noisy gradient, which is naturally satisfied in the empirical risk minimization problems that commonly arise in the training of neural networks.
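For context, the update rules of the three methods can be sketched as follows; the notation here (step size $\alpha_k > 0$, momentum parameter $\beta \in [0,1)$, and stochastic gradient $g_k = \nabla f(x_k) + \xi_k$ with noise $\xi_k$) is illustrative and not taken verbatim from the paper.
% Illustrative update rules under the assumed notation above.
\begin{align*}
\text{SGD:}  \quad & x_{k+1} = x_k - \alpha_k g_k, \\
\text{SHB:}  \quad & x_{k+1} = x_k - \alpha_k g_k + \beta\,(x_k - x_{k-1}), \\
\text{SNAG:} \quad & y_k = x_k + \beta\,(x_k - x_{k-1}), \qquad
                     x_{k+1} = y_k - \alpha_k \bigl(\nabla f(y_k) + \xi_k\bigr).
\end{align*}
In empirical risk minimization, $\xi_k$ typically arises from evaluating the gradient on a random mini-batch of the training data, which is the setting in which the local boundedness assumption on the noisy gradient is naturally satisfied.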