In this article, we study the convergence of stochastic gradient descent (SGD) schemes under weak assumptions on the underlying landscape. More explicitly, we show that, on the event that the SGD stays local, the scheme converges if there are only countably many critical points, or if the target function (landscape) satisfies Łojasiewicz inequalities around all critical levels, as all analytic functions do. In particular, we show that for neural networks with an analytic activation function, such as softplus, the sigmoid, or the hyperbolic tangent, SGD converges on the event of staying local, provided the random variables modelling the signal and response in the training are compactly supported.
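For orientation, the two objects named above can be written in standard notation; the symbols $f$, $x^\ast$, $X_n$, $\gamma_n$, $D_{n+1}$, $\theta$, $C$ and $U$ are illustrative choices and are not taken from the paper itself. A Łojasiewicz inequality around a critical point $x^\ast$ of the target function $f$ asserts that there exist a neighbourhood $U$ of $x^\ast$, a constant $C>0$ and an exponent $\theta\in(0,1)$ such that
\[
  \lvert f(x) - f(x^\ast)\rvert^{\theta} \;\le\; C\,\lVert \nabla f(x)\rVert
  \qquad \text{for all } x \in U,
\]
a condition satisfied, in particular, by every real analytic $f$. The SGD schemes in question are recursions of the generic form
\[
  X_{n+1} \;=\; X_n \;-\; \gamma_n\bigl(\nabla f(X_n) + D_{n+1}\bigr),
\]
with positive step sizes $\gamma_n$ and random perturbations $D_{n+1}$ of the true gradient.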