In machine learning, stochastic gradient descent (SGD) is widely deployed to train models using highly non-convex objectives with equally complex noise models. Unfortunately, SGD theory often makes restrictive assumptions that fail to capture the non-convexity of real problems, and almost entirely ignores the complex noise models that exist in practice. In this work, we make substantial progress toward addressing this shortcoming. First, we establish that SGD's iterates will either globally converge to a stationary point or diverge under nearly arbitrary non-convexity and noise models. Under a slightly more restrictive assumption on the joint behavior of the non-convexity and noise model, one that generalizes current assumptions in the literature, we show that the objective function cannot diverge, even if the iterates diverge. As a consequence of our results, SGD can be applied to a greater range of stochastic optimization problems with confidence about its global convergence behavior and stability.
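For reference, the iteration in question is the standard SGD update; the following minimal sketch fixes notation introduced here purely for illustration (the objective $f$, step sizes $\alpha_k$, and stochastic gradient oracle $g$ are our labels, not necessarily the paper's):
$$
x_{k+1} = x_k - \alpha_k\, g(x_k, \xi_k), \qquad \mathbb{E}\big[\, g(x_k, \xi_k) \mid x_k \,\big] = \nabla f(x_k),
$$
so that "global convergence to a stationary point" refers to $\nabla f(x_k) \to 0$ along the iterate sequence, regardless of the starting point.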