We prove quantitative rates at which discrete Langevin-like processes converge to the invariant distribution of a related stochastic differential equation. We study the setting where the additive noise can be non-Gaussian and state-dependent, and the potential function can be non-convex. We show that the key properties of these processes depend on the potential function and on the second moment of the additive noise. We apply our theoretical findings to the convergence of Stochastic Gradient Descent (SGD) on non-convex problems and corroborate them with experiments training deep neural networks on the CIFAR-10 dataset with SGD.
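For concreteness, a minimal sketch of the setup described above, with hypothetical notation ($f$ for the potential, $\eta$ for the step size, $\xi_k$ for the additive noise); the specific equations are an illustration of the standard Langevin discretization, not taken from the paper:

% Hypothetical notation: f is the (possibly non-convex) potential, B_t Brownian motion,
% \eta the step size, and \xi_k(x) the possibly non-Gaussian, state-dependent noise,
% whose second moment E||\xi_k(x)||^2 is what the abstract says the rates depend on.
\begin{align*}
  \mathrm{d}X_t &= -\nabla f(X_t)\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}B_t
      &&\text{(Langevin SDE, invariant density $\propto e^{-f}$)} \\
  x_{k+1} &= x_k - \eta\,\nabla f(x_k) + \sqrt{2\eta}\,\xi_k(x_k)
      &&\text{(discrete Langevin-like iteration)}
\end{align*}

SGD fits this template when $\xi_k$ is taken to be the minibatch gradient noise, rescaled by the step size.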