Artificial neural networks (ANNs) are typically highly nonlinear systems which are finely tuned via the optimization of their associated, non-convex loss functions. In many cases, the gradient of any such loss function has superlinear growth, making the use of the widely accepted (stochastic) gradient descent methods, which are based on Euler numerical schemes, problematic. We offer a new learning algorithm based on an appropriately constructed variant of the popular stochastic gradient Langevin dynamics (SGLD), which is called the tamed unadjusted stochastic Langevin algorithm (TUSLA). We also provide a nonasymptotic analysis of the new algorithm's convergence properties in the context of non-convex learning problems with the use of ANNs. Thus, we provide finite-time guarantees for TUSLA to find approximate minimizers of both empirical and population risks. The roots of the TUSLA algorithm are based on the taming technology for diffusion processes with superlinear coefficients as developed in \citet{tamed-euler, SabanisAoAP} and for MCMC algorithms in \citet{tula}. Numerical experiments are presented which confirm the theoretical findings and illustrate the need for the use of the new algorithm in comparison to vanilla SGLD within the framework of ANNs.
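To make the idea concrete, the following is a minimal sketch of a tamed Langevin update in the spirit of TUSLA: the (stochastic) gradient is divided by a factor growing like $1 + \sqrt{\lambda}\,\|\theta\|^{2r}$ before the usual SGLD step with Gaussian noise is taken, so that superlinearly growing gradients cannot cause the iterates to explode. The exact taming factor, constants, and the toy loss $u(\theta) = \|\theta\|^4/4 - \|\theta\|^2/2$ below are illustrative assumptions, not the precise scheme of the cited papers.

```python
import math
import random

def tusla_step(theta, grad, step, beta=1e10, r=1, rng=random):
    """One tamed Langevin update (illustrative TUSLA-style step).

    The taming factor 1 + sqrt(step) * ||theta||^(2r) keeps the effective
    drift bounded even when the gradient grows superlinearly in theta;
    the Gaussian noise term is the usual SGLD exploration term with
    inverse temperature beta.
    """
    norm = math.sqrt(sum(t * t for t in theta))
    factor = 1.0 + math.sqrt(step) * norm ** (2 * r)
    sigma = math.sqrt(2.0 * step / beta)
    return [t - step * g / factor + sigma * rng.gauss(0.0, 1.0)
            for t, g in zip(theta, grad)]

# Toy non-convex loss u(theta) = ||theta||^4/4 - ||theta||^2/2,
# whose gradient theta * (||theta||^2 - 1) grows superlinearly:
def grad_u(theta):
    sq = sum(t * t for t in theta)
    return [t * (sq - 1.0) for t in theta]

random.seed(0)
theta = [10.0, -10.0]  # start far out, where an untamed Euler step would overshoot
for _ in range(2000):
    theta = tusla_step(theta, grad_u(theta), step=0.1)
# The iterates stay bounded and settle near the minimizing sphere ||theta|| = 1.
```

With a vanilla (untamed) Euler/SGLD step from the same starting point, the first update already moves each coordinate by roughly $0.1 \times 10 \times 199 \approx 199$, and the iterates diverge; the taming factor is precisely what prevents this.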