Artificial neural networks (ANNs) are typically highly nonlinear systems that are finely tuned via the optimization of their associated, non-convex loss functions. In most cases, the gradient of any such loss function fails to be dissipative, making the use of widely accepted (stochastic) gradient descent methods problematic. We offer a new learning algorithm based on an appropriately constructed variant of the popular stochastic gradient Langevin dynamics (SGLD), called the tamed unadjusted stochastic Langevin algorithm (TUSLA). We also provide a nonasymptotic analysis of the new algorithm's convergence properties in the context of non-convex learning problems with the use of ANNs; thus, we provide finite-time guarantees for TUSLA to find approximate minimizers of both empirical and population risks. The TUSLA algorithm is rooted in the taming technology for diffusion processes with superlinear coefficients developed in \citet{tamed-euler, SabanisAoAP} and for MCMC algorithms in \citet{tula}. Numerical experiments are presented which confirm the theoretical findings and illustrate the need for the new algorithm, in comparison to vanilla SGLD, within the framework of ANNs.
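To make the taming idea concrete, the following is a minimal sketch of a single tamed Langevin update: the stochastic gradient is divided by a step-size-dependent factor that grows with the norm of the iterate, which keeps superlinearly growing gradients from destabilizing the scheme. The function names, the toy quartic loss, and the parameter choices (taming exponent `r`, inverse temperature `beta`) are illustrative assumptions, not the paper's exact formulation or tuned settings.

```python
import numpy as np

def tamed_langevin_step(theta, grad, step_size, beta, r, rng):
    """One tamed stochastic gradient Langevin update (illustrative sketch).

    The taming factor 1 + sqrt(step_size) * ||theta||^(2r) shrinks the
    effective gradient when the iterate is large, so a superlinearly
    growing gradient cannot blow up the recursion; the Gaussian term
    adds the usual Langevin exploration noise scaled by beta.
    """
    g = grad(theta)
    taming = 1.0 + np.sqrt(step_size) * np.linalg.norm(theta) ** (2 * r)
    noise = np.sqrt(2.0 * step_size / beta) * rng.standard_normal(theta.shape)
    return theta - step_size * g / taming + noise

# Toy non-dissipative-looking example: quartic loss u(theta) = ||theta||^4 / 4,
# whose gradient ||theta||^2 * theta grows superlinearly.
rng = np.random.default_rng(0)
theta = np.array([3.0, -3.0])
for _ in range(2000):
    theta = tamed_langevin_step(
        theta, lambda t: (t @ t) * t, step_size=0.01, beta=1e8, r=1, rng=rng
    )
print(np.linalg.norm(theta))  # iterates stay bounded and drift toward the minimizer at 0
```

With a plain (untamed) Euler step of the same size, the quartic gradient at the starting point would overshoot and the iterates would diverge; the taming factor caps the effective step so the recursion remains stable.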