We consider non-convex stochastic optimization problems where the objective functions have super-linearly growing and discontinuous stochastic gradients. In this setting, we provide a non-asymptotic analysis of the tamed unadjusted stochastic Langevin algorithm (TUSLA) introduced in Lovas et al. (2021). In particular, we establish non-asymptotic error bounds for the TUSLA algorithm in Wasserstein-1 and Wasserstein-2 distances. The latter result enables us to further derive non-asymptotic estimates for the expected excess risk. To illustrate the applicability of the main results, we consider an example from transfer learning with ReLU neural networks, a key paradigm in machine learning. Numerical experiments are presented for this example which support our theoretical findings. Hence, we demonstrate both theoretically and numerically that the TUSLA algorithm can solve optimization problems involving neural networks with the ReLU activation function. In addition, we provide simulation results for synthetic examples where popular algorithms, e.g., ADAM, AMSGrad, RMSProp, and (vanilla) SGD, may fail to find the minimizer of the objective function due to the super-linear growth and the discontinuity of the corresponding stochastic gradient, while the TUSLA algorithm converges rapidly to the optimal solution.
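To make the taming mechanism concrete, below is a minimal sketch of one TUSLA iteration, assuming the general update form θ_{n+1} = θ_n − λ H(θ_n, X_{n+1}) / (1 + √λ |θ_n|^{2r}) + √(2λ/β) ξ_{n+1} from Lovas et al. (2021); the step size, inverse temperature, taming exponent r, and the toy objective are illustrative choices, not values from the paper.

```python
import numpy as np

def tusla_step(theta, stoch_grad, lam=1e-3, beta=1e8, r=3, rng=None):
    """One iteration of the tamed unadjusted stochastic Langevin algorithm.

    theta      : current iterate (numpy array)
    stoch_grad : stochastic gradient H(theta, X) evaluated at theta
    lam        : step size lambda (illustrative value)
    beta       : inverse temperature; large beta ~ optimization regime
    r          : taming exponent (problem-dependent; illustrative here)
    """
    rng = np.random.default_rng() if rng is None else rng
    # Taming: divide the gradient by 1 + sqrt(lam) * |theta|^{2r}, which keeps
    # the effective drift bounded even when H grows super-linearly in theta.
    tamed = stoch_grad / (1.0 + np.sqrt(lam) * np.linalg.norm(theta) ** (2 * r))
    # Langevin noise with variance 2 * lam / beta.
    noise = np.sqrt(2.0 * lam / beta) * rng.standard_normal(theta.shape)
    return theta - lam * tamed + noise

# Toy usage: minimize f(theta) = |theta|^4 / 4, whose stochastic gradient
# theta^3 + noise grows super-linearly (a hypothetical example, not from the paper).
theta = np.array([5.0])
rng = np.random.default_rng(0)
for _ in range(10_000):
    grad = theta ** 3 + rng.standard_normal(theta.shape)
    theta = tusla_step(theta, grad, rng=rng)
# theta ends up near the minimizer 0, up to stochastic fluctuation.
```

An untamed Euler step with this gradient and a large initial value can overshoot and diverge; the taming denominator shrinks the step exactly where the gradient explodes, which is the behavior the abstract contrasts against ADAM, AMSGrad, RMSProp, and vanilla SGD.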