We present a new class of Langevin-based algorithms that overcomes many of the known shortcomings of popular adaptive optimizers currently used for the fine-tuning of deep learning models. Its underpinning theory relies on recent advances in Euler's polygonal approximations for stochastic differential equations (SDEs) with monotone coefficients. As a result, it inherits the stability properties of tamed algorithms while addressing other known issues, e.g. vanishing gradients in neural networks. In particular, we provide a nonasymptotic analysis and full theoretical guarantees for the convergence properties of an algorithm of this novel class, which we named TH$\varepsilon$O POULA (or, simply, TheoPouLa). Finally, several experiments are presented with different types of deep learning models, which show the superior performance of TheoPouLa over many popular adaptive optimization algorithms.
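To make the notion of a "tamed" Langevin update concrete, the sketch below shows a generic tamed (unadjusted) Langevin step, in which the stochastic gradient is rescaled so that a single update remains bounded even under superlinearly growing gradients. This is only an illustration of the taming idea and not the exact TH$\varepsilon$O POULA update from the paper; the names (`lam`, `beta`, `grad_fn`) and the toy objective are hypothetical choices for the example.

```python
# Minimal sketch of a generic tamed Langevin step (illustrative only;
# NOT the exact TheoPouLa update). Parameter names are hypothetical.
import numpy as np

def tamed_langevin_step(theta, grad_fn, lam=1e-2, beta=1e8, rng=None):
    """One tamed Langevin step: the gradient is divided by
    (1 + lam * ||grad||), keeping the move bounded when the gradient blows up."""
    rng = rng or np.random.default_rng()
    g = grad_fn(theta)
    tamed_g = g / (1.0 + lam * np.linalg.norm(g))        # taming factor
    noise = np.sqrt(2.0 * lam / beta) * rng.standard_normal(theta.shape)
    return theta - lam * tamed_g + noise

# Usage on a toy objective U(theta) = 0.25 * ||theta||^4, whose gradient
# grows superlinearly and would destabilise a plain Euler discretisation.
theta = np.array([3.0, -2.0])
for _ in range(1000):
    theta = tamed_langevin_step(theta, grad_fn=lambda t: np.linalg.norm(t) ** 2 * t)
print(theta)  # iterates settle near the minimiser at the origin
```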