Deep learning applications require global optimization of non-convex objective functions, which have multiple local minima. The same problem often arises in physical simulations and can be addressed by the methods of Langevin dynamics with Simulated Annealing, a well-established approach to the minimization of many-particle potentials. This analogy provides useful insights for non-convex stochastic optimization in machine learning. Here we find that integrating the discretized Langevin equation yields a coordinate-updating rule equivalent to the well-known Momentum optimization algorithm. As our main result, we show that gradually decreasing the momentum coefficient from an initial value close to unity down to zero is equivalent to applying Simulated Annealing or, in physical terms, slow cooling. Building on this approach, we propose CoolMomentum -- a new stochastic optimization method. Applying CoolMomentum to the optimization of Resnet-20 on the Cifar-10 dataset and Efficientnet-B0 on Imagenet, we demonstrate that it achieves high accuracy.
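To make the idea concrete, here is a minimal sketch of a Momentum-style update whose momentum coefficient is "cooled" from a value near unity toward zero over training, as the abstract describes. The linear decay schedule, the function and parameter names (coolmomentum_sketch, grad_fn, rho0), and the toy objective are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def coolmomentum_sketch(grad_fn, x0, lr=0.01, rho0=0.99, steps=2000):
    """Momentum SGD with an annealed ("cooled") momentum coefficient.

    The momentum coefficient rho decays from rho0 (close to 1) toward 0,
    mimicking slow cooling in Simulated Annealing. The linear schedule
    below is an assumption for illustration only.
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for t in range(steps):
        rho = rho0 * (1.0 - t / steps)    # anneal momentum: rho0 -> 0
        v = rho * v - lr * grad_fn(x)     # standard momentum update
        x = x + v
    return x

if __name__ == "__main__":
    # Toy non-convex objective: f(x) = x**4 - 3*x**2 + x,
    # which has two local minima; the global one lies near x ~ -1.3.
    grad = lambda x: 4 * x**3 - 6 * x + 1
    print(coolmomentum_sketch(grad, x0=[2.0]))
```

Early in training, rho near 1 lets the iterate traverse barriers between local minima (high "temperature"); as rho is cooled to zero, the update reduces to plain gradient descent and the iterate settles into a minimum.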