Training a very deep neural network is a challenging task, as the deeper a neural network is, the more non-linear it becomes. We compare the performance of various preconditioned Langevin algorithms with that of their non-Langevin counterparts for the training of neural networks of increasing depth. For shallow neural networks, Langevin algorithms yield no improvement; however, the deeper the network, the greater the gains provided by Langevin algorithms. Adding noise to the gradient descent allows the optimizer to escape local traps, which are more frequent in very deep neural networks. Following this heuristic, we introduce a new Langevin algorithm, called Layer Langevin, which consists in adding Langevin noise only to the weights associated with the deepest layers. We then demonstrate the benefits of Langevin and Layer Langevin algorithms for the training of popular deep residual architectures for image classification.
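As a rough illustration of the Layer Langevin idea described above, the following PyTorch-style sketch injects Gaussian noise only into the parameters of designated deep layers during an otherwise plain gradient step. The function name, the noise scale `sqrt(2*lr)*sigma`, and the selection of layers by name prefix are illustrative assumptions, not the paper's exact algorithm or preconditioning.

```python
import math
import torch

def layer_langevin_step(model, loss, lr=1e-2, sigma=1e-3, deep_layer_prefixes=()):
    """One gradient step with Langevin noise restricted to the deepest layers.

    `deep_layer_prefixes`, `lr`, and `sigma` are hypothetical names used for
    illustration; the noise scale sqrt(2*lr)*sigma is an assumed convention.
    """
    model.zero_grad()
    loss.backward()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.grad is None:
                continue
            # Plain gradient descent on every parameter
            p -= lr * p.grad
            # Langevin (Gaussian) noise only on the designated deep layers
            if any(name.startswith(prefix) for prefix in deep_layer_prefixes):
                p += math.sqrt(2.0 * lr) * sigma * torch.randn_like(p)
```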