We propose a method for efficiently incorporating constraints into a stochastic gradient Langevin framework for the training of deep neural networks. Constraints allow direct control of the parameter space of the model. Appropriately designed, constraints reduce the vanishing/exploding-gradient problem, control weight magnitudes, and stabilize deep neural networks, thus improving the robustness of training algorithms and the generalization capabilities of the trained neural network. We present examples of constrained training methods motivated by orthogonality preservation for weight matrices and by explicit weight normalization. We describe the methods in both the overdamped formulation of Langevin dynamics and the underdamped form, in which momenta help to improve sampling efficiency. The methods are explored in test examples from image classification and natural language processing.
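As a point of reference, a standard way to impose a holonomic constraint g(θ) = 0 in Langevin dynamics is through a Lagrange multiplier process; the sketch below shows this generic constrained form in the overdamped and underdamped settings, with L the (mini-batch) loss, β the inverse temperature, γ the friction, and W_t a Wiener process. The specific constraints and the discretization used in the paper may differ; for instance, orthogonality of a weight matrix W corresponds to the constraint g(W) = WᵀW − I = 0.

% Overdamped Langevin dynamics restricted to the manifold g(\theta)=0,
% with the multiplier process \lambda_t enforcing the constraint:
\[
\begin{aligned}
  \mathrm{d}\theta_t &= -\nabla L(\theta_t)\,\mathrm{d}t
    + \sqrt{2\beta^{-1}}\,\mathrm{d}W_t
    + \nabla g(\theta_t)^{\top}\,\mathrm{d}\lambda_t, \\
  0 &= g(\theta_t).
\end{aligned}
\]

% Underdamped (kinetic) form with momenta p_t and friction \gamma;
% the multiplier again keeps \theta_t on the constraint manifold:
\[
\begin{aligned}
  \mathrm{d}\theta_t &= p_t\,\mathrm{d}t, \\
  \mathrm{d}p_t &= -\nabla L(\theta_t)\,\mathrm{d}t - \gamma\,p_t\,\mathrm{d}t
    + \sqrt{2\gamma\beta^{-1}}\,\mathrm{d}W_t
    + \nabla g(\theta_t)^{\top}\,\mathrm{d}\lambda_t, \\
  0 &= g(\theta_t).
\end{aligned}
\]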