We employ constraints to control the parameter space of deep neural networks throughout training. The use of customized, appropriately designed constraints can reduce the vanishing/exploding gradients problem, improve the smoothness of classification boundaries, control weight magnitudes, and stabilize deep neural networks, and thus enhance the robustness of training algorithms and the generalization capabilities of neural networks. We provide a general approach to efficiently incorporate constraints into a stochastic gradient Langevin framework, allowing enhanced exploration of the loss landscape. We also present specific examples of constrained training methods motivated by orthogonality preservation for weight matrices and explicit weight normalizations. Discretization schemes are provided for both the overdamped formulation of Langevin dynamics and the underdamped form, in which momenta further improve sampling efficiency. These optimization schemes can be used directly, without needing to adapt neural network architecture design choices or to modify the objective with regularization terms, and they yield performance improvements in classification tasks.
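As an illustrative sketch only (not necessarily the exact discretization used in this work), a constrained overdamped Langevin update for parameters $\theta$ restricted to a constraint manifold $g(\theta)=0$ can be written as a projected Euler--Maruyama step; the step size $h$, temperature $\tau$, Lagrange multiplier $\lambda_k$, and Gaussian noise $\xi_k$ are generic symbols introduced here for illustration:
\[
\theta_{k+1} = \theta_k - h\,\nabla L(\theta_k) + \sqrt{2h\tau}\,\xi_k + \nabla g(\theta_k)^{\top}\lambda_k,
\qquad \xi_k \sim \mathcal{N}(0, I),
\]
with $\lambda_k$ chosen so that $g(\theta_{k+1}) = 0$. For a simple magnitude constraint such as $g(W) = \|W\|_F^2 - c$, enforcing the constraint amounts to rescaling the updated weights back onto the sphere of radius $\sqrt{c}$, which is one way an explicit weight normalization fits into this constrained framework.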