Training deep neural networks (DNNs) can be difficult due to vanishing and exploding gradients during weight optimization through backpropagation. To address this problem, we propose a general class of Hamiltonian DNNs (H-DNNs) that stem from the discretization of continuous-time Hamiltonian systems and include several existing DNN architectures based on ordinary differential equations. Our main result is that a broad set of H-DNNs ensures non-vanishing gradients by design, for arbitrary network depth. This is obtained by proving that, under a semi-implicit Euler discretization scheme, the backward sensitivity matrices involved in gradient computations are symplectic. We also provide an upper bound on the magnitude of the sensitivity matrices and show that exploding gradients can be controlled through regularization. Finally, we enable distributed implementations of the forward and backward propagation algorithms in H-DNNs by characterizing appropriate sparsity constraints on the weight matrices. The good performance of H-DNNs is demonstrated on benchmark classification problems, including image classification with the MNIST dataset.
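To make the layer structure concrete, the following is a minimal sketch of a forward pass through an H-DNN using a semi-implicit Euler step. The specific parametrization (a tanh activation with shared weight matrix `K` and bias `b` per layer, in the style of Hamiltonian-inspired ODE networks), as well as the names `hdnn_forward`, `Ks`, `bs`, and the step size `h`, are illustrative assumptions, not the paper's exact architecture. The state is split into two halves `(p, q)`; the semi-implicit scheme updates `p` from the current `q`, then updates `q` from the *new* `p`, which is what yields symplectic sensitivity matrices.

```python
import numpy as np

def sigma(x):
    # Activation function; tanh is an illustrative choice.
    return np.tanh(x)

def hdnn_forward(p, q, Ks, bs, h=0.1):
    """Sketch of a forward pass through an H-DNN (assumed parametrization).

    The state (p, q) mimics a Hamiltonian system's momentum/position pair:
        p' = -dH/dq,   q' = dH/dp.
    Each layer applies one semi-implicit Euler step of size h:
    p is updated using the current q, then q is updated using the new p.
    Ks, bs are per-layer weight matrices and biases (illustrative names).
    """
    for K, b in zip(Ks, bs):
        p = p - h * K.T @ sigma(K @ q + b)  # half-step: update p from q
        q = q + h * K.T @ sigma(K @ p + b)  # half-step: update q from new p
    return p, q
```

Because each half-step modifies only one half of the state as a function of the other half, its Jacobian is (shear-like) volume-preserving, which is the structural property behind the non-vanishing-gradient guarantee described above.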