The design of recurrent neural networks (RNNs) that accurately process sequential inputs with long-time dependencies is very challenging on account of the exploding and vanishing gradient problem. To overcome this, we propose a novel RNN architecture based on a structure-preserving discretization of a Hamiltonian system of second-order ordinary differential equations that models networks of oscillators. The resulting RNN is fast, invertible (in time), and memory-efficient, and we derive rigorous bounds on the hidden-state gradients to prove the mitigation of the exploding and vanishing gradient problem. A suite of experiments is presented to demonstrate that the proposed RNN provides state-of-the-art performance on a variety of learning tasks with (very) long time dependencies.
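To make the high-level description concrete, the following is a minimal sketch of an oscillator-based recurrent update of the kind described above. The abstract does not give the governing equations, so the specific second-order ODE form, the semi-implicit (symplectic-Euler-style) time stepping, and all parameter names (dt, gamma, eps, tanh nonlinearity) are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def oscillator_rnn_step(y, z, u, W, Wz, V, b, dt=0.01, gamma=1.0, eps=1.0):
    """One hidden-state update of a hypothetical oscillator-based RNN cell.

    Assumed second-order ODE (not taken verbatim from the paper):
        y'' = tanh(W y + Wz y' + V u + b) - gamma * y - eps * y'
    rewritten as the first-order system (y, z = y') and advanced with a
    simple semi-implicit, structure-preserving step: update the velocity z
    first, then use the new velocity to update the position y.
    """
    z_new = z + dt * (np.tanh(W @ y + Wz @ z + V @ u + b) - gamma * y - eps * z)
    y_new = y + dt * z_new
    return y_new, z_new

def run_sequence(inputs, hidden_dim, rng=np.random.default_rng(0)):
    """Unroll the cell over a sequence of input vectors; returns the final hidden state."""
    in_dim = inputs.shape[1]
    W  = rng.normal(scale=1.0 / np.sqrt(hidden_dim), size=(hidden_dim, hidden_dim))
    Wz = rng.normal(scale=1.0 / np.sqrt(hidden_dim), size=(hidden_dim, hidden_dim))
    V  = rng.normal(scale=1.0 / np.sqrt(in_dim), size=(hidden_dim, in_dim))
    b  = np.zeros(hidden_dim)
    y = np.zeros(hidden_dim)   # "position" component of the hidden state
    z = np.zeros(hidden_dim)   # "velocity" component of the hidden state
    for u in inputs:
        y, z = oscillator_rnn_step(y, z, u, W, Wz, V, b)
    return y  # would typically be fed to a linear readout layer

# Example: a length-100 sequence of 2-dimensional inputs with 32 hidden units.
h = run_sequence(np.random.randn(100, 2), hidden_dim=32)
```

Because the position/velocity split mirrors a Hamiltonian-style formulation and each step only adds bounded increments, a cell of this shape is the kind of structure for which bounded hidden-state gradients can plausibly be derived; the rigorous bounds themselves are established in the paper for its specific discretization.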