A central theoretical interest in biology and physics is to identify the nonlinear dynamical system (DS) that generated an observed time series. Recurrent Neural Networks (RNNs) are, in principle, powerful enough to approximate any underlying DS, but in their vanilla form suffer from the exploding and vanishing gradients problem. Previous attempts to alleviate this problem either resulted in more complicated, mathematically less tractable RNN architectures, or strongly limited the dynamical expressiveness of the RNN. Here we address this issue by suggesting a simple regularization scheme for vanilla RNNs with ReLU activation which enables them to solve long-range dependency problems and express slow time scales, while retaining a simple mathematical structure that makes their DS properties partly analytically accessible. We prove two theorems that establish a tight connection between the regularized RNN dynamics and its gradients, illustrate on DS benchmarks that our regularization approach strongly eases the reconstruction of DS which harbor widely differing time scales, and show that our method is also on par with other long-range architectures like LSTMs on several tasks.
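The regularization idea can be illustrated with a minimal sketch. The sketch below assumes a piecewise-linear RNN of the form z_t = A z_{t-1} + W ReLU(z_{t-1}) + h, and an L2 penalty that pushes a chosen subset of "memory" units toward a line attractor (diagonal recurrent weight near 1, off-diagonal inputs and bias near 0); the function names, the index set `reg_idx`, and the weight `tau` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def reg_loss(A, W, h, reg_idx, tau=1.0):
    """Illustrative L2 penalty on a subset of units (reg_idx):
    pulls their diagonal entries of A toward 1, their rows of W
    toward 0, and their biases toward 0, so the regularized units
    approach perfect integrators (slow time scales)."""
    l = np.sum((np.diag(A)[reg_idx] - 1.0) ** 2)
    l += np.sum(W[reg_idx, :] ** 2)
    l += np.sum(h[reg_idx] ** 2)
    return tau * l

def step(z, A, W, h):
    """One step of a ReLU RNN: z_t = A z_{t-1} + W ReLU(z_{t-1}) + h."""
    return A @ z + W @ np.maximum(z, 0.0) + h

# A fully regularized unit (A_00 = 1, zero W row, zero bias)
# retains its state exactly, i.e. it expresses an arbitrarily
# slow time scale, while the other unit evolves freely.
A = np.diag([1.0, 0.5])
W = np.array([[0.0, 0.0],
              [0.2, 0.1]])
h = np.zeros(2)
z = np.array([3.0, 1.0])
z_next = step(z, A, W, h)   # unit 0 keeps its value of 3.0
```

At the regularization optimum the penalty vanishes for the chosen units, and gradients passing through them neither explode nor vanish, which is the intuition behind the theorems mentioned in the abstract.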