Recurrent neural networks (RNNs) are widespread machine learning tools for modeling sequential and time series data. They are notoriously hard to train because their loss gradients backpropagated in time tend to saturate or diverge during training. This is known as the exploding and vanishing gradient problem. Previous solutions to this issue either built on rather complicated, purpose-engineered architectures with gated memory buffers, or - more recently - imposed constraints that ensure convergence to a fixed point or restrict (the eigenspectrum of) the recurrence matrix. Such constraints, however, impose severe limitations on the expressivity of the RNN. Essential intrinsic dynamics such as multistability or chaos are disabled. This is inherently at odds with the chaotic nature of many, if not most, time series encountered in nature and society. It is particularly problematic in scientific applications where one aims to reconstruct the underlying dynamical system. Here we offer a comprehensive theoretical treatment of this problem by relating the loss gradients during RNN training to the Lyapunov spectrum of RNN-generated orbits. We mathematically prove that RNNs producing stable equilibrium or cyclic behavior have bounded gradients, whereas the gradients of RNNs with chaotic dynamics always diverge. Based on these analyses and insights we suggest ways to optimize the training process on chaotic data according to the system's Lyapunov spectrum, regardless of the employed RNN architecture.
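As a minimal illustration of the connection stated above, the sketch below (assuming a simple autonomous tanh RNN z_{t+1} = tanh(W z_t); the scale parameter g and helper names are hypothetical, not taken from the paper) estimates the maximal Lyapunov exponent from products of Jacobians along an orbit. These are the same Jacobian products that appear in gradients backpropagated through time, so a positive maximal exponent corresponds to exponentially diverging gradients, while a negative one corresponds to bounded, decaying gradients.

```python
import numpy as np

def rnn_step(W, z):
    """One step of a simple autonomous tanh RNN: z_{t+1} = tanh(W z_t)."""
    return np.tanh(W @ z)

def jacobian(W, z):
    """Jacobian of the map z -> tanh(W z) at z: diag(1 - tanh(W z)^2) W."""
    return np.diag(1.0 - np.tanh(W @ z) ** 2) @ W

def estimate_max_lyapunov(W, z0, T=2000):
    """Estimate the maximal Lyapunov exponent of the RNN orbit by iterating
    a tangent vector through the Jacobians with renormalization. The
    accumulated log-growth equals the log-norm of the Jacobian product,
    which governs how BPTT gradients scale with sequence length."""
    z = z0
    v = np.random.default_rng(1).standard_normal(len(z0))
    v /= np.linalg.norm(v)
    log_growth = 0.0
    for _ in range(T):
        J = jacobian(W, z)
        v = J @ v
        norm = np.linalg.norm(v)
        log_growth += np.log(norm)
        v /= norm
        z = rnn_step(W, z)
    return log_growth / T  # maximal Lyapunov exponent (per time step)

rng = np.random.default_rng(0)
N = 50
z0 = rng.standard_normal(N)
for g in (0.5, 1.5):  # weight scale: contracting vs. (typically) chaotic regime
    W = g * rng.standard_normal((N, N)) / np.sqrt(N)
    lam = estimate_max_lyapunov(W, z0)
    print(f"g={g}: lambda_max ~ {lam:.3f} -> BPTT gradient norm ~ exp({lam:.3f} * T)")
```

For small g the orbit settles onto a stable equilibrium and the estimated exponent is negative (gradients stay bounded); for larger g the dynamics typically become chaotic, the exponent turns positive, and the Jacobian products, and hence the gradients, grow exponentially with the sequence length.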