Recurrent neural networks (RNNs) are widely used machine learning tools for modeling sequential and time series data. They are notoriously hard to train because their loss gradients backpropagated in time tend to saturate or diverge during training. This is known as the exploding and vanishing gradient problem. Previous solutions to this issue either built on rather complicated, purpose-engineered architectures with gated memory buffers, or, more recently, imposed constraints that ensure convergence to a fixed point or restrict (the eigenspectrum of) the recurrence matrix. Such constraints, however, impose severe limitations on the expressivity of the RNN. Essential intrinsic dynamics such as multistability or chaos are disabled. This is inherently at odds with the chaotic nature of many, if not most, time series encountered in nature and society. Here we offer a comprehensive theoretical treatment of this problem by relating the loss gradients during RNN training to the Lyapunov spectrum of RNN-generated orbits. We mathematically prove that RNNs producing stable equilibrium or cyclic behavior have bounded gradients, whereas the gradients of RNNs with chaotic dynamics always diverge. Based on these analyses and insights, we offer an effective yet simple training technique for chaotic data and guidance on how to choose relevant hyperparameters according to the Lyapunov spectrum.
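As a minimal numerical illustration of the relation stated above (not the training technique proposed in the paper), the following NumPy sketch simulates an autonomous tanh RNN, estimates its Lyapunov spectrum by repeated QR re-orthonormalization, and tracks the norm of the accumulated Jacobian product that governs gradients backpropagated in time. The hidden size d, weight gain g, and horizon T are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, g = 16, 500, 1.8   # hidden size, horizon, weight gain (illustrative choices)

W = rng.normal(0.0, g / np.sqrt(d), size=(d, d))
b = rng.normal(0.0, 0.1, size=d)

h = rng.normal(size=d)
Q = np.eye(d)            # orthonormal frame for the QR-based Lyapunov estimate
lyap_sums = np.zeros(d)
M = np.eye(d)            # running (renormalized) Jacobian product J_t ... J_1
log_prod_norm = 0.0      # log || J_t ... J_1 ||, proxy for BPTT gradient growth

for t in range(T):
    h = np.tanh(W @ h + b)            # vanilla RNN step: h_{t+1} = tanh(W h_t + b)
    J = (1.0 - h**2)[:, None] * W     # Jacobian dh_{t+1}/dh_t = diag(1 - h_{t+1}^2) W

    # Lyapunov spectrum: propagate and re-orthonormalize a frame of vectors
    Q, R = np.linalg.qr(J @ Q)
    lyap_sums += np.log(np.abs(np.diag(R)) + 1e-300)

    # Gradient growth: renormalize the Jacobian product to avoid overflow,
    # while accumulating its log spectral norm exactly
    M = J @ M
    c = np.linalg.norm(M, 2)
    log_prod_norm += np.log(c)
    M /= c

lyap = np.sort(lyap_sums / T)[::-1]
print("largest Lyapunov exponent :", lyap[0])
print("gradient growth rate      :", log_prod_norm / T)
```

With a gain well above 1 the largest Lyapunov exponent is typically positive and the Jacobian-product norm grows at essentially the same exponential rate, consistent with the statement that chaotic dynamics imply diverging gradients; reducing g so that the orbit settles into a stable fixed point or cycle makes the exponent negative and the accumulated gradient term decay instead.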