Recurrent neural networks (RNNs) are powerful dynamical models for data with complex temporal structure. However, training RNNs has traditionally proved challenging due to exploding or vanishing gradients. RNN models such as LSTMs and GRUs (and their variants) significantly mitigate these issues by introducing various types of {\it gating} units into the architecture. While these gates empirically improve performance, how their addition influences the dynamics and trainability of GRUs and LSTMs is not well understood. Here, we take the perspective of studying randomly initialized LSTMs and GRUs as dynamical systems, and ask how their salient dynamical properties are shaped by the gates. We leverage tools from random matrix theory and mean-field theory to study the state-to-state Jacobians of GRUs and LSTMs. We show that the update gate in the GRU and the forget gate in the LSTM can lead to an accumulation of slow modes in the dynamics. Moreover, the GRU update gate can poise the system at a marginally stable point. The reset gate in the GRU and the output and input gates in the LSTM control the spectral radius of the Jacobian, and the GRU reset gate also modulates the complexity of the landscape of fixed points. Furthermore, for the GRU we obtain a phase diagram describing the statistical properties of fixed points. Finally, we provide a preliminary comparison of training performance across the various dynamical regimes, to be investigated more fully elsewhere. The techniques introduced here can be generalized to other RNN architectures to elucidate how architectural choices influence the dynamics, and may aid in the discovery of novel architectures.
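To make the role of the gates concrete, the following is a minimal sketch of the GRU state-to-state Jacobian, assuming the standard GRU parameterization (update gate $z_t$, reset gate $r_t$, candidate state $\tilde{h}_t$; one common convention, in which the roles of $z_t$ and $1-z_t$ are sometimes swapped) and, purely for illustration, treating the gates as frozen, i.e., dropping their own derivatives with respect to $h_{t-1}$:
\begin{align}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z), \qquad r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r),\\
\tilde{h}_t &= \tanh\big(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\big),\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t,\\
J_t &= \frac{\partial h_t}{\partial h_{t-1}} \approx \mathrm{diag}(1 - z_t) + \mathrm{diag}\big(z_t \odot (1 - \tilde{h}_t^2)\big)\, U_h\, \mathrm{diag}(r_t).
\end{align}
In this simplified form, $z_t \to 0$ drives $J_t \to I$, so eigenvalues accumulate near $1$ (slow modes and marginal stability), while $r_t$ rescales $U_h$ and thereby controls the spectral radius of the fluctuating part of the Jacobian.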
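As a numerical complement, the sketch below samples a randomly initialized GRU and inspects the spectrum of the same frozen-gate Jacobian; the hidden size $N$, weight scale $g$, and gate bias $b_z$ are illustrative choices, not the paper's exact ensemble.
\begin{verbatim}
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative sketch: spectrum of the (frozen-gate) state-to-state
# Jacobian of a randomly initialized GRU. Scalings and the bias b_z
# are assumptions for illustration, not the paper's exact ensemble.
N = 512            # hidden size
g = 1.5            # weight scale for the candidate-state matrix U_h
b_z = -2.0         # update-gate bias; strongly negative -> z near 0

rng = np.random.default_rng(0)
U_z = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))
U_r = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))
U_h = rng.normal(0.0, g / np.sqrt(N), (N, N))

h = rng.normal(0.0, 1.0, N)        # a typical state (zero input assumed)
z = sigmoid(U_z @ h + b_z)         # update gate
r = sigmoid(U_r @ h)               # reset gate
h_tilde = np.tanh(U_h @ (r * h))   # candidate state

# Frozen-gate Jacobian:
#   J = diag(1-z) + diag(z*(1-h_tilde^2)) @ U_h @ diag(r)
J = np.diag(1.0 - z) \
    + (z * (1.0 - h_tilde**2))[:, None] * U_h * r[None, :]

eigs = np.linalg.eigvals(J)
print("spectral radius:", np.abs(eigs).max())
print("fraction of eigenvalues within 0.05 of 1:",
      np.mean(np.abs(eigs - 1.0) < 0.05))  # slow modes pile up near 1
\end{verbatim}
With $b_z$ strongly negative the update gate saturates near zero, and a macroscopic fraction of eigenvalues collects near $1$, illustrating the slow-mode accumulation described above.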