We study the approximation properties and optimization dynamics of recurrent neural networks (RNNs) when applied to learn input-output relationships in temporal data. We consider the simple but representative setting of using continuous-time linear RNNs to learn from data generated by linear relationships. Mathematically, the latter can be understood as a sequence of linear functionals. We prove a universal approximation theorem for such linear functionals and characterize the approximation rate and its relation with memory. Moreover, we perform a fine-grained dynamical analysis of training linear RNNs, which further reveals the intricate interactions between memory and learning. A unifying theme uncovered is the non-trivial effect of memory, a notion that can be made precise in our framework, on both approximation and optimization: when there is long-term memory in the target, it takes a large number of neurons to approximate it, and the training process suffers from slowdowns. In particular, both of these effects become exponentially more pronounced with memory, a phenomenon we call the "curse of memory". These analyses represent a basic step towards a concrete mathematical understanding of new phenomena that may arise when learning temporal relationships using recurrent architectures.
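For concreteness, one way to formalize the setting described above is the following minimal sketch; the symbols $W$, $U$, $c$, and the memory kernel $\rho$ are illustrative choices on our part rather than notation taken from this abstract:
$$
\frac{d h_t}{dt} = W h_t + U x_t, \qquad \hat{y}_t = c^\top h_t, \qquad H_t(\boldsymbol{x}) = \int_0^{\infty} \rho(s)^\top x_{t-s}\, ds,
$$
where $h_t \in \mathbb{R}^m$ is the hidden state of a continuous-time linear RNN with $m$ neurons, $x_t$ is the input signal, and the target output $y_t = H_t(\boldsymbol{x})$ is a linear functional of the input history. In such a formalization, the decay rate of $\rho$ quantifies memory: slowly decaying kernels correspond to long-term memory, which is the regime where the approximation and optimization difficulties described above become exponentially severe.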