Neural ordinary differential equations (ODEs) have attracted much attention as continuous-time counterparts of deep residual neural networks (NNs), and numerous extensions for recurrent NNs have been proposed. Since the 1980s, ODEs have also been used to derive theoretical results for NN learning rules, e.g., the famous connection between Oja's rule and principal component analysis. Such rules are typically expressed as additive iterative update processes which have straightforward ODE counterparts. Here we introduce a novel combination of learning rules and Neural ODEs to build continuous-time sequence processing nets that learn to manipulate short-term memory in rapidly changing synaptic connections of other nets. This yields continuous-time counterparts of Fast Weight Programmers and linear Transformers. Our novel models outperform the best existing Neural Controlled Differential Equation based models on various time series classification tasks, while also addressing their fundamental scalability limitations. Our code is public.
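The abstract notes that classical additive learning rules have straightforward ODE counterparts, citing the connection between Oja's rule and principal component analysis. Below is a minimal NumPy sketch (not taken from the paper's public code) illustrating that point: the discrete Oja update is read as a forward Euler step of its ODE, and integrating it drives the weight vector toward the principal eigenvector of the data covariance. Data shape, step size, and variable names are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's implementation) of how an
# additive learning rule maps to an ODE counterpart, using Oja's rule.
# Discrete update:  w_{k+1} = w_k + eta * (y * x - y**2 * w_k),  y = w_k @ x
# ODE counterpart:  dw/dt   = y * x - y**2 * w
# Integrating the ODE (forward Euler below) performs PCA: w converges toward
# the principal eigenvector of the data covariance.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic zero-mean data with an anisotropic covariance.
cov = np.array([[3.0, 1.0],
                [1.0, 1.0]])
X = rng.multivariate_normal(mean=np.zeros(2), cov=cov, size=5000)

w = rng.normal(size=2)
dt = 1e-3  # Euler step size for the ODE counterpart

for x in X:
    y = w @ x
    dw_dt = y * x - (y ** 2) * w   # right-hand side of the ODE
    w = w + dt * dw_dt             # forward Euler step

# Compare against the leading eigenvector of the empirical covariance.
eigvals, eigvecs = np.linalg.eigh(np.cov(X.T))
principal = eigvecs[:, np.argmax(eigvals)]
cosine = abs(w @ principal) / (np.linalg.norm(w) * np.linalg.norm(principal))
print(f"cosine similarity to principal eigenvector: {cosine:.3f}")
```

Running the sketch prints a cosine similarity close to 1, which is the PCA behaviour the abstract alludes to; the same discrete-to-continuous reading underlies the paper's continuous-time Fast Weight Programmer construction.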