Neural Ordinary Differential Equations (NODEs), a framework of continuous-depth neural networks, have been widely applied and have proved highly effective on several representative datasets. Recently, an augmented framework was developed to overcome some limitations that emerge when applying the original framework. Here we propose a new class of continuous-depth neural networks with delay, named Neural Delay Differential Equations (NDDEs). To compute the corresponding gradients, we use the adjoint sensitivity method to obtain the delayed dynamics of the adjoint. Because differential equations with delays are usually regarded as infinite-dimensional dynamical systems with richer dynamics, the NDDEs have a stronger capacity for nonlinear representation than the NODEs. Indeed, we analytically validate that the NDDEs are universal approximators, and we further articulate an extension of the NDDEs in which the initial function is required to satisfy ODEs. More importantly, we use several illustrative examples to demonstrate the capacities of the NDDEs and of the NDDEs whose initial function is governed by ODEs. Specifically, (1) we successfully model delayed dynamics whose trajectories in the lower-dimensional phase space can intersect one another, a setting in which traditional NODEs without any augmentation are not directly applicable, and (2) we achieve lower loss and higher accuracy not only on data produced synthetically by complex models but also on the real-world image datasets CIFAR10, MNIST, and SVHN. Our results on the NDDEs reveal that appropriately incorporating elements of dynamical systems into network design is beneficial for network performance.
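To make the point about intersecting trajectories concrete, the following is a minimal sketch and not the implementation used in this work. It assumes a scalar delayed system dh/dt = f(h(t), h(t - tau)) with a constant initial function, integrated by forward Euler. The names `integrate_dde` and `delayed_decay`, the step size, and the choice f = -h(t - tau) are all illustrative assumptions. A trained network would take the place of `delayed_decay`, and the adjoint-based gradient computation is not shown.

```python
import numpy as np


def integrate_dde(f, h0, tau, t_end, dt=0.01):
    """Forward-Euler integration of dh/dt = f(h(t), h(t - tau)).

    Assumption: the initial function is constant, h(t) = h0 for t <= 0.
    """
    lag = int(round(tau / dt))        # delay expressed in Euler steps
    n_steps = int(round(t_end / dt))
    h = np.empty(n_steps + 1)
    h[0] = h0
    for n in range(n_steps):
        # Before t = tau the delayed state is read from the constant history.
        h_delayed = h[n - lag] if n >= lag else h0
        h[n + 1] = h[n] + dt * f(h[n], h_delayed)
    return h


def delayed_decay(h, h_delayed):
    # Hypothetical vector field: depends only on the delayed state.
    return -h_delayed


# Two trajectories of the same 1-D system, started from different values.
traj_a = integrate_dde(delayed_decay, h0=1.0, tau=1.0, t_end=2.0)
traj_b = integrate_dde(delayed_decay, h0=2.0, tau=1.0, t_end=2.0)

# The gap between them starts positive and later becomes negative,
# so the two curves intersect in the one-dimensional phase space.
gap = traj_b - traj_a
crossed = bool(gap[0] > 0 and gap.min() < 0)
print("trajectories cross:", crossed)
```

For a one-dimensional autonomous ODE with a well-behaved vector field, two distinct solutions cannot cross in this way, which is why the delay term matters in this toy setting.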