The recently introduced class of ordinary differential equation networks (ODE-Nets) establishes a fruitful connection between deep learning and dynamical systems. In this work, we reconsider formulations of the weights as continuous-in-depth functions, expressed as linear combinations of basis functions, which enables us to leverage parameter transformations such as function projections. In turn, this view allows us to formulate a novel stateful ODE-Block that handles stateful layers. The benefits of this new ODE-Block are twofold: first, it enables incorporating meaningful continuous-in-depth batch normalization layers to achieve state-of-the-art performance; second, it enables compressing the weights through a change of basis, without retraining, while maintaining near state-of-the-art performance and reducing both inference time and memory footprint. Performance is demonstrated by applying our stateful ODE-Block to (a) image classification tasks using convolutional units and (b) sentence-tagging tasks using transformer encoder units.
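To make the basis-function view concrete, the following is a minimal sketch (not the authors' implementation) of the two ideas the abstract names: depth-varying weights written as a linear combination of basis functions, w(t) = Σ_i c_i φ_i(t), and compression by projecting the coefficients onto a smaller basis without retraining. The choice of Chebyshev polynomials and the helper names (chebyshev_basis, project_to_smaller_basis) are illustrative assumptions, not details taken from the paper.

```python
# Sketch: continuous-in-depth weights as a linear combination of basis
# functions, plus compression via a least-squares change of basis.
# All function names and the Chebyshev basis are illustrative assumptions.
import numpy as np

def chebyshev_basis(ts, n_basis):
    """Evaluate the first n_basis Chebyshev polynomials at depths ts in [0, 1]."""
    x = 2.0 * np.asarray(ts) - 1.0  # map depth t in [0, 1] to [-1, 1]
    return np.polynomial.chebyshev.chebvander(x, n_basis - 1)  # (len(ts), n_basis)

def weight_at_depth(coeffs, t):
    """w(t) = sum_i coeffs[i] * phi_i(t); coeffs has shape (n_basis, *weight_shape)."""
    phi = chebyshev_basis(np.atleast_1d(t), coeffs.shape[0])[0]  # (n_basis,)
    return np.tensordot(phi, coeffs, axes=1)

def project_to_smaller_basis(coeffs, n_small, n_grid=64):
    """Compress: least-squares projection of w(t) onto the first n_small basis
    functions, sampled on a depth grid (no retraining involved)."""
    ts = np.linspace(0.0, 1.0, n_grid)
    # Sample the original weight function on the grid, flattened per depth.
    w_samples = chebyshev_basis(ts, coeffs.shape[0]) @ coeffs.reshape(coeffs.shape[0], -1)
    phi_small = chebyshev_basis(ts, n_small)
    small, *_ = np.linalg.lstsq(phi_small, w_samples, rcond=None)
    return small.reshape(n_small, *coeffs.shape[1:])

# Usage: a depth-varying 3x3 weight parameterized by 8 basis coefficients,
# compressed to 4; the max pointwise error at depth t = 0.5 is printed.
rng = np.random.default_rng(0)
coeffs = rng.normal(size=(8, 3, 3))
small = project_to_smaller_basis(coeffs, n_small=4)
print(np.abs(weight_at_depth(coeffs, 0.5) - weight_at_depth(small, 0.5)).max())
```

Under these assumptions, compression is a post-hoc linear projection of the coefficient tensor: the network function changes only by the projection error, which is why inference time and memory shrink with the number of retained basis functions while accuracy degrades gracefully.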