The recently introduced class of ordinary differential equation networks (ODE-Nets) establishes a fruitful connection between deep learning and dynamical systems. In this work, we reconsider formulations of the weights as continuous-depth functions using linear combinations of basis functions. This perspective allows us to compress the weights through a change of basis, without retraining, while maintaining near state-of-the-art performance. In turn, both inference time and the memory footprint are reduced, enabling quick and rigorous adaptation between computational environments. Furthermore, our framework enables meaningful continuous-in-time batch normalization layers using function projections. The performance of basis function compression is demonstrated by applying continuous-depth models to (a) image classification tasks using convolutional units and (b) sentence-tagging tasks using transformer encoder units.
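To make the basis-function parameterization concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Legendre polynomial basis on t ∈ [0, 1], represents a continuous-depth weight W(t) as a linear combination of K basis functions, and compresses trained coefficients to a smaller basis by least-squares projection without retraining. The helper names `legendre_basis`, `weight_at_depth`, and `compress`, and the choice of K = 16 compressed to 6, are illustrative assumptions.

```python
import numpy as np

# Sketch only (not the authors' code): a continuous-depth weight
# W(t) = sum_k C[k] * phi_k(t), with Legendre polynomials as an
# assumed choice of basis functions phi_k on t in [0, 1].

def legendre_basis(t, K):
    """Evaluate the first K Legendre polynomials at depth t in [0, 1]."""
    x = 2.0 * t - 1.0  # map [0, 1] onto the Legendre domain [-1, 1]
    return np.polynomial.legendre.legvander(np.atleast_1d(x), K - 1)[0]

def weight_at_depth(C, t):
    """W(t) as a linear combination of basis functions; C has shape (K, d_out, d_in)."""
    phi = legendre_basis(t, C.shape[0])
    return np.tensordot(phi, C, axes=1)

def compress(C, K_small, n_samples=64):
    """Change of basis without retraining: project the trained weight
    trajectory onto a smaller basis of K_small functions via least squares."""
    ts = np.linspace(0.0, 1.0, n_samples)
    Phi_big = np.stack([legendre_basis(t, C.shape[0]) for t in ts])  # (n, K)
    Phi_small = np.stack([legendre_basis(t, K_small) for t in ts])   # (n, K_small)
    W_samples = np.tensordot(Phi_big, C, axes=1).reshape(n_samples, -1)
    C_small, *_ = np.linalg.lstsq(Phi_small, W_samples, rcond=None)
    return C_small.reshape(K_small, *C.shape[1:])

# Usage: hypothetical trained coefficients with K=16 basis functions, compressed to 6.
rng = np.random.default_rng(0)
C = rng.normal(size=(16, 32, 32))
C6 = compress(C, K_small=6)
err = np.linalg.norm(weight_at_depth(C, 0.5) - weight_at_depth(C6, 0.5))
print(f"reconstruction error at t=0.5: {err:.3f}")
```

Because Legendre polynomials are orthogonal on their domain, the least-squares fit acts as an L2 projection of the weight trajectory onto the smaller basis, which is one natural reading of the change-of-basis compression described above.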