The training of deep residual neural networks (ResNets) with backpropagation has a memory cost that increases linearly with respect to the depth of the network. A simple way to circumvent this issue is to use reversible architectures. In this paper, we propose to change the forward rule of a ResNet by adding a momentum term. The resulting networks, momentum residual neural networks (MomentumNets), are invertible. Unlike previous invertible architectures, they can be used as a drop-in replacement for any existing ResNet block. We show that MomentumNets can be interpreted in the infinitesimal step size regime as second-order ordinary differential equations (ODEs) and exactly characterize how adding momentum progressively increases the representation capabilities of MomentumNets. Our analysis reveals that MomentumNets can learn any linear mapping up to a multiplicative factor, while ResNets cannot. In a learning to optimize setting, where convergence to a fixed point is required, we show theoretically and empirically that our method succeeds while existing invertible architectures fail. We show on CIFAR and ImageNet that MomentumNets have the same accuracy as ResNets, while having a much smaller memory footprint, and show that pre-trained MomentumNets are promising for fine-tuning models.
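The momentum forward rule described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `residual`, `gamma`, and the variable names are assumptions for the example, and the update `v' = gamma*v + (1-gamma)*f(x)`, `x' = x + v'` is one common form of the momentum step. The point is that the step is exactly invertible, so activations need not be stored for backpropagation.

```python
import numpy as np

GAMMA = 0.9  # momentum coefficient; example value, not from the paper


def residual(x):
    # Stand-in for a learnable residual block f(x); hypothetical choice.
    return np.tanh(x)


def momentum_forward(x, v):
    # Momentum forward rule: update the velocity, then the activation.
    v_next = GAMMA * v + (1 - GAMMA) * residual(x)
    x_next = x + v_next
    return x_next, v_next


def momentum_inverse(x_next, v_next):
    # Exact inverse: recover x first, then v, without stored activations.
    x = x_next - v_next
    v = (v_next - (1 - GAMMA) * residual(x)) / GAMMA
    return x, v


# Round trip: inverting the forward step recovers the inputs exactly
# (up to floating-point error), which is what makes the memory cost
# of training independent of depth.
x0 = np.array([0.5, -1.0, 2.0])
v0 = np.zeros_like(x0)
x1, v1 = momentum_forward(x0, v0)
x0_rec, v0_rec = momentum_inverse(x1, v1)
assert np.allclose(x0, x0_rec) and np.allclose(v0, v0_rec)
```

Note that unlike other reversible architectures, this step keeps the block `residual` unchanged, which is why a momentum step can serve as a drop-in replacement for an existing ResNet block.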