Neural Ordinary Differential Equations (Neural ODEs) are the continuous analog of Residual Neural Networks (ResNets). We investigate whether the discrete dynamics defined by a ResNet are close to the continuous dynamics of a Neural ODE. We first quantify the distance between the ResNet's hidden state trajectory and the solution of its corresponding Neural ODE. Our bound is tight and, on the negative side, does not go to 0 with depth N if the residual functions are not smooth with depth. On the positive side, we show that this smoothness is preserved by gradient descent for a ResNet with linear residual functions and small enough initial loss. It ensures an implicit regularization towards a limit Neural ODE at rate 1/N, uniformly with depth and optimization time. As a byproduct of our analysis, we consider the use of a memory-free discrete adjoint method to train a ResNet by recovering the activations on the fly through a backward pass of the network, and show that this method theoretically succeeds at large depth if the residual functions are Lipschitz with the input. We then show that Heun's method, a second-order ODE integration scheme, allows for better gradient estimation with the adjoint method when the residual functions are smooth with depth. We experimentally validate that our adjoint method succeeds at large depth, and that Heun's method needs fewer layers to succeed. We finally use the adjoint method successfully to fine-tune very deep ResNets without memory consumption in the residual layers.
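To make the discretization picture concrete, here is a minimal NumPy sketch (not the paper's implementation) of the three ingredients above, under illustrative assumptions: toy residual functions f_n(x) = tanh(W_n x) with step size 1/N, an Euler-type ResNet update, a Heun (second-order) update, and memory-free reconstruction of activations by inverting the Euler step with a fixed-point iteration, as a discrete adjoint method would do during the backward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 64, 8                                   # depth and width (arbitrary toy values)
Ws = rng.normal(scale=1.0 / np.sqrt(d), size=(N, d, d))

def f(n, x):
    """Residual function of layer n (a hypothetical smooth choice)."""
    return np.tanh(Ws[n] @ x)

def euler_forward(x):
    """ResNet seen as an explicit Euler scheme: x_{n+1} = x_n + f_n(x_n)/N."""
    for n in range(N):
        x = x + f(n, x) / N
    return x

def heun_forward(x):
    """Heun's method: average the slope at x_n and at the Euler predictor."""
    for n in range(N - 1):
        k1 = f(n, x)
        k2 = f(n + 1, x + k1 / N)
        x = x + (k1 + k2) / (2 * N)
    return x

def recover_previous(n, x_next, n_iter=20):
    """Invert x_{n+1} = x_n + f_n(x_n)/N by fixed-point iteration, so
    activations can be rebuilt on the fly during the backward pass."""
    x = x_next.copy()
    for _ in range(n_iter):
        x = x_next - f(n, x) / N
    return x

x0 = rng.normal(size=d)
xN = euler_forward(x0)

# Rebuild all activations backward without having stored them.
x = xN
for n in reversed(range(N)):
    x = recover_previous(n, x)
print("reconstruction error:", np.linalg.norm(x - x0))
```

The fixed-point inversion converges here because f_n/N is a contraction when the residual functions are Lipschitz and the depth N is large, which is the regime in which the memory-free adjoint method is claimed to succeed.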