We investigate the asymptotic properties of deep residual networks (ResNets) as the number of layers increases. We first show the existence of scaling regimes for trained weights that differ markedly from those implicitly assumed in the neural ODE literature. We then study the convergence of the hidden-state dynamics in these scaling regimes, showing that one may obtain an ODE, a stochastic differential equation (SDE), or neither of these. In particular, our findings point to the existence of a diffusive regime in which the deep network limit is described by a class of SDEs. Finally, we derive the corresponding scaling limits for the backpropagation dynamics.
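To make the scaling regimes concrete, the following minimal sketch (not the paper's experimental setup) simulates the hidden-state recursion of a fully-connected ResNet, $h_{k+1} = h_k + L^{-\beta} V_k \tanh(h_k)$, with i.i.d. Gaussian weights $V_k$. The exponent $\beta$, the depth $L$, the width $d$, and the $\tanh$ activation are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(h0, L, d, beta):
    """Run the hidden state through L residual layers scaled by L**(-beta)."""
    h = h0.copy()
    for _ in range(L):
        # i.i.d. Gaussian weights, variance 1/d per entry (illustrative choice)
        V = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, d))
        h = h + L ** (-beta) * V @ np.tanh(h)
    return h

d, L = 10, 1000
h0 = rng.normal(size=d)

# With i.i.d. weights, beta = 1/2 produces increments whose fluctuations
# accumulate diffusively (an SDE-like regime), while beta = 1 makes the
# mean-zero increments average out, so h_L stays close to h_0; an ODE limit
# instead corresponds to weights that vary smoothly across layers.
print("beta=1:   |h_L - h_0| =", np.linalg.norm(forward(h0, L, d, 1.0) - h0))
print("beta=1/2: |h_L - h_0| =", np.linalg.norm(forward(h0, L, d, 0.5) - h0))
```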