Modern neural networks (NNs) featuring a large number of layers (depth) and units per layer (width) have achieved remarkable performance across many domains. While there exists a vast literature on the interplay between infinitely wide NNs and Gaussian processes, little is known about analogous interplays for infinitely deep NNs. NNs with independent and identically distributed (i.i.d.) initializations exhibit undesirable forward- and backward-propagation properties as the number of layers increases. To overcome these drawbacks, Peluchetti and Favaro (2020) considered fully-connected residual networks (ResNets) whose parameters are initialized from distributions that shrink as the number of layers increases, thus establishing an interplay between infinitely deep ResNets and solutions to stochastic differential equations, i.e. diffusion processes, and showing that infinitely deep ResNets do not suffer from undesirable forward-propagation properties. In this paper, we review the results of Peluchetti and Favaro (2020), extend them to convolutional ResNets, and establish analogous backward-propagation results, which relate directly to the problem of training fully-connected deep ResNets. We then investigate the more general setting of doubly infinite NNs, where both the network's width and depth grow unboundedly. We focus on doubly infinite fully-connected ResNets, for which we consider i.i.d. initializations. Under this setting, we show that the dynamics of quantities of interest converge, at initialization, to deterministic limits. This allows us to provide analytical expressions for inference, both in the case of weakly trained and of fully trained ResNets. Our results highlight the limited expressive power of doubly infinite ResNets when the unscaled network parameters are i.i.d. and the residual blocks are shallow.
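To make the depth-dependent initialization concrete, the following is a minimal sketch (our illustration, not the authors' code) of a fully-connected ResNet forward pass in which each residual increment is drawn with standard deviation proportional to 1/sqrt(depth). Under this assumed scaling, the layer index plays the role of time in an Euler-Maruyama discretization of a diffusion process. The tanh activation and the names sigma_w, sigma_b are illustrative choices, not taken from the paper.

```python
import numpy as np

def resnet_forward(x, depth, sigma_w=1.0, sigma_b=1.0, rng=None):
    """Forward pass of a deep fully-connected ResNet with shrinking
    initialization: parameter scale is proportional to sqrt(1/depth),
    so the hidden state evolves like an Euler-Maruyama scheme with
    step size 1/depth (a sketch under assumed scaling)."""
    rng = np.random.default_rng(0) if rng is None else rng
    width = x.shape[0]
    dt = 1.0 / depth  # implied SDE step size
    for _ in range(depth):
        # Shrinking initialization: std proportional to sqrt(dt).
        W = rng.normal(0.0, sigma_w * np.sqrt(dt), size=(width, width))
        b = rng.normal(0.0, sigma_b * np.sqrt(dt), size=width)
        x = x + W @ np.tanh(x) + b  # residual block = diffusion increment
    return x

# With i.i.d. (unscaled) initializations instead, the increments would not
# shrink with depth, and forward propagation degenerates as depth grows.
x0 = np.ones(16)
print(resnet_forward(x0, depth=1000))
```

Because the per-layer variance scales as 1/depth, the cumulative variance of the hidden state stays bounded as depth grows, which is why the infinitely deep limit is a well-defined stochastic process rather than a degenerate one.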