Wide neural networks have proven to be a rich class of architectures for both theory and practice. Motivated by the observation that finite width convolutional networks appear to outperform infinite width networks, we study scaling laws for wide CNNs and networks with skip connections. Following the approach of Dyer & Gur-Ari (2019), we present a simple diagrammatic recipe to derive the asymptotic width dependence for many quantities of interest. These scaling relationships provide a solvable description for the training dynamics of wide convolutional networks. We test these relations across a broad range of architectures. In particular, we find that the difference in performance between finite and infinite width models vanishes at a definite rate with respect to model width. Nonetheless, this relation is consistent with finite width models generalizing either better or worse than their infinite width counterparts, and we provide examples where the relative performance depends on the optimization details.
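As a schematic illustration of the type of scaling relation studied here (the exponent $\alpha$ and coefficient $c$ below are placeholders for exposition, not results quoted from the paper), the test loss of a width-$n$ model is expected to approach its infinite width limit as a power law in $1/n$:
\[
\mathcal{L}(n) \;=\; \mathcal{L}(\infty) \,+\, \frac{c}{n^{\alpha}} \,+\, O\!\left(n^{-2\alpha}\right),
\]
where the sign of $c$ determines whether the finite width model generalizes better or worse than its infinite width counterpart, consistent with the dependence on optimization details noted above.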