We present a unified approach to obtaining scaling limits of neural networks using the genus expansion technique from random matrix theory. This approach begins with a novel expansion of neural networks, reminiscent of Butcher series for ODEs, obtained through a generalisation of Fa\`a di Bruno's formula to an arbitrary number of compositions. In this expansion, the role of monomials is played by random multilinear maps indexed by directed graphs whose edges correspond to random matrices, which we call operator graphs. This expansion linearises the effect of the activation functions, allowing for the direct application of Wick's principle to compute the expectation of each of its terms. We then determine the leading contribution to each term by embedding the corresponding graphs onto surfaces and computing their Euler characteristic. Furthermore, by developing a correspondence between analytic and graphical operations, we obtain similar graph expansions for the neural tangent kernel as well as the input-output Jacobian of the original neural network, and derive their infinite-width limits with relative ease. Notably, we find explicit formulae for the moments of the limiting singular value distribution of the Jacobian. Finally, we show that all of these results hold for networks with more general weights, such as matrices with i.i.d. entries satisfying suitable moment assumptions, complex matrices, and sparse matrices.
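As background for the genus expansion invoked above, the following display is a minimal sketch of the classical computation for a single GUE matrix (a standard random-matrix fact, not a result specific to this work); the expansion described here generalises this mechanism from a single matrix to operator graphs. For an $N \times N$ GUE matrix $H$ whose entries have variance $1/N$, Wick's principle reduces each trace moment to a sum over pair partitions $\pi \in \mathcal{P}_2(2k)$, and embedding each pairing onto a surface of Euler characteristic $\chi(\pi) = 2 - 2g(\pi)$ gives
\[
\mathbb{E}\!\left[\tfrac{1}{N}\operatorname{Tr} H^{2k}\right]
= \sum_{\pi \in \mathcal{P}_2(2k)} N^{\chi(\pi) - 2}
= \sum_{g \ge 0} \varepsilon_g(k)\, N^{-2g},
\qquad
\varepsilon_0(k) = \frac{1}{k+1}\binom{2k}{k},
\]
so that only planar (genus-zero) pairings survive as $N \to \infty$, recovering the Catalan-number moments of the semicircle law.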