We analyze multi-layer neural networks in the asymptotic regime of simultaneously (A) large network sizes and (B) large numbers of stochastic gradient descent training iterations. We rigorously establish the limiting behavior of the multi-layer neural network output. The limit procedure is valid for any number of hidden layers and it naturally also describes the limiting behavior of the training loss. The ideas that we explore are to (a) take the limits of each hidden layer sequentially and (b) characterize the evolution of parameters in terms of their initialization. The limit satisfies a system of deterministic integro-differential equations. The proof uses methods from weak convergence and stochastic analysis. We show that, under suitable assumptions on the activation functions and the behavior for large times, the limit neural network recovers a global minimum (with zero loss for the objective function).
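To fix notation, a schematic instance of the regime described above is a network with two hidden layers of widths $N_1$ and $N_2$; this display is an illustrative sketch rather than the paper's exact construction, and the normalization, the learning rate $\alpha^{N}$, and the time rescaling $t = k/N$ are assumptions made only for concreteness:
\[
g^{N_1,N_2}_{\theta}(x) \;=\; \frac{1}{N_2}\sum_{i=1}^{N_2} c^{i}\,\sigma\!\left(\frac{1}{N_1}\sum_{j=1}^{N_1} w^{2,ij}\,\sigma\!\left(w^{1,j}\cdot x\right)\right),
\]
trained by stochastic gradient descent on the squared loss with i.i.d. samples $(x_k, y_k)$,
\[
\theta_{k+1} \;=\; \theta_k \;+\; \alpha^{N}\,\big(y_k - g^{N_1,N_2}_{\theta_k}(x_k)\big)\,\nabla_{\theta}\, g^{N_1,N_2}_{\theta_k}(x_k).
\]
The limit of interest takes $N_1, N_2 \to \infty$ jointly with the number of SGD iterations $k \to \infty$ (on a rescaled time such as $t = k/N$), under which the network output converges to the deterministic limit characterized by the integro-differential equations mentioned above.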