In a neural network (NN), *weight matrices* linearly transform inputs into *preactivations* that are then transformed nonlinearly into *activations*. A typical NN interleaves multitudes of such linear and nonlinear transforms to express complex functions. Thus, the (pre-)activations depend on the weights in an intricate manner. We show that, surprisingly, (pre-)activations of a randomly initialized NN become *independent* from the weights as the NN's widths tend to infinity, in the sense of asymptotic freeness in random matrix theory. We call this the Free Independence Principle (FIP), which has these consequences: 1) It rigorously justifies the calculation of the asymptotic Jacobian singular value distribution of an NN in Pennington et al. [36,37], essential for training ultra-deep NNs [48]. 2) It gives a new justification of the gradient independence assumption used for calculating the Neural Tangent Kernel of a neural network. FIP and these results hold for any neural architecture. We show FIP by proving a Master Theorem for any Tensor Program, as introduced in Yang [50,51], generalizing the Master Theorems proved in those works. As warmup demonstrations of this new Master Theorem, we give new proofs of the semicircle and Marchenko-Pastur laws, benchmarking our framework against these fundamental mathematical results.
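For concreteness, below is a minimal NumPy sketch (not code from the paper) of the two warmup facts mentioned above, checked empirically at finite size: the semicircle law for a symmetric Gaussian matrix and the Marchenko-Pastur law for a Wishart matrix. The matrix size `n = 2000` and the aspect ratio `c = 0.5` are arbitrary illustrative choices.

```python
# Empirical check of the semicircle and Marchenko-Pastur laws at finite size.
# This is an illustrative sketch, not the paper's framework or proofs.
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # illustrative matrix size

# Semicircle law: eigenvalues of a symmetric matrix with (roughly) i.i.d.
# N(0, 1/n) entries concentrate on [-2, 2] as n -> infinity.
G = rng.standard_normal((n, n))
W = (G + G.T) / np.sqrt(2 * n)  # Wigner matrix
eigs = np.linalg.eigvalsh(W)
print("semicircle: fraction of eigenvalues in [-2, 2]:",
      np.mean(np.abs(eigs) <= 2.0 + 1e-2))

# Marchenko-Pastur law: for X in R^{m x n} with i.i.d. N(0, 1) entries and
# m/n -> c, the eigenvalues of X X^T / n concentrate on
# [(1 - sqrt(c))^2, (1 + sqrt(c))^2].
c = 0.5  # illustrative aspect ratio
m = int(c * n)
X = rng.standard_normal((m, n))
wishart_eigs = np.linalg.eigvalsh(X @ X.T / n)
print("MP support predicted:", ((1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2))
print("MP support observed: ", (float(wishart_eigs.min()), float(wishart_eigs.max())))
```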