In this work, we investigate the asymptotic spectral density of the random feature matrix $M = Y Y^\ast$ with $Y = f(WX)$ generated by a single-hidden-layer neural network, where $W$ and $X$ are random rectangular matrices with i.i.d. centred entries and $f$ is a smooth non-linear function applied entry-wise. We prove that the Stieltjes transform of the limiting spectral distribution approximately satisfies a quartic self-consistent equation, which is exactly the equation obtained by [Pennington, Worah] and [Benigni, P\'ech\'e] with the moment method. We extend the previous results to the case of additive bias $Y=f(WX+B)$, with $B$ being an independent rank-one Gaussian random matrix, which more closely models the neural network infrastructures encountered in practice. Our key finding is that in the case of additive bias it is impossible to choose an activation function preserving the layer-to-layer singular value distribution, in sharp contrast to the bias-free case, where a simple integral constraint is sufficient to achieve isospectrality. To obtain the asymptotics for the empirical spectral density we follow the resolvent method from random matrix theory via the cumulant expansion. We find that this approach is more robust and less combinatorial than the moment method, and we expect that it will also apply to models where the combinatorics of the moment method become intractable. Although the resolvent method has been widely employed, in contrast to previous works it is applied here to non-linear random matrices.
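As a purely illustrative numerical companion, the empirical spectrum of $M$ and its Stieltjes transform can be sampled as in the minimal NumPy sketch below. The normalisations ($WX/\sqrt{n_0}$ and $M = YY^\ast/m$), the standard Gaussian entries, the choice $f=\tanh$, the bias form $B = b\,\mathbf{1}^\ast$ with a Gaussian vector $b$, and all function and parameter names are assumptions made for this example and are not taken from the text above.

```python
import numpy as np

def random_feature_spectrum(n0=200, n1=300, m=400, f=np.tanh, bias_std=0.0, seed=0):
    """Eigenvalues of M = Y Y^T / m with Y = f(W X / sqrt(n0) + B).

    All scalings and the bias model are assumptions of this sketch.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n1, n0))   # i.i.d. centred weights (assumed Gaussian)
    X = rng.standard_normal((n0, m))    # i.i.d. centred data (assumed Gaussian)
    pre = W @ X / np.sqrt(n0)           # pre-activations with O(1) entries
    if bias_std > 0:
        # Assumed rank-one bias B = b 1^T: one Gaussian bias per hidden unit,
        # repeated across all m columns via broadcasting.
        b = bias_std * rng.standard_normal((n1, 1))
        pre = pre + b
    Y = f(pre)                          # entry-wise non-linearity
    M = Y @ Y.T / m                     # real symmetric n1 x n1 matrix
    return np.linalg.eigvalsh(M)

def empirical_stieltjes(eigs, z):
    """Empirical Stieltjes transform (1/n) * sum_i 1 / (lambda_i - z)."""
    return np.mean(1.0 / (eigs - z))

if __name__ == "__main__":
    z = 1.0 + 0.1j
    for bias_std in (0.0, 1.0):
        eigs = random_feature_spectrum(bias_std=bias_std)
        print(bias_std, empirical_stieltjes(eigs, z))
```

Comparing the output for `bias_std=0.0` and `bias_std=1.0` gives only a rough finite-size impression of how a bias changes the spectrum; it does not verify the quartic self-consistent equation.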