This manuscript considers the problem of learning a random Gaussian network function using a fully connected network with frozen intermediate layers and a trainable readout layer. This problem can be seen as a natural generalization of the widely studied random features model to deeper architectures. First, we prove Gaussian universality of the test error in a ridge regression setting where the learner and target networks share the same intermediate layers, and provide a sharp asymptotic formula for it. Establishing this result requires proving a deterministic equivalent for traces of the deep random features sample covariance matrices, which can be of independent interest. Second, we conjecture the asymptotic Gaussian universality of the test error in the more general setting of arbitrary convex losses and generic learner/target architectures. We provide extensive numerical evidence for this conjecture, which requires the derivation of closed-form expressions for the layer-wise post-activation population covariances. In light of our results, we investigate the interplay between architecture design and implicit regularization.
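To fix ideas, the following is a minimal sketch (in Python/NumPy) of the matched setting the abstract describes: target and learner share the same frozen Gaussian intermediate layers, only the readout is trained, and the readout is fit by ridge regression. All dimensions, the tanh activation, and helper names such as `deep_random_features` are illustrative assumptions, not the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: input dim d, hidden widths, train/test sample counts.
d, widths, n, n_test = 100, [150, 120], 400, 2000
lam = 1e-2  # ridge penalty (assumed value)

def deep_random_features(X, Ws, act=np.tanh):
    """Propagate inputs through frozen random layers; return last-layer features."""
    H = X
    for W in Ws:
        # Standard 1/sqrt(fan-in) scaling of the random weights.
        H = act(H @ W.T / np.sqrt(W.shape[1]))
    return H

# Frozen i.i.d. Gaussian weights, shared by target and learner (matched case).
dims = [d] + widths
Ws = [rng.standard_normal((dims[l + 1], dims[l])) for l in range(len(widths))]

# Target function: random Gaussian readout on top of the frozen feature map.
theta_star = rng.standard_normal(widths[-1]) / np.sqrt(widths[-1])

X_train = rng.standard_normal((n, d))
X_test = rng.standard_normal((n_test, d))
Phi_train = deep_random_features(X_train, Ws)
Phi_test = deep_random_features(X_test, Ws)
y_train = Phi_train @ theta_star
y_test = Phi_test @ theta_star

# Train only the readout layer by ridge regression.
p = Phi_train.shape[1]
theta_hat = np.linalg.solve(Phi_train.T @ Phi_train + lam * np.eye(p),
                            Phi_train.T @ y_train)

test_error = np.mean((Phi_test @ theta_hat - y_test) ** 2)
print(f"test error: {test_error:.4f}")
```

In this sketch the deterministic-equivalent result would characterize, in the proportional high-dimensional limit, quantities such as traces of resolvents of the sample covariance `Phi_train.T @ Phi_train / n`, yielding a sharp asymptotic prediction for `test_error`.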