Understanding the advantages of deep neural networks trained by gradient descent (GD) over shallow models remains an open theoretical challenge. In this paper, we introduce a class of target functions (single- and multi-index Gaussian hierarchical targets) that incorporate a hierarchy of latent subspace dimensionalities. This framework enables us to study analytically the learning dynamics and generalization performance of deep networks compared to shallow ones in the high-dimensional limit. Specifically, our main theorem shows that feature learning with GD successively reduces the effective dimensionality, transforming a high-dimensional problem into a sequence of lower-dimensional ones. This enables learning the target function with drastically fewer samples than with shallow networks. While the results are proven in a controlled training setting, we also discuss more common training procedures and argue that they learn through the same mechanisms.
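For intuition, the following is a minimal sketch of what a two-level hierarchical multi-index Gaussian target of the kind described above could look like; the specific dimensions, projections, nonlinearity, and link function here are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

# Sketch (assumed construction): the label depends on the d-dimensional
# Gaussian input x only through a k1-dimensional latent subspace, and the
# inner representation depends on that subspace only through a further
# k2-dimensional one (k2 < k1 << d), giving a hierarchy of latent dimensions.

rng = np.random.default_rng(0)

d, k1, k2 = 1000, 8, 2           # ambient and latent dimensions (hypothetical sizes)
n = 5000                         # number of samples

# Random orthonormal projections defining the two latent subspaces.
U1 = np.linalg.qr(rng.standard_normal((d, k1)))[0]    # d  -> k1
U2 = np.linalg.qr(rng.standard_normal((k1, k2)))[0]   # k1 -> k2

def target(x):
    """Hierarchical target: x -> k1-dim features -> k2-dim features -> label."""
    z1 = x @ U1                  # first latent projection (k1 dimensions)
    h1 = np.tanh(z1)             # inner nonlinearity (choice is illustrative)
    z2 = h1 @ U2                 # second latent projection (k2 dimensions)
    # Illustrative link function acting only on the k2-dimensional representation.
    return np.prod(np.sign(z2), axis=-1) * np.linalg.norm(z2, axis=-1)

X = rng.standard_normal((n, d))  # isotropic Gaussian inputs
y = target(X)                    # labels depend on X only through the latent hierarchy
```

In this toy setup, a learner that recovers the k1-dimensional subspace first, and then the k2-dimensional one inside it, faces a sequence of low-dimensional problems rather than a single d-dimensional one, which is the mechanism the abstract attributes to GD feature learning in deep networks.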