Understanding the impact of data structure on the computational tractability of learning is a key challenge for the theory of neural networks. Many theoretical works do not explicitly model training data, or assume that inputs are drawn component-wise independently from some simple probability distribution. Here, we go beyond this simple paradigm by studying the performance of neural networks trained on data drawn from pre-trained generative models. This is possible due to a Gaussian equivalence stating that the key metrics of interest, such as the training and test errors, can be fully captured by an appropriately chosen Gaussian model. We provide three strands of rigorous, analytical and numerical evidence corroborating this equivalence. First, we establish rigorous conditions for the Gaussian equivalence to hold in the case of single-layer generative models, as well as deterministic rates for convergence in distribution. Second, we leverage this equivalence to derive a closed set of equations describing the generalisation performance of two widely studied machine learning problems: two-layer neural networks trained using one-pass stochastic gradient descent, and full-batch pre-learned features or kernel methods. Finally, we perform experiments demonstrating how our theory applies to deep, pre-trained generative models. These results open a viable path to the theoretical study of machine learning models with realistic data.
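As a rough numerical illustration of the Gaussian equivalence for a single-layer generative model, the sketch below projects generator outputs onto a random direction and checks that the projection looks Gaussian even though each input component does not. The `tanh` generator, the random weights `A` and `w`, the dimensions, and the use of excess kurtosis as a proxy for Gaussianity are all assumptions made for illustration only. They are not the conditions or the metrics established in this work.

```python
import numpy as np


def generator(z, A):
    """Hypothetical single-layer generator: x = tanh(A z / sqrt(D))."""
    D = z.shape[1]
    return np.tanh(z @ A.T / np.sqrt(D))


def excess_kurtosis(v):
    """Sample excess kurtosis; close to zero for Gaussian samples."""
    v = v - v.mean()
    return np.mean(v**4) / np.mean(v**2) ** 2 - 3.0


rng = np.random.default_rng(0)
D, N, n_samples = 100, 400, 20000  # latent dim, input dim, sample count (assumed)

A = rng.standard_normal((N, D))          # random generator weights (assumption)
w = rng.standard_normal(N)               # random projection direction (assumption)
z = rng.standard_normal((n_samples, D))  # Gaussian latent variables

x = generator(z, A)        # structured, non-Gaussian inputs
lam = x @ w / np.sqrt(N)   # low-dimensional projection of the inputs

kurt_component = excess_kurtosis(x[:, 0])  # one raw input component
kurt_projection = excess_kurtosis(lam)     # the projected variable

print(f"excess kurtosis of one input component: {kurt_component:.3f}")
print(f"excess kurtosis of the projection:      {kurt_projection:.3f}")
```

In this toy setting the individual components are bounded and clearly non-Gaussian, while the projection is close to Gaussian. This is only a marginal check on one projection; the equivalence discussed above concerns the training and test errors of the learning problem.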