We consider the problem of learning a target function corresponding to a deep, extensive-width, non-linear neural network with random Gaussian weights. We consider the asymptotic limit where the number of samples, the input dimension and the network width are proportionally large. We derive a closed-form expression for the Bayes-optimal test error, for both regression and classification tasks. We contrast these Bayes-optimal errors with the test errors of ridge regression, kernel regression and random-features regression. We find, in particular, that optimally regularized ridge regression, as well as kernel regression, achieves Bayes-optimal performance, while the logistic loss yields a near-optimal test error for classification. We further show numerically that when the number of samples grows faster than the dimension, ridge and kernel methods become suboptimal, while neural networks achieve a test error close to zero from quadratically many samples.
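To make the setting concrete, the following is a minimal numerical sketch (not taken from the paper) of the learning problem it studies: labels are generated by a deep network with random Gaussian weights, and ridge regression is fit on the raw inputs. All dimensions, the depth, the activation and the regularization strength `lam` are illustrative assumptions rather than the paper's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)
d, width, depth = 200, 200, 2          # input dimension, hidden width, depth (illustrative)
n_train, n_test = 600, 2000            # proportional regime: n, d, width of comparable size

def random_target(depth, d, width, rng):
    """Deep non-linear network with i.i.d. Gaussian weights, drawn once and fixed."""
    dims = [d] + [width] * depth
    Ws = [rng.standard_normal((dims[l + 1], dims[l])) / np.sqrt(dims[l])
          for l in range(depth)]
    a = rng.standard_normal(width) / np.sqrt(width)   # readout vector
    def f(X):
        h = X
        for W in Ws:
            h = np.tanh(h @ W.T)                      # non-linear activation (assumed tanh)
        return h @ a
    return f

# Gaussian inputs, labels from the random deep target
f_star = random_target(depth, d, width, rng)
X_train = rng.standard_normal((n_train, d))
X_test = rng.standard_normal((n_test, d))
y_train, y_test = f_star(X_train), f_star(X_test)

# Ridge regression on the raw inputs; lam is a placeholder and would be tuned,
# since the comparison in the paper concerns optimally regularized ridge.
lam = 1e-2
w = np.linalg.solve(X_train.T @ X_train + lam * np.eye(d), X_train.T @ y_train)
test_mse = np.mean((X_test @ w - y_test) ** 2)
print(f"ridge regression test MSE: {test_mse:.3f}")
```

In this sketch the test MSE of (optimally tuned) ridge regression is the quantity to be compared against the closed-form Bayes-optimal error derived in the paper; kernel and random-features regression would replace the last block with the corresponding predictors.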