Finding a quantitative theory of neural network generalization has long been a central goal of deep learning research. We extend recent results to demonstrate that, by examining the eigensystem of a neural network's "neural tangent kernel", one can predict its generalization performance when learning arbitrary functions. Our theory accurately predicts not only test mean squared error but also all first- and second-order statistics of the network's learned function. Furthermore, using a measure quantifying the "learnability" of a given target function, we prove a new "no-free-lunch" theorem characterizing a fundamental tradeoff in the inductive bias of wide neural networks: improving a network's generalization for a given target function must worsen its generalization for orthogonal functions. We further demonstrate the utility of our theory by analytically predicting two surprising phenomena (worse-than-chance generalization on hard-to-learn functions and nonmonotonic error curves in the small-data regime), which we subsequently observe in experiments. Though our theory is derived for infinite-width architectures, we find it agrees with networks as narrow as width 20, suggesting it is predictive of generalization in practical neural networks. Code replicating our results is available at https://github.com/james-simon/eigenlearning.
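To give a concrete feel for the kind of prediction involved, below is a minimal NumPy sketch of an eigensystem-based generalization estimate for kernel ridge regression of the sort the abstract describes. It assumes the commonly used omniscient risk estimate: given kernel eigenvalues λ_i (`eigvals`), target eigencoefficients v_i (`target_coeffs`), and n training samples, an effective regularization κ is solved self-consistently, each mode's learnability is L_i = λ_i/(λ_i + κ), and test MSE follows in closed form. The function names and the power-law example spectrum are our own illustrative choices, not code from the linked repository.

```python
import numpy as np

def solve_kappa(eigvals, n, ridge=0.0, iters=200):
    """Solve n = sum_i lam_i / (lam_i + kappa) + ridge / kappa for the
    effective regularization kappa by bisection (the left side is
    decreasing in kappa, so the root is bracketed once signs differ)."""
    def constraint(kappa):
        return np.sum(eigvals / (eigvals + kappa)) + ridge / kappa - n

    lo, hi = 1e-15, eigvals.sum() + ridge + 1.0
    while constraint(hi) > 0:          # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if constraint(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def predict_generalization(eigvals, target_coeffs, n, ridge=0.0):
    """Predicted per-mode learnabilities and test MSE for kernel regression
    on a noiseless target with the given eigencoefficients, from n samples."""
    kappa = solve_kappa(eigvals, n, ridge)
    learnability = eigvals / (eigvals + kappa)     # L_i = lam_i / (lam_i + kappa)
    e0 = n / (n - np.sum(learnability ** 2))       # overfitting coefficient
    mse = e0 * np.sum((1.0 - learnability) ** 2 * target_coeffs ** 2)
    return learnability, mse

# Example: a power-law kernel spectrum, with the target on the 5th eigenmode.
eigvals = np.arange(1.0, 1001.0) ** -2.0
target_coeffs = np.zeros(1000)
target_coeffs[4] = 1.0
L, mse = predict_generalization(eigvals, target_coeffs, n=50)
print(f"learnability of mode 5: {L[4]:.3f}, predicted test MSE: {mse:.3f}")
```

Note how, in this sketch, a unit-norm target placed on a poorly learnable mode can yield a predicted MSE above 1 (the error of always predicting zero) once the overfitting coefficient grows, mirroring the worse-than-chance regime the abstract highlights.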