Avoiding overfitting is a central challenge in machine learning, yet many large neural networks readily achieve zero training loss. This puzzling contradiction necessitates new approaches to the study of overfitting. Here we quantify overfitting via residual information, defined as the bits in fitted models that encode noise in the training data. Information-efficient learning algorithms minimize residual information while maximizing the relevant bits, which are predictive of the unknown generative model. We solve this optimization to obtain the information content of optimal algorithms for a linear regression problem and compare it to that of randomized ridge regression. Our results demonstrate the fundamental tradeoff between residual and relevant information and characterize the relative information efficiency of randomized regression with respect to optimal algorithms. Finally, using results from random matrix theory, we reveal the information complexity of learning a linear map in high dimensions and unveil information-theoretic analogs of double and multiple descent phenomena.
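As a rough guide to the quantities named above, the following is a minimal formal sketch, not the paper's exact formulation; the notation (theta for the unknown generative model, D for the training data, theta-hat for the fitted model returned by the learning algorithm, and the trade-off parameter gamma) is assumed here and does not appear in the abstract itself.

% Hedged sketch: residual and relevant information as mutual-information quantities.
% I_res counts bits the fitted model retains about the data beyond what the generative
% model explains (i.e. noise); I_rel counts bits predictive of the generative model.
\[
  I_{\mathrm{res}} = I(\hat\theta \,;\, D \mid \theta),
  \qquad
  I_{\mathrm{rel}} = I(\hat\theta \,;\, \theta).
\]
% One natural way to pose the optimization over stochastic learning algorithms
% p(theta-hat | D) is an information-bottleneck-style Lagrangian (assumed form):
\[
  \min_{p(\hat\theta \mid D)} \;
  \Bigl[\, I(\hat\theta \,;\, D \mid \theta) \;-\; \gamma\, I(\hat\theta \,;\, \theta) \,\Bigr],
  \qquad \gamma > 0 .
\]

Under this reading, the "optimal algorithms" of the abstract trace out the frontier of this trade-off as gamma varies, and randomized ridge regression is compared against that frontier; the paper's precise objective and constraints may differ from this sketch.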