The bias-variance trade-off long served as a guide for model selection when applying machine learning algorithms. However, modern practice has shown success with over-parameterized models that were expected to overfit but did not. This led Belkin et al. to propose the double descent curve of performance. Although it appears to describe a real, representative phenomenon, the field lacks a fundamental theoretical understanding of what is happening, what the consequences are for model selection, and when double descent is expected to occur. In this paper we develop a principled understanding of the phenomenon and sketch answers to these important questions. Furthermore, we report experimental results that are correctly predicted by our proposed hypothesis.
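The phenomenon referenced above can be reproduced in miniature. The following sketch (not from the paper; all problem sizes, the random-feature map, and the seed are illustrative assumptions) fits a minimum-norm least-squares model on random ReLU features while varying the number of features `p` around the interpolation threshold `p = n`, the standard setting in which double descent is reported:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: n training points in d dimensions (sizes are
# arbitrary choices for illustration, not taken from the paper).
n, d, n_test = 40, 10, 200
w_true = rng.normal(size=d)
X_train = rng.normal(size=(n, d))
y_train = X_train @ w_true + 0.5 * rng.normal(size=n)
X_test = rng.normal(size=(n_test, d))
y_test = X_test @ w_true

def relu_features(X, W):
    """Random nonlinear feature map: fixed random weights W, ReLU nonlinearity."""
    return np.maximum(X @ W, 0.0)

# Sweep the number of random features p across the interpolation
# threshold p = n; double descent predicts a test-error peak near p = n
# followed by a second descent as p grows far beyond n.
test_errors = {}
for p in [5, 20, 40, 80, 400]:
    W = rng.normal(size=(d, p)) / np.sqrt(d)
    Phi = relu_features(X_train, W)
    # Minimum-norm least-squares solution via the pseudoinverse: for
    # p > n this interpolates the training data exactly.
    beta = np.linalg.pinv(Phi) @ y_train
    pred = relu_features(X_test, W) @ beta
    test_errors[p] = float(np.mean((pred - y_test) ** 2))

for p, err in sorted(test_errors.items()):
    print(f"p = {p:4d}  test MSE = {err:.3f}")
```

Plotting the test MSE against `p` for a denser sweep (and averaging over several random seeds) is the usual way to make the two descents visible; a single seed can be noisy near the threshold.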