Overparametrized interpolating models have drawn increasing attention in machine learning. Several recent studies suggest that regularized interpolating models can generalize well. This phenomenon seemingly contradicts the conventional wisdom that interpolation tends to overfit the training data and perform poorly on test data. Further, it appears to defy the bias-variance trade-off. One shortcoming of the existing theory is that the classical notion of model degrees of freedom fails to explain the intrinsic differences among interpolating models, since it focuses on estimating the in-sample prediction error. This motivates an alternative measure of model complexity that can differentiate among interpolating models and takes different test points into account. In particular, we propose a measure with a proper adjustment based on the squared covariance between the predictions and the observations. Our analysis of the least squares method reveals some interesting properties of the measure, which reconcile the "double descent" phenomenon with the classical theory. This opens the door to an extended definition of model degrees of freedom in modern predictive settings.
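For background only (this is the standard definition the abstract alludes to, not the paper's proposed measure), the classical degrees of freedom of a fitting procedure can be written in Efron's covariance form,
$$
\mathrm{df}(\hat{y}) \;=\; \frac{1}{\sigma^{2}}\sum_{i=1}^{n}\operatorname{Cov}\!\left(\hat{y}_i,\, y_i\right),
$$
where $\hat{y}_i$ is the fitted value at training observation $y_i$ and $\sigma^{2}$ is the noise variance. For ordinary least squares with hat matrix $H = X(X^{\top}X)^{-}X^{\top}$ this reduces to $\operatorname{tr}(H)$, and for any fit that interpolates the training data it saturates at $n$, which is why the classical notion cannot distinguish among interpolating models and why an adjustment based on the squared covariance between predictions and observations is proposed instead.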