The generalization capacity of various machine learning models exhibits different phenomena in the under- and over-parameterized regimes. In this paper, we focus on regression models such as feature regression and kernel regression and analyze a generalized weighted least-squares optimization method for computational learning and inversion with noisy data. The highlight of the proposed framework is that we allow weighting in both the parameter space and the data space. The weighting scheme encodes both a priori knowledge of the object to be learned and a strategy to weight the contribution of different data points in the loss function. We characterize the impact of the weighting scheme on the generalization error of the learning method and derive explicit generalization errors for the random Fourier feature model in both the under- and over-parameterized regimes. For more general feature maps, error bounds are provided based on the singular values of the feature matrix. We demonstrate that appropriate weighting from prior knowledge can improve the generalization capability of the learned model.
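To make the weighting concrete, the following minimal Python sketch fits a random Fourier feature model by solving a weighted least-squares problem of the generic form min_theta ||W_d (Phi theta - y)||^2 + ||W_p theta||^2, with W_d acting on the data space and W_p on the parameter space. The diagonal weight choices here (uniform data weights, frequency-damping parameter weights) are illustrative assumptions, not the paper's specific scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random Fourier feature map: phi_j(x) = cos(w_j * x + b_j) with random w, b.
n, p = 50, 200                                   # n data points, p features (over-parameterized)
x = rng.uniform(-1.0, 1.0, size=n)
w = rng.normal(0.0, 5.0, size=p)                 # random frequencies
b = rng.uniform(0.0, 2 * np.pi, size=p)
Phi = np.cos(np.outer(x, w) + b)                 # n x p feature matrix

y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=n)   # noisy data

# Diagonal weights: W_d weights the residual of each data point; W_p encodes
# a priori preference over the parameters. Damping high frequencies is an
# assumed choice made only for this illustration.
W_d = np.diag(np.ones(n))                        # uniform data-space weights
W_p = np.diag(1.0 + np.abs(w))                   # parameter-space weights

# Solve  min_theta ||W_d (Phi theta - y)||^2 + ||W_p theta||^2
# via the equivalent stacked least-squares system.
A = np.vstack([W_d @ Phi, W_p])
rhs = np.concatenate([W_d @ y, np.zeros(p)])
theta, *_ = np.linalg.lstsq(A, rhs, rcond=None)

print("training MSE:", np.mean((Phi @ theta - y) ** 2))
```

In this formulation, stacking W_p beneath the weighted feature matrix turns the parameter-space penalty into extra least-squares rows, so a single solver call handles both weightings at once.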