Data-driven machine learning models are increasingly employed in important inference problems in biology, chemistry, and physics that require learning over combinatorial spaces. Recent empirical evidence (see, e.g., [1], [2], [3]) suggests that regularizing the spectral representation of such models improves their generalization power when labeled data is scarce. Despite these empirical studies, the theory of when and how spectral regularization improves generalization is poorly understood. In this paper, we focus on learning pseudo-Boolean functions and demonstrate that regularizing the empirical mean squared error by the L_1 norm of the spectral transform of the learned function reshapes the loss landscape and allows for data-frugal learning, under a restricted secant condition on the learner's empirical error measured against the ground truth function. Under a weaker quadratic growth condition, we show that stationary points that also approximately interpolate the training data achieve statistically optimal generalization performance. Complementing our theory, we empirically demonstrate that running gradient descent on the regularized loss yields better generalization performance than baseline algorithms in several data-scarce real-world problems.
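To make the regularized objective concrete, the following is a minimal sketch and not the paper's implementation. It assumes that the spectral transform is the normalized Walsh-Hadamard transform and that the learned pseudo-Boolean function is stored as a table of its values on all 2^n binary inputs. The names `walsh_hadamard`, `spectral_regularized_loss`, and the weight `lam` are hypothetical and chosen only for illustration.

```python
import numpy as np


def walsh_hadamard(values):
    """Normalized fast Walsh-Hadamard transform of a length-2^n vector.

    Assumption: this stands in for the "spectral transform" of a
    pseudo-Boolean function given by its values on all 2^n inputs.
    """
    coeffs = np.asarray(values, dtype=float).copy()
    size = coeffs.shape[0]
    if size == 0 or size & (size - 1):
        raise ValueError("length must be a power of two")
    half = 1
    while half < size:
        # Butterfly step: combine blocks of width `half` pairwise.
        for start in range(0, size, 2 * half):
            left = coeffs[start:start + half].copy()
            right = coeffs[start + half:start + 2 * half].copy()
            coeffs[start:start + half] = left + right
            coeffs[start + half:start + 2 * half] = left - right
        half *= 2
    return coeffs / size


def spectral_regularized_loss(values, train_idx, train_y, lam):
    """Empirical mean squared error plus lam times the L_1 norm of the spectrum.

    values    : function values on all 2^n inputs (the learned function)
    train_idx : indices of the inputs that carry labels
    train_y   : labels observed at those inputs
    lam       : hypothetical regularization weight
    """
    values = np.asarray(values, dtype=float)
    residual = values[train_idx] - train_y
    mse = np.mean(residual ** 2)
    penalty = np.sum(np.abs(walsh_hadamard(values)))
    return mse + lam * penalty
```

For example, the two-bit parity function with values `[1, -1, -1, 1]` has a single nonzero coefficient under this transform, so its penalty is small compared with that of a function whose spectrum is spread over many coefficients. A tabular representation is only feasible for small n; for a parametric model the same penalty would have to be computed or approximated on the model's outputs.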