One of the major open problems in machine learning theory is to characterize generalization in the overparameterized regime, where most traditional generalization bounds become inconsistent. In many scenarios, their failure can be attributed to obscuring the crucial interplay between the training algorithm and the underlying data distribution. To address this shortcoming, we propose a concept named compatibility, which quantitatively characterizes generalization in a manner that is both data-relevant and algorithm-relevant. By considering the entire training trajectory and focusing on early-stopping iterates, compatibility fully exploits the algorithmic information and therefore yields better generalization guarantees. We validate this by theoretically studying compatibility in the setting of overparameterized linear regression with gradient descent. Specifically, we perform a data-dependent trajectory analysis and derive a sufficient condition for compatibility in this setting. Our theoretical results show that, in the sense of compatibility, generalization holds under significantly weaker restrictions on the problem instance than in previous last-iterate analyses.
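To make the setting above concrete, the following is a minimal, self-contained sketch (not the paper's algorithm, and all data are made up): gradient descent on an overparameterized linear regression problem with more parameters than samples, where the whole training trajectory is recorded so that an early-stopping iterate, rather than only the last iterate, can be selected.

```python
# Illustrative sketch: gradient descent on overparameterized linear
# regression (d = 3 parameters, n = 2 samples), keeping the full
# trajectory and selecting an early-stopping iterate. Hypothetical data.

def matvec(X, w):
    return [sum(x_ij * w_j for x_ij, w_j in zip(row, w)) for row in X]

def train_loss(X, y, w):
    # Empirical squared loss: (1/n) * sum_i (x_i . w - y_i)^2
    r = [p - t for p, t in zip(matvec(X, w), y)]
    return sum(v * v for v in r) / len(y)

def grad(X, y, w):
    # Gradient of the empirical squared loss with respect to w.
    n, d = len(X), len(w)
    r = [p - t for p, t in zip(matvec(X, w), y)]
    return [2.0 / n * sum(r[i] * X[i][j] for i in range(n)) for j in range(d)]

# n = 2 samples, d = 3 parameters: overparameterized.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]

w = [0.0, 0.0, 0.0]
lr = 0.1
trajectory = [w]
for _ in range(200):
    g = grad(X, y, w)
    w = [w_j - lr * g_j for w_j, g_j in zip(w, g)]
    trajectory.append(w)

# Early stopping: take the first iterate whose training loss falls below
# a tolerance, instead of always running to the final iterate.
early = next(t for t, wt in enumerate(trajectory)
             if train_loss(X, y, wt) < 1e-3)
```

Because the trajectory is recorded step by step, any data-dependent stopping rule can be applied after the fact; the compatibility analysis in the paper is exactly about which problem instances make such early-stopping iterates generalize.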