One of the major open problems in machine learning is to characterize generalization in the overparameterized regime, where most traditional generalization bounds become inconsistent (Nagarajan and Kolter, 2019). In many scenarios, their failure can be attributed to obscuring the crucial interplay between the training algorithm and the underlying data distribution. To address this issue, we propose a concept named compatibility, which quantitatively characterizes generalization in a manner that is both data-relevant and algorithm-relevant. By considering the entire training trajectory and focusing on early-stopping iterates, compatibility exploits both the data and the algorithm information and is therefore a more suitable notion for characterizing generalization. We validate this by theoretically studying compatibility in the setting of solving overparameterized linear regression with gradient descent. Specifically, we perform a data-dependent trajectory analysis and derive a sufficient condition for compatibility in this setting. Our theoretical results demonstrate that, in the sense of compatibility, generalization holds under significantly weaker restrictions on the problem instance than in previous last-iterate analyses.
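To make the setting concrete, the following is a minimal illustrative sketch (not the paper's actual analysis) of gradient descent on an overparameterized linear regression problem, where the full trajectory of iterates is recorded rather than only the last iterate. All dimensions, the noise level, and the step size are hypothetical choices for illustration.

```python
import numpy as np

# Overparameterized regime: more features (d) than samples (n).
# All constants here are illustrative assumptions, not from the paper.
rng = np.random.default_rng(0)
n, d = 20, 100
X = rng.standard_normal((n, d)) / np.sqrt(d)   # design matrix
w_star = rng.standard_normal(d)                # ground-truth parameter
y = X @ w_star + 0.1 * rng.standard_normal(n)  # noisy labels

w = np.zeros(d)      # gradient descent initialized at the origin
eta = 0.5            # step size (assumed small enough for convergence)
trajectory = []      # keep every iterate, not just the final one
for t in range(200):
    grad = X.T @ (X @ w - y) / n   # gradient of (1/2n) * ||Xw - y||^2
    w = w - eta * grad
    trajectory.append(w.copy())

# Training loss along the trajectory; an early-stopping iterate is any
# element of `trajectory`, which trajectory-based notions like
# compatibility can exploit.
losses = [np.mean((X @ v - y) ** 2) for v in trajectory]
```

Because `d > n`, the training loss can be driven to (near) zero, yet intermediate iterates along the trajectory may generalize better than the interpolating last iterate, which is the phenomenon that motivates analyzing the whole trajectory.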