A major challenge in modern machine learning is theoretically understanding the generalization properties of overparameterized models. Many existing tools rely on uniform convergence (UC), a property that, when it holds, guarantees that the test loss will be close to the training loss, uniformly over a class of candidate models. Nagarajan and Kolter (2019) show that in certain simple linear and neural-network settings, any uniform convergence bound will be vacuous, leaving open the question of how to prove generalization in settings where UC fails. Our main contribution is to prove novel generalization bounds in two such settings, one linear and one non-linear. We study the linear classification setting of Nagarajan and Kolter, and, in the non-linear regime, a quadratic ground-truth function learned via a two-layer neural network. We prove a new type of margin bound showing that above a certain signal-to-noise threshold, any near-max-margin classifier achieves almost no test loss in these two settings. Our results show that the near-max-margin condition matters: while any model that achieves at least a $(1 - \epsilon)$-fraction of the max margin generalizes well, a classifier achieving only half of the max margin may fail terribly. Building on the impossibility results of Nagarajan and Kolter, we show, under slightly stronger assumptions, that one-sided UC bounds and classical margin bounds fail on near-max-margin classifiers. Our analysis provides insight into why memorization can coexist with generalization: we show that in this challenging regime where generalization occurs but UC fails, near-max-margin classifiers simultaneously contain some generalizable components and some overfitting components that memorize the data. The presence of the overfitting components is enough to preclude UC, but the near-extremal margin guarantees that sufficient generalizable components are present.
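The near-max-margin condition above can be made concrete with a small numerical sketch. The snippet below (illustrative only; the toy dataset and weight vectors are hypothetical, not from the paper) computes the geometric margin of a homogeneous linear classifier and checks what fraction of the max margin a candidate direction attains:

```python
import math

# Toy linearly separable data (hypothetical, for illustration only):
# two positive and two negative points, symmetric about the origin,
# so the max-margin direction is known in closed form.
data = [((1.0, 2.0), +1), ((2.0, 1.0), +1),
        ((-1.0, -2.0), -1), ((-2.0, -1.0), -1)]

def margin(w, data):
    """Geometric margin of the homogeneous linear classifier x -> sign(<w, x>):
    the minimum signed distance y_i * <w, x_i> / ||w|| over the data."""
    norm = math.hypot(w[0], w[1])
    return min(y * (w[0] * x[0] + w[1] * x[1]) / norm for (x, y) in data)

# By symmetry, the max-margin direction for this dataset is w* = (1, 1),
# with margin 3 / sqrt(2) ~= 2.121.
w_star = (1.0, 1.0)
gamma_max = margin(w_star, data)

# A near-max-margin candidate: it attains more than a (1 - eps)-fraction
# of gamma_max for small eps, so the paper's bound would apply to it.
w_near = (1.0, 0.8)
print(margin(w_near, data) / gamma_max)  # ~0.96 of the max margin

# A direction attaining only about half of gamma_max would violate the
# near-max-margin condition, and the generalization guarantee would not apply.
```

The paper's claim is about this ratio: classifiers for which `margin(w) / gamma_max >= 1 - eps` generalize, while a ratio near one half admits classifiers that fail badly.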