We study the generalization of over-parameterized classifiers for which learning by Empirical Risk Minimization (ERM) leads to zero training error. In these over-parameterized settings there are many global minima with zero training error, some of which generalize better than others. We show that under certain conditions the fraction of "bad" global minima with a true error larger than $\epsilon$ decays to zero exponentially fast with the number of training samples $n$. The bound depends on the distribution of the true error over the set of classifier functions used for the given classification problem, and does not necessarily depend on the size or complexity (e.g., the number of parameters) of that set. This might explain the unexpectedly good generalization of even highly over-parameterized neural networks. We support our mathematical framework with experiments on a synthetic data set and a subset of MNIST.
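To make the flavor of the claim concrete, here is a minimal sketch of the kind of argument it describes, not the paper's exact bound or notation: assume i.i.d. training samples and a measure $G$ over the true error $R(f)$ of classifiers $f$ in the given set $\mathcal{F}$, with $\hat{R}_n(f)$ the empirical error on $n$ samples (the symbols $G$, $R$, and $\hat{R}_n$ are illustrative assumptions). A classifier with true error $r$ fits all $n$ samples with probability $(1-r)^n$, so
\begin{align}
  \Pr\bigl[R(f) > \epsilon \,\big|\, \hat{R}_n(f) = 0\bigr]
  &= \frac{\int_\epsilon^1 (1-r)^n \, dG(r)}{\int_0^1 (1-r)^n \, dG(r)} \\
  &\le \frac{(1-\epsilon)^n}{G\bigl([0,\tfrac{\epsilon}{2}]\bigr)\,\bigl(1-\tfrac{\epsilon}{2}\bigr)^n}
  = \frac{1}{G\bigl([0,\tfrac{\epsilon}{2}]\bigr)}\left(\frac{1-\epsilon}{1-\tfrac{\epsilon}{2}}\right)^{\!n}.
\end{align}
Under this simplified setup, the fraction of zero-training-error classifiers that are "bad" decays exponentially in $n$ whenever $G$ places positive mass near zero true error, and the rate depends on the error distribution $G$ rather than on the size of $\mathcal{F}$, mirroring the dependence stated above.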