Commonly used classification algorithms in machine learning, such as support vector machines, minimize a convex surrogate loss on training examples. In practice, these algorithms are surprisingly robust to errors in the training data. In this work, we identify a set of conditions on the data under which such surrogate loss minimization algorithms provably learn the correct classifier. This allows us to establish, in a unified framework, the robustness of these algorithms under various models of data and of error. In particular, we show that if the data is linearly classifiable with a slightly non-trivial margin (i.e., a margin of at least $C/\sqrt{d}$ for $d$-dimensional unit vectors), and the class-conditional distributions are near-isotropic and log-concave, then surrogate loss minimization has negligible error on the uncorrupted data even when a constant fraction of examples are adversarially mislabeled.
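To make the setting concrete, below is a minimal simulation sketch, not the paper's construction: a standard Gaussian is used as a stand-in for a near-isotropic log-concave class-conditional distribution, a filter enforces a margin of $C/\sqrt{d}$ around a hypothetical unit-norm separator `w_star`, and uniformly random label flips (controlled by an assumed parameter `flip_frac`) stand in for the adversary. Hinge-loss minimization is done with scikit-learn's `LinearSVC`, and error is measured against the uncorrupted labels.

```python
# Simulation sketch (illustrative assumptions, not the paper's algorithm):
# hinge-loss minimization on margin-separated, log-concave data with a
# constant fraction of flipped training labels.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
d, n, flip_frac = 50, 5000, 0.1          # dimension, sample size, noise rate

w_star = np.zeros(d)                     # hypothetical true separator
w_star[0] = 1.0                          # unit norm by construction

# Draw from N(0, I_d) (isotropic, log-concave) and keep points whose
# margin along w_star is at least C/sqrt(d), with C = 1.
C = 1.0
X = rng.standard_normal((10 * n, d))
X = X[np.abs(X @ w_star) >= C / np.sqrt(d)][:n]
y_clean = np.sign(X @ w_star)

# Corrupt a constant fraction of training labels; random flips are a
# simple stand-in for the adversarial mislabeling in the abstract.
y_noisy = y_clean.copy()
flip = rng.choice(n, size=int(flip_frac * n), replace=False)
y_noisy[flip] *= -1

# Surrogate loss minimization: hinge loss (linear SVM) on corrupted data.
clf = LinearSVC(loss="hinge", dual=True, max_iter=20000).fit(X, y_noisy)

# Error measured on the uncorrupted labels.
err = np.mean(clf.predict(X) != y_clean)
print(f"error on clean labels: {err:.4f}")
```

Under these assumptions the learned classifier's error on the clean labels should stay far below the 10% corruption rate, illustrating the kind of robustness the abstract asserts.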