We consider the problem of kernel classification. Prior work on kernel regression has shown that, for a large class of data-sets, the rate at which the prediction error decays with the number of samples is well characterized by two quantities: the capacity and source coefficients of the data-set. In this work, we compute the decay rates of the misclassification (prediction) error under the Gaussian design for data-sets satisfying source and capacity conditions. We derive these rates as a function of the source and capacity coefficients for two standard kernel classification settings, namely margin-maximizing Support Vector Machines (SVM) and ridge classification, and contrast the two methods. As a consequence, we find that the known worst-case rates are loose for this class of data-sets. Finally, we show that the rates presented in this work are also observed on real data-sets.
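As a minimal sketch of the setting described above, the snippet below simulates a Gaussian-design data-set whose covariance spectrum decays as a power law (a stand-in for the capacity condition), fits a ridge classifier (least squares on ±1 labels), and measures how the misclassification error shrinks with the number of training samples. All specific values (dimension, decay exponent `alpha`, the teacher vector, the ridge penalty) are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 200
alpha = 1.5  # assumed capacity-like exponent: eigenvalue decay lambda_k ~ k^(-alpha)
lam = np.arange(1, d + 1) ** (-alpha)         # power-law covariance spectrum
theta = np.arange(1, d + 1) ** (-1.0)         # illustrative smooth teacher vector

def misclassification(n_train, n_test=2000, ridge=1e-3, reps=20):
    """Average test misclassification error of ridge classification."""
    errs = []
    for _ in range(reps):
        # Gaussian design: independent features with variances lam
        X = rng.standard_normal((n_train, d)) * np.sqrt(lam)
        y = np.sign(X @ theta)                # noiseless labels from the teacher
        Xt = rng.standard_normal((n_test, d)) * np.sqrt(lam)
        yt = np.sign(Xt @ theta)
        # ridge classification: regularized least squares on the +-1 labels
        w = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y)
        errs.append(np.mean(np.sign(Xt @ w) != yt))
    return float(np.mean(errs))

errs = [misclassification(n) for n in (50, 200, 800)]
print(errs)  # the error decreases as the sample size grows
```

Plotting such errors against `n` on a log-log scale would expose the decay rate as a slope, which is the quantity the abstract characterizes in terms of the source and capacity coefficients.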