We derive improved regression and classification rates for support vector machines using Gaussian kernels under the assumption that the data has some low-dimensional intrinsic structure described by the box-counting dimension. Under some standard regularity assumptions for regression and classification, we prove learning rates in which the dimension of the ambient space is replaced by the box-counting dimension of the support of the data-generating distribution. In the regression case, our rates are in some instances minimax optimal up to logarithmic factors, whereas in the classification case, our rates are minimax optimal up to logarithmic factors in a certain range of our assumptions and otherwise match the form of the best known rates. Furthermore, we show that a training-validation approach for choosing the hyperparameters of an SVM in a data-dependent way achieves the same rates adaptively, that is, without any knowledge of the data-generating distribution.
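For reference, the notion of intrinsic dimension used above is the standard covering-number one: for a bounded set $X \subseteq \mathbb{R}^d$, let $N_X(\varepsilon)$ denote the smallest number of closed balls of radius $\varepsilon$ needed to cover $X$. The (upper) box-counting dimension is then

$$\dim_B X \;=\; \limsup_{\varepsilon \to 0} \frac{\log N_X(\varepsilon)}{\log(1/\varepsilon)}\,,$$

so that, for example, a distribution supported on a compact $\varrho$-dimensional submanifold of $\mathbb{R}^d$ has $\dim_B \operatorname{supp} P = \varrho \le d$, and $\varrho$ rather than $d$ enters the rates.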
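To make the hyperparameter-selection step concrete, the following is a minimal sketch of such a training/validation scheme for a Gaussian-kernel SVM, here written with scikit-learn's SVC; the 50/50 split and the geometric candidate grids are illustrative assumptions, not the grids analyzed in the paper.

    # Minimal training/validation selection for a Gaussian-kernel SVM (sketch).
    # For each candidate pair (C, gamma), fit on the training split and keep
    # the pair with the best validation accuracy. Grids are illustrative only.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    def select_svm_by_validation(X, y, val_fraction=0.5, seed=0):
        X_tr, X_val, y_tr, y_val = train_test_split(
            X, y, test_size=val_fraction, random_state=seed)
        # Geometric grids for the regularization parameter C and the
        # Gaussian kernel width gamma (gamma = 1 / (2 * sigma^2)).
        Cs = np.geomspace(1e-2, 1e3, num=6)
        gammas = np.geomspace(1e-3, 1e2, num=6)
        best_clf, best_acc = None, -np.inf
        for C in Cs:
            for gamma in gammas:
                clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_tr, y_tr)
                acc = clf.score(X_val, y_val)  # validation accuracy
                if acc > best_acc:
                    best_clf, best_acc = clf, acc
        return best_clf

Because the final pair is chosen purely from validation performance, the procedure requires no knowledge of the regularity of the target or of the box-counting dimension of the support, which is the sense in which the abstract calls the resulting rates adaptive.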