In this paper, we study the hard and soft support vector regression techniques applied to a set of $n$ linear measurements of the form $y_i=\boldsymbol{\beta}_\star^{T}{\bf x}_i + n_i$, where $\boldsymbol{\beta}_\star$ is an unknown vector, $\left\{{\bf x}_i\right\}_{i=1}^n$ are the feature vectors, and $\left\{{n}_i\right\}_{i=1}^n$ model the noise. In particular, under some plausible assumptions on the statistical distribution of the data, we characterize the feasibility condition for hard support vector regression in the high-dimensional regime and, when it is feasible, derive an asymptotic approximation of its risk. Similarly, we study the test risk of soft support vector regression as a function of its parameters. Our results are then used to optimally tune the parameters involved in the design of hard and soft support vector regression algorithms. Based on our analysis, we illustrate that adding more samples may be harmful to the test performance of support vector regression, whereas it is always beneficial when the parameters are optimally selected. This result is reminiscent of a similar phenomenon observed in modern learning architectures, according to which optimally tuned architectures exhibit a monotonically decreasing test error curve with respect to the number of samples.
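For concreteness, hard and soft support vector regression admit the following standard formulations (a sketch of the usual parametrization with tube width $\epsilon > 0$ and slack penalty $C > 0$; the exact scaling of these parameters in our analysis may differ):
$$\hat{\boldsymbol{\beta}}_{\mathrm{hard}} = \arg\min_{\boldsymbol{\beta}} \ \|\boldsymbol{\beta}\|^2 \quad \text{subject to} \quad \left|y_i - \boldsymbol{\beta}^{T}{\bf x}_i\right| \le \epsilon, \quad i=1,\dots,n,$$
$$\hat{\boldsymbol{\beta}}_{\mathrm{soft}} = \arg\min_{\boldsymbol{\beta}} \ \frac{1}{2}\|\boldsymbol{\beta}\|^2 + C \sum_{i=1}^{n} \max\!\left(0,\, \left|y_i - \boldsymbol{\beta}^{T}{\bf x}_i\right| - \epsilon\right).$$
The feasibility condition for hard support vector regression is the existence of a $\boldsymbol{\beta}$ satisfying all $n$ constraints simultaneously; soft support vector regression relaxes these constraints through the $\epsilon$-insensitive penalty and recovers the hard formulation in the limit $C \to \infty$.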