Support vector data description (SVDD) is a machine learning technique used for single-class classification and outlier detection. The idea of SVDD is to find a set of support vectors that defines a boundary around the data. When dealing with online or large data, existing batch SVDD methods have to be rerun in each iteration. We propose an incremental learning algorithm for SVDD that uses the Gaussian kernel. The algorithm builds on the observation that all support vectors on the boundary have the same distance to the center of the sphere in the higher-dimensional feature space induced by the Gaussian kernel. Each iteration involves only the existing support vectors and the new data point. The algorithm is based solely on matrix manipulations; the support vectors and their corresponding Lagrange multipliers $\alpha_i$ are automatically selected and determined in each iteration. The per-iteration complexity of our algorithm is only $O(k^2)$, where $k$ is the number of support vectors. Our experimental results on several real data sets show that our incremental algorithm achieves F-1 scores comparable to batch methods with much less running time.
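To make the equidistance observation concrete, the sketch below computes the squared kernel-space distance from a point to the SVDD sphere center, expanded entirely in terms of kernel evaluations. This is a generic SVDD distance computation under assumed support vectors and multipliers, not the paper's incremental algorithm; the function names and the `sigma` parameter are illustrative choices.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / (2 * sigma^2)); note K(x, x) = 1.
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2.0 * sigma ** 2))

def squared_distance_to_center(x, support_vectors, alphas, sigma=1.0):
    # Squared distance in feature space from phi(x) to the sphere center
    # c = sum_i alpha_i phi(s_i), expanded via the kernel trick:
    #   d^2(x) = K(x, x) - 2 * sum_i alpha_i K(s_i, x)
    #            + sum_{i,j} alpha_i alpha_j K(s_i, s_j)
    # For the Gaussian kernel the first term is always 1.
    k_x = np.array([gaussian_kernel(s, x, sigma) for s in support_vectors])
    K = np.array([[gaussian_kernel(si, sj, sigma) for sj in support_vectors]
                  for si in support_vectors])
    return 1.0 - 2.0 * alphas @ k_x + alphas @ K @ alphas
```

In a trained SVDD model, every boundary support vector yields the same value of `squared_distance_to_center`, which is exactly the squared radius; a new point is flagged as an outlier when its distance exceeds that radius. In the degenerate single-support-vector case (alpha = 1), the center coincides with the mapped support vector, so its own distance is zero and the distance grows monotonically with Euclidean distance from it.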