Support vector clustering is an important clustering method. However, it suffers from a scalability issue due to its computationally expensive cluster assignment step. In this paper, we accelerate support vector clustering via spectrum-preserving data compression. Specifically, we first compress the original data set into a small number of spectrally representative aggregated data points. Then, we perform standard support vector clustering on the compressed data set. Finally, we map the clustering results of the compressed data set back to discover the clusters in the original data set. Our extensive experimental results on real-world data sets demonstrate dramatic speedups over standard support vector clustering without sacrificing clustering quality.
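The three-step flow described above (compress, cluster the compressed set, map the labels back) can be illustrated with a minimal Python sketch. It is not the method of this paper: both inner components are deliberately simple stand-ins chosen so the example runs with the standard library alone. The hypothetical `compress_by_grid` averages points that fall in the same grid cell and is not spectrum-preserving; the hypothetical `cluster_by_linkage` groups representatives by a distance threshold and is not support vector clustering. The parameters `cell` and `radius` are likewise assumptions made for illustration. Only the overall structure, in particular the final map-back step, mirrors the description.

```python
from collections import defaultdict
from math import dist, floor


def compress_by_grid(points, cell):
    """Stand-in compression (NOT spectrum-preserving).

    Averages all points falling in the same grid cell into one
    representative. Returns (reps, owner), where owner[i] is the index
    of the representative that original point i was merged into.
    """
    buckets = defaultdict(list)
    for i, p in enumerate(points):
        buckets[tuple(floor(c / cell) for c in p)].append(i)

    reps, owner = [], [0] * len(points)
    for r, idxs in enumerate(buckets.values()):
        dim = len(points[idxs[0]])
        reps.append(tuple(sum(points[i][d] for i in idxs) / len(idxs)
                          for d in range(dim)))
        for i in idxs:
            owner[i] = r
    return reps, owner


def cluster_by_linkage(reps, radius):
    """Stand-in clusterer (NOT support vector clustering).

    Labels the connected components of the graph that links any two
    representatives closer than `radius`.
    """
    labels = [-1] * len(reps)
    current = 0
    for start in range(len(reps)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(len(reps)):
                if labels[v] == -1 and dist(reps[u], reps[v]) < radius:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


def compressed_clustering(points, cell, radius):
    # Step 1: compress the original data into a few aggregated points.
    reps, owner = compress_by_grid(points, cell)
    # Step 2: run the (stand-in) clusterer on the compressed set only.
    rep_labels = cluster_by_linkage(reps, radius)
    # Step 3: map back; each original point inherits its representative's label.
    return [rep_labels[r] for r in owner]


# Toy data: two well-separated groups in the plane.
points = [(0.1, 0.2), (0.3, 0.1), (0.2, 0.4), (1.2, 0.3),
          (10.1, 10.2), (10.4, 10.3), (11.1, 10.2)]
labels = compressed_clustering(points, cell=1.0, radius=2.0)
print(labels)  # [0, 0, 0, 0, 1, 1, 1]
```

In this toy run the seven points are reduced to four representatives before clustering, so the expensive step touches four points instead of seven. In the setting of the abstract, the stand-ins would be replaced by the spectrum-preserving compression and by standard support vector clustering, while the map-back step stays a simple lookup.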