Support vector clustering is an important clustering method. However, it scales poorly because its cluster assignment step is computationally expensive. In this paper, we accelerate support vector clustering via spectrum-preserving data compression. Specifically, we first compress the original data set into a small number of spectrally representative aggregated data points. Then, we perform standard support vector clustering on the compressed data set. Finally, we map the clustering results of the compressed data set back to discover the clusters in the original data set. Our extensive experimental results on real-world data sets demonstrate dramatic speedups over standard support vector clustering without sacrificing clustering quality.
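The compress, cluster, and map-back steps can be illustrated with a minimal sketch. It is not the method of this paper: `compress_by_grid` is a hypothetical stand-in that averages points per grid cell and is not spectrum-preserving, and `cluster_by_linkage` is a hypothetical stand-in that takes connected components of a distance-threshold graph and is not support vector clustering. The function names and the `cell` and `radius` parameters are assumptions made for illustration. Only the overall control flow, and in particular the map-back step, mirrors the description above.

```python
import numpy as np


def compress_by_grid(X, cell=0.5):
    """Stand-in compression (NOT spectrum-preserving): average the points in each grid cell.

    Returns the aggregated points and, for every original point, the index
    of the aggregated point that represents it.
    """
    keys = np.floor(X / cell).astype(np.int64)
    _, assignment = np.unique(keys, axis=0, return_inverse=True)
    assignment = assignment.reshape(-1)  # guard against NumPy versions returning a 2-D inverse
    n_groups = assignment.max() + 1
    sums = np.zeros((n_groups, X.shape[1]))
    np.add.at(sums, assignment, X)  # accumulate coordinates per aggregated point
    counts = np.bincount(assignment, minlength=n_groups)
    return sums / counts[:, None], assignment


def cluster_by_linkage(Z, radius=2.0):
    """Stand-in clusterer (NOT support vector clustering): connected components
    of the graph that links points closer than `radius`."""
    n = len(Z)
    dist = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    adjacent = dist < radius
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in np.flatnonzero(adjacent[i] & (labels == -1)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels


def cluster_via_compression(X, compress, cluster):
    """Compress, cluster the compressed set, then map the labels back."""
    Z, assignment = compress(X)        # step 1: original points -> aggregated points
    labels_compressed = cluster(Z)     # step 2: cluster only the aggregated points
    return labels_compressed[assignment]  # step 3: each point inherits its representative's label
```

Calling `cluster_via_compression(X, compress_by_grid, cluster_by_linkage)` on an array `X` of shape `(n_points, n_features)` returns one label per original point. The expensive clustering step only ever sees the aggregated points, which is where the speedup in such a pipeline would come from; how well cluster quality is preserved depends entirely on the compression scheme, which this sketch does not attempt to reproduce.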