The performance of spectral clustering relies heavily on the quality of the affinity matrix. A variety of affinity-matrix-construction (AMC) methods have been proposed, but their hyperparameters must be determined beforehand. This requires considerable experience and makes the methods difficult to apply in practice, especially when the inter-cluster similarity is high and/or the dataset is large. In addition, different datasets often call for different AMC methods, and choosing among them still depends on experience. To address these two problems, this paper presents a simple yet effective method for automated spectral clustering. First, we propose to find the most reliable affinity matrix, via grid search or Bayesian optimization, among a set of candidates produced by different AMC methods with different hyperparameters. Reliability is quantified by the \textit{relative-eigen-gap} of the graph Laplacian, a criterion introduced in this paper. Second, we propose a fast and accurate AMC method based on least squares representation and thresholding, and we prove its effectiveness theoretically. Finally, we provide a large-scale extension of the automated spectral clustering method whose time complexity is linear in the number of data points. Extensive experiments on natural image clustering show that our method is more versatile, accurate, and efficient than the baseline methods.
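The abstract gives no formulas, so the following NumPy sketch only illustrates the selection idea under stated assumptions. It builds candidate affinity matrices from the standard closed-form least squares representation, C = (X^T X + λI)^{-1} X^T X, applies a thresholding step, and grid-searches over (λ, τ) for the candidate whose normalized graph Laplacian has the largest relative eigen-gap. Several details are assumptions and not necessarily the definitions used in the paper: the top-τ-per-column thresholding rule, zeroing the diagonal of C, and the formula in `relative_eigen_gap`, which divides the gap between the (k+1)-th and k-th smallest eigenvalues by the mean of the k smallest eigenvalues plus a small ε. All function names are hypothetical.

```python
import numpy as np


def lsr_coefficients(X, lam):
    """Least squares representation (LSR) of the columns of X by each other.

    Solves min_C ||X - X C||_F^2 + lam * ||C||_F^2, whose closed form is
    C = (X^T X + lam I)^{-1} X^T X. Columns of X are data points.
    """
    G = X.T @ X
    C = np.linalg.solve(G + lam * np.eye(G.shape[0]), G)
    np.fill_diagonal(C, 0.0)  # assumption: ignore self-representation
    return C


def threshold_affinity(C, tau):
    """Hypothetical thresholding rule: keep the tau largest |C_ij| in each
    column, zero the rest, then symmetrize into an affinity matrix."""
    A = np.abs(C)
    n = A.shape[1]
    top = np.argsort(-A, axis=0)[:tau, :]  # row indices of the tau largest entries per column
    mask = np.zeros(A.shape, dtype=bool)
    mask[top, np.arange(n)] = True
    A = np.where(mask, A, 0.0)
    return (A + A.T) / 2.0


def laplacian_eigenvalues(A):
    """Eigenvalues (ascending) of the normalized Laplacian I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.maximum(d, 1e-12)), 0.0)
    L = np.eye(A.shape[0]) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
    return np.linalg.eigvalsh(L)


def relative_eigen_gap(eigvals, k, eps=1e-6):
    """Assumed form of the relative eigen-gap: the gap between the (k+1)-th and
    k-th smallest eigenvalues, relative to the mean of the k smallest."""
    return (eigvals[k] - eigvals[k - 1]) / (eigvals[:k].mean() + eps)


def select_affinity(X, k, lams, taus):
    """Grid search over (lam, tau); return (zeta, lam, tau, A) for the candidate
    affinity matrix with the largest relative eigen-gap."""
    best = None
    for lam in lams:
        C = lsr_coefficients(X, lam)
        for tau in taus:
            A = threshold_affinity(C, tau)
            zeta = relative_eigen_gap(laplacian_eigenvalues(A), k)
            if best is None or zeta > best[0]:
                best = (zeta, lam, tau, A)
    return best


# Usage: 20 points drawn from two orthogonal 2-D subspaces of R^6.
rng = np.random.default_rng(0)
B1, B2 = np.eye(6)[:, :2], np.eye(6)[:, 2:4]
X = np.hstack([B1 @ rng.standard_normal((2, 10)), B2 @ rng.standard_normal((2, 10))])
zeta, lam, tau, A = select_affinity(X, k=2, lams=[0.01, 0.1, 1.0], taus=[2, 3, 5])
print(f"lam={lam}, tau={tau}, relative eigen-gap={zeta:.3g}")
```

In practice the selected `A` would then be passed to standard spectral clustering (the k smallest Laplacian eigenvectors followed by k-means). The nested grid loop is the simplest search strategy; Bayesian optimization over the same hyperparameters could replace it when the grid is large.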