The performance of spectral clustering relies heavily on the quality of the affinity matrix. A variety of affinity-matrix-construction (AMC) methods have been proposed, but they have hyperparameters that must be set beforehand, which requires considerable experience and makes them difficult to use in real applications, especially when the inter-cluster similarity is high and/or the dataset is large. In addition, different datasets often call for different AMC methods, and this choice again depends on experience. To address these two problems, we present a simple yet effective method for automated spectral clustering. The main idea is to find the most reliable affinity matrix among a set of candidates given by different AMC methods with different hyperparameters, where reliability is quantified by the \textit{relative-eigen-gap} of the graph Laplacian, introduced in this paper. We also implement the method using Bayesian optimization. We extend the method to large-scale datasets such as MNIST, on which the time cost is less than 90 seconds and the clustering accuracy is state-of-the-art. Extensive experiments on natural image clustering show that our method is more versatile, accurate, and efficient than baseline methods.
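The selection idea can be illustrated with a minimal sketch. The abstract does not give the formula for the relative-eigen-gap, so the code below assumes one plausible form: the gap between the (k+1)-th smallest eigenvalue of the normalized Laplacian and the mean of the k smallest, divided by that mean. The function names (`gaussian_affinity`, `relative_eigen_gap`, `select_affinity`), the Gaussian-kernel candidates, and the toy data are all hypothetical, and a plain grid over the bandwidth stands in for the Bayesian optimization mentioned above.

```python
import numpy as np


def gaussian_affinity(X, sigma):
    """Gaussian-kernel affinity matrix with a zero diagonal (one possible AMC method)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    return A


def relative_eigen_gap(A, k, eps=1e-8):
    """ASSUMED definition: (lambda_{k+1} - mean(lambda_1..k)) / (mean(lambda_1..k) + eps),
    where lambda_i are the ascending eigenvalues of L = I - D^{-1/2} A D^{-1/2}."""
    d = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, eps))  # guard against isolated nodes
    L = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    evals = np.clip(np.linalg.eigvalsh(L), 0.0, None)  # ascending; clip round-off negatives
    head = evals[:k].mean()
    return (evals[k] - head) / (head + eps)


def select_affinity(candidates, k):
    """Return the key of the candidate with the largest score, plus all scores."""
    scores = {name: relative_eigen_gap(A, k) for name, A in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy data: two well-separated 2-D blobs, so k = 2.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(30, 2)),
    rng.normal(loc=10.0, scale=0.5, size=(30, 2)),
])

# Candidate affinity matrices: one AMC method, three bandwidths (grid search).
candidates = {sigma: gaussian_affinity(X, sigma) for sigma in (0.01, 1.0, 100.0)}
best, scores = select_affinity(candidates, k=2)
print("selected sigma:", best)
```

In this sketch, a bandwidth that is too small fragments the graph and one that is too large merges the two groups, so both should score lower than a bandwidth that yields two cleanly separated components. Candidates from other AMC methods could be added to the same dictionary under different keys.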