EGGS: Eigen-Gap Guided Search Making Subspace Clustering Easy

The performance of spectral clustering relies heavily on the quality of the affinity matrix. A variety of affinity-matrix construction methods have been proposed, but their hyper-parameters must be determined beforehand. This requires considerable experience and makes these methods difficult to use in real applications, especially when the inter-cluster similarity is high and/or the dataset is large. In addition, we often have to decide whether to use a linear or a nonlinear model, which again depends on experience. To solve these two problems, in this paper we present an eigen-gap guided search method for subspace clustering. The main idea is to find the most reliable affinity matrix among a set of candidates constructed by linear and kernel regressions, where reliability is quantified by the \textit{relative-eigen-gap} of the graph Laplacian, defined in this paper. We show, theoretically and numerically, that a Laplacian matrix with a larger relative-eigen-gap often yields higher clustering accuracy and stability. Our method automatically searches for the best model and hyper-parameters in a pre-defined space. The search space is easy to determine and can be arbitrarily large, although a relatively compact search space avoids unnecessary computation. Our method is flexible and convenient in real applications, and it has low computational cost because the affinity matrix is not computed by iterative optimization. We extend the method to large-scale datasets such as MNIST, on which the time cost is less than 90 s and the clustering accuracy is state-of-the-art. Extensive experiments on natural image clustering show that our method is more stable, accurate, and efficient than baseline methods.
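The candidate search described in the abstract can be illustrated with a small sketch. It is not the authors' implementation and rests on stated assumptions: each candidate uses a ridge-regression self-expression model C = (G + λI)^{-1} G, where G is X^T X for the linear model or a Gaussian (RBF) kernel matrix for the nonlinear model; the affinity is W = (|C| + |C^T|)/2; and the relative-eigen-gap is assumed to be (λ_{k+1} − λ_k)/λ_{k+1} on the eigenvalues of the normalized Laplacian. The paper's own definition of the relative-eigen-gap may differ. All function names and the hyper-parameter grids are hypothetical.

```python
import numpy as np


def self_expression(G, lam):
    """Ridge self-expression C = (G + lam*I)^{-1} G (assumed model).

    G is X.T @ X for the linear model or a kernel matrix for the
    nonlinear model; both give a closed form, so no iterative solver.
    """
    n = G.shape[0]
    C = np.linalg.solve(G + lam * np.eye(n), G)
    np.fill_diagonal(C, 0.0)  # a sample should not represent itself
    return C


def affinity(C):
    """Symmetric, non-negative affinity matrix from the coefficients."""
    return 0.5 * (np.abs(C) + np.abs(C.T))


def relative_eigen_gap(W, k, eps=1e-12):
    """ASSUMED form: (lam_{k+1} - lam_k) / lam_{k+1}, where
    lam_1 <= ... <= lam_n are eigenvalues of I - D^{-1/2} W D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + eps)
    L = np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    evals = np.linalg.eigvalsh(L)  # returned in ascending order
    return (evals[k] - evals[k - 1]) / (evals[k] + eps)


def rbf_kernel(X, sigma):
    """Gaussian kernel between the columns of X (one sample per column)."""
    sq = np.sum(X**2, axis=0)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X.T @ X, 0.0)
    return np.exp(-dist2 / (2.0 * sigma**2))


def eigen_gap_search(X, k, lams=(1e-3, 1e-1, 1e1), sigmas=(0.5, 1.0, 2.0)):
    """Score every (model, hyper-parameter) candidate; keep the largest gap."""
    candidates = []
    for lam in lams:
        candidates.append((f"linear lam={lam}", X.T @ X, lam))
        for sigma in sigmas:
            K = rbf_kernel(X, sigma)
            candidates.append((f"rbf sigma={sigma} lam={lam}", K, lam))

    best = None
    for name, G, lam in candidates:
        W = affinity(self_expression(G, lam))
        score = relative_eigen_gap(W, k)
        if best is None or score > best[0]:
            best = (score, name, W)
    return best


# Toy data: two orthogonal 1-D subspaces in R^3, four samples each.
scales = np.array([1.0, 2.0, 3.0, 4.0])
X = np.zeros((3, 8))
X[0, :4] = scales
X[1, 4:] = scales
score, name, W = eigen_gap_search(X, k=2)
print(name, round(score, 3))
```

In this sketch the selected W would then be handed to ordinary spectral clustering (for example, k-means on the k bottom eigenvectors of the Laplacian). Each candidate costs one linear solve and one symmetric eigendecomposition rather than an iterative optimization, which is the property the abstract credits for the low computational cost.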
