The theoretical analysis of spectral clustering has mainly focused on consistency, with relatively little work on its generalization performance. In this paper, we study the excess risk bounds of two popular spectral clustering algorithms: \emph{relaxed} RatioCut and \emph{relaxed} NCut. First, we show that the excess risk bounds between the empirical continuous optimal solution and the population-level continuous optimal solution converge at a rate of $\mathcal{O}(1/\sqrt{n})$, where $n$ is the sample size. Second, we identify the fundamental quantity governing the excess risk between the empirical discrete optimal solution and the population-level discrete optimal solution; at the empirical level, algorithms can be designed to reduce this quantity. Based on this theoretical analysis, we propose two novel algorithms that not only penalize this quantity but also cluster out-of-sample data without re-computing the eigendecomposition on the full sample. Experiments verify the effectiveness of the proposed algorithms.
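For context, here is a minimal sketch of the standard \emph{relaxed} NCut pipeline that the bounds above concern: the continuous relaxation is solved by the bottom-$k$ eigenvectors of the normalized Laplacian, and the discrete solution is obtained by k-means rounding. The similarity matrix W and cluster count k are assumed given, and the function name relaxed_ncut is ours for illustration; this is not the paper's proposed out-of-sample algorithm, which avoids re-eigendecomposition.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def relaxed_ncut(W, k):
    """Relaxed NCut: continuous relaxation via the bottom-k eigenvectors of the
    normalized Laplacian, followed by k-means rounding to a discrete clustering."""
    d = W.sum(axis=1)                                      # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))                 # D^{-1/2}
    L_sym = np.eye(len(d)) - D_inv_sqrt @ W @ D_inv_sqrt   # I - D^{-1/2} W D^{-1/2}
    _, H = eigh(L_sym, subset_by_index=[0, k - 1])         # continuous optimal solution
    H = H / np.linalg.norm(H, axis=1, keepdims=True)       # row-normalize (NJW variant)
    return KMeans(n_clusters=k, n_init=10).fit_predict(H)  # rounding to discrete labels
```

The excess risk bounds in the abstract compare the eigenvector-based continuous solution (and its k-means rounding) computed on the sample against their population-level counterparts.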