It is well known that the classical single linkage algorithm usually fails to identify clusters in the presence of outliers. In this paper, we propose a new version of this algorithm and study its mathematical performance. In particular, we establish an oracle-type inequality which ensures that our procedure recovers the clusters with high probability under minimal assumptions on the distribution of the outliers. From this inequality we deduce the consistency of our algorithm, as well as rates of convergence in various situations. The performance of our approach is also assessed through simulation studies, and a comparison with classical clustering algorithms on simulated data is presented.
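The failure mode of the classical algorithm can be seen on a toy example. The sketch below is a minimal implementation of *classical* single linkage only (not the robust version proposed in this paper, whose construction is not described here). It merges the closest pair of clusters, measured by the smallest point-to-point distance, until `k` clusters remain. The function name `single_linkage` and the data points are hypothetical and serve only as illustration. With two well-separated groups, the algorithm recovers them. Adding a single distant outlier makes the outlier a cluster of its own, so the two true groups get merged.

```python
import itertools
import math


def single_linkage(points, k):
    """Classical single linkage (illustrative sketch, hypothetical helper).

    Repeatedly merges the two clusters whose closest points are nearest,
    until k clusters remain. Implemented Kruskal-style: pairwise edges are
    processed in increasing order of Euclidean distance with a union-find.
    Returns a sorted list of clusters, each a list of point indices.
    """
    parent = list(range(len(points)))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean distance.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )
    n_clusters = len(points)
    for _, i, j in edges:
        if n_clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # only edges joining distinct clusters cause a merge
            parent[ri] = rj
            n_clusters -= 1

    # Group point indices by their root to obtain the final partition.
    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Two tight groups on the real line (hypothetical data).
clean = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
# The same data plus one far-away outlier (index 6).
noisy = clean + [(100.0,)]

print(single_linkage(clean, 2))  # [[0, 1, 2], [3, 4, 5]]
print(single_linkage(noisy, 2))  # [[0, 1, 2, 3, 4, 5], [6]]
```

With the outlier present, the cheapest remaining merge (distance 4.8) joins the two genuine groups, whereas the outlier (distance 94.8) stays isolated. This is the behaviour that motivates a version of single linkage that is robust to outliers.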