We present a multiscale, consistent approach to density-based clustering that satisfies stability theorems, in both the input data and the parameters, which hold without distributional assumptions. Stability in the input data is measured with respect to the Gromov--Hausdorff--Prokhorov distance on metric probability spaces and to interleaving distances between (multi-parameter) hierarchical clusterings, which we introduce. We also prove stability results for standard simplification procedures for hierarchical clusterings; combined with our approach, these yield a stable flat clustering algorithm. We illustrate the stability of the approach with computational examples. Our framework is based on the concepts of persistence and interleaving distance from Topological Data Analysis.