Persistent homology is a central methodology in topological data analysis that extracts the topological features of a dataset and summarizes them as a persistence diagram; it has recently gained popularity through successful applications in many domains. However, its algebraic construction induces a metric space of persistence diagrams with a highly complex geometry. In this paper, we prove convergence of the $k$-means clustering algorithm on persistence diagram space and establish theoretical properties of the solution to the optimization problem in the Karush--Kuhn--Tucker framework. Additionally, we perform numerical experiments on various representations of persistent homology, including embeddings of persistence diagrams, the diagrams themselves, and their generalizations as persistence measures; we find that clustering directly on persistence diagrams and persistence measures outperforms clustering on their vectorized representations.
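To make the clustering setting concrete, the following is a minimal sketch of the assignment step of $k$-means applied directly to persistence diagrams. It assumes the $q$-Wasserstein distance with a Euclidean ground norm, a common choice of metric on diagram space that this abstract does not itself specify. The function names `wasserstein_distance` and `assign_to_centroids` are hypothetical, and the centroid update (a Fréchet mean of diagrams) is deliberately left out.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def wasserstein_distance(diag_a, diag_b, q=2):
    """q-Wasserstein distance between two diagrams given as (birth, death) pairs.

    Assumption: Euclidean ground norm, with unmatched points sent to their
    orthogonal projection on the diagonal.
    """
    a = np.asarray(diag_a, dtype=float).reshape(-1, 2)
    b = np.asarray(diag_b, dtype=float).reshape(-1, 2)
    n, m = len(a), len(b)
    if n + m == 0:
        return 0.0

    # Euclidean distance from each point to the diagonal {birth == death}.
    a_diag = np.abs(a[:, 1] - a[:, 0]) / np.sqrt(2)
    b_diag = np.abs(b[:, 1] - b[:, 0]) / np.sqrt(2)

    # Augmented square cost matrix: both diagrams are padded with copies of
    # the diagonal so that every point can be matched or left unmatched.
    cost = np.zeros((n + m, n + m))
    cost[:n, :m] = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2) ** q
    cost[:n, m:] = (a_diag ** q)[:, None]   # point of a matched to diagonal
    cost[n:, :m] = (b_diag ** q)[None, :]   # point of b matched to diagonal
    # The bottom-right block stays zero: diagonal matched to diagonal.

    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].sum() ** (1.0 / q))


def assign_to_centroids(diagrams, centroids, q=2):
    """Return, for each diagram, the index of its nearest centroid diagram."""
    dists = np.array(
        [[wasserstein_distance(d, c, q) for c in centroids] for d in diagrams]
    )
    return dists.argmin(axis=1)
```

A full $k$-means loop would alternate this assignment with a centroid update. On diagram space that update is itself an optimization problem rather than a coordinate-wise average, which is one reason the geometry makes the analysis delicate.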