Persistent homology is a fundamental methodology from topological data analysis that summarizes the lifetimes of topological features within a dataset as a persistence diagram; it has recently gained much popularity owing to its many successful applications across a wide range of domains. However, a significant challenge to its widespread implementation, especially in statistical methodology and machine learning algorithms, is the format of the persistence diagram as a multiset of half-open intervals. In this paper, we comprehensively study $k$-means clustering where the inputs are various embeddings of persistence diagrams, as well as persistence diagrams themselves and their generalizations, persistence measures. We show that clustering directly on persistence diagrams and persistence measures far outperforms clustering on their vectorized representations, despite the more complex structure of diagrams and measures. Moreover, we prove convergence of the $k$-means algorithm on persistence diagram space and establish theoretical properties of the solution to the optimization problem in the Karush--Kuhn--Tucker framework.