Recent years have witnessed a tremendous growth using topological summaries, especially the persistence diagrams (encoding the so-called persistent homology) for analyzing complex shapes. Intuitively, persistent homology maps a potentially complex input object (be it a graph, an image, or a point set and so on) to a unified type of feature summary, called the persistence diagrams. One can then carry out downstream data analysis tasks using such persistence diagram representations. A key problem is to compute the distance between two persistence diagrams efficiently. In particular, a persistence diagram is essentially a multiset of points in the plane, and one popular distance is the so-called 1-Wasserstein distance between persistence diagrams. In this paper, we present two algorithms to approximate the 1-Wasserstein distance for persistence diagrams in near-linear time. These algorithms primarily follow the same ideas as two existing algorithms to approximate optimal transport between two finite point-sets in Euclidean spaces via randomly shifted quadtrees. We show how these algorithms can be effectively adapted for the case of persistence diagrams. Our algorithms are much more efficient than previous exact and approximate algorithms, both in theory and in practice, and we demonstrate its efficiency via extensive experiments. They are conceptually simple and easy to implement, and the code is publicly available in github.
翻译:近些年来, 使用地形摘要, 特别是用于分析复杂形状的持久性图表( 将所谓的持久性同系物编码为所谓的持续性同系物) 出现了巨大的增长。 直观的持久性同系物将潜在复杂的输入对象( 无论是图形、 图像或点设置等等) 映射为统一的特征摘要类型, 称为持久性图表。 然后, 可以使用这种持久性图示执行下游数据分析任务 。 一个关键问题是如何有效地计算两个持久性图表之间的距离 。 特别是, 持久性图表基本上是飞机上的一组点, 一个受欢迎的距离是所谓的瓦瑟斯坦持续性图表之间的1 - 瓦瑟斯坦距离 。 在本文中, 我们用两种算法将大约1 - 瓦瑟斯坦距离绘制为近线时间的持久性图表。 这些算法主要遵循与两种现有的算法相同的想法, 以近似于以随机变换的夸脱的二次图示空间两个定点之间的最佳迁移方式。 我们用这些算法来有效地调整这些持续性图表的情况, 一个流行的距离是所谓的1 - 瓦瑟斯坦斯坦 距离。 我们的算算法比先前的精确和精确的算法中, 更简单。 我们的理论中, 和广泛的实验中, 我们的理论是用来展示, 我们的理论是简单的, 和精确的,, 和精确的实验法是简单的方法, 。