Persistence diagrams (PDs) play a central role in topological data analysis. This analysis requires computing distances between such diagrams, such as the 1-Wasserstein distance. For large data sets that produce large diagrams, existing methods for computing these PD distances accurately may not scale well. The main difficulty stems from the size of the bipartite graph on which a matching must be computed to determine these distances. We address this problem by making several algorithmic and computational observations that yield an approximation. First, taking advantage of the proximity of PD points, we condense them, thereby decreasing the number of nodes in the graph. The resulting increase in point multiplicities is handled by reducing the matching problem to a min-cost flow problem on a transshipment network. Second, we use Well-Separated Pair Decomposition to sparsify the graph to a size that is linear in the number of points. Both the node and the arc sparsifications contribute to the approximation factor, where we leverage a lower bound given by the Relaxed Word Mover's distance. Third, we eliminate bottlenecks in the sparsification procedure by introducing parallelism. Fourth, we develop open-source software called PDoptFlow based on our algorithm, exploiting parallelism on GPU and multicore hardware. We perform extensive experiments and show that the actual empirical error is very low. We also show that we can achieve high performance at low guaranteed relative errors, improving upon the state of the art.
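The node-condensation step can be illustrated with a minimal sketch. It assumes a simple uniform grid of width `delta` as the condensation rule, which is an illustrative choice and not necessarily the scheme used in PDoptFlow; the function name `condense` is likewise hypothetical. Nearby PD points collapse onto one grid node, and the node's multiplicity is what would become its supply or demand in a transshipment network.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float]      # (birth, death)
GridNode = Tuple[int, int]       # integer grid indices

def condense(diagram: Iterable[Point], delta: float) -> Dict[GridNode, int]:
    """Snap each PD point to the nearest node of a delta-grid.

    Assumption (not from the paper): a uniform grid is used for snapping.
    Returns a map from grid indices (i, j) to multiplicity; the
    representative point of a node is (i * delta, j * delta).
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    counts: Counter = Counter()
    for birth, death in diagram:
        # Each coordinate moves by at most delta / 2 under this rounding.
        node = (round(birth / delta), round(death / delta))
        counts[node] += 1
    return dict(counts)

# Two nearby points merge into one node of multiplicity 2.
diagram = [(0.0, 1.0), (0.1, 1.1), (0.5, 2.0)]
nodes = condense(diagram, delta=0.25)
```

Here the total multiplicity equals the number of input points, so no mass is lost; the trade-off is that a larger `delta` gives fewer nodes but a coarser approximation.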