Within topological data analysis, identifying topologically significant features and matching signals across datasets are important inferential tasks in many applications. Existing solutions to these problems, however, are limited by their computational speed. In this paper, we harness state-of-the-art persistent homology computation by studying topological prevalence and cycle matching from a cohomological perspective, which makes both tasks more feasible and applicable to a wider variety of applications and contexts. We demonstrate this on a wide range of real-life, large-scale, and complex datasets. We also extend existing notions of topological prevalence and cycle matching to general non-Morse filtrations. The result is the most general and flexible state-of-the-art adaptation of topological signal identification and persistent cycle matching, which performs comparisons on the order of tens of thousands of sampled points in a matter of minutes on standard institutional HPC CPU facilities.