Topological data analysis (TDA) delivers invaluable and complementary information on the intrinsic properties of data inaccessible to conventional methods. However, high computational costs remain the primary roadblock hindering the successful application of TDA in real-world studies, particularly with machine learning on large complex networks. Indeed, most modern networks such as citation, blockchain, and online social networks often have hundreds of thousands of vertices, making the application of existing TDA methods infeasible. We develop two new, remarkably simple but effective algorithms to compute the exact persistence diagrams of large graphs to address this major TDA limitation. First, we prove that $(k+1)$-core of a graph $\mathcal{G}$ suffices to compute its $k^{th}$ persistence diagram, $PD_k(\mathcal{G})$. Second, we introduce a pruning algorithm for graphs to compute their persistence diagrams by removing the dominated vertices. Our experiments on large networks show that our novel approach can achieve computational gains up to 95%. The developed framework provides the first bridge between the graph theory and TDA, with applications in machine learning of large complex networks. Our implementation is available at https://github.com/cakcora/PersistentHomologyWithCoralPrunit
翻译:地形数据分析(TDA)提供了常规方法无法获取的数据内在特性的宝贵和补充信息。然而,高计算成本仍然是阻碍在现实世界研究中成功应用TDA的主要障碍,特别是在大型复杂网络上的机器学习。事实上,多数现代网络,如引用、块链和在线社交网络,往往有数十万个顶点,使得现有的TDA方法的应用不可行。我们开发了两种新的、非常简单但有效的算法,用以计算大图的精确持久性图,以解决TDA这一主要限制。首先,我们证明一个图形$\mathcal{G}$的核心$(k+1)足以计算出其$k_th}的耐久性图, $PD_k(\mathcal{G})。第二,我们引入了一个图表的运行算法,以通过消除占主导地位的顶点来计算其耐久性图。我们在大型网络上进行的实验表明,我们的新办法可以达到95%的计算收益。我们开发的框架提供了在我们的图表/TDCRODA应用中的第一座桥梁。