Persistent homology is a leading tool in topological data analysis (TDA). Many problems in TDA can be solved via homological, and indeed linear, algebra. However, matrices in this domain are typically large, with rows and columns numbering in the billions. Low-rank approximation of such arrays typically destroys essential information; thus, new mathematical and computational paradigms are needed for very large, sparse matrices. We present the U-match matrix factorization scheme to address this challenge. U-match has two desirable features. First, it admits a compressed storage format that reduces the number of nonzero entries held in computer memory by one or more orders of magnitude relative to other common factorizations. Second, it permits direct solution of diverse problems in linear and homological algebra without decompressing the matrices stored in memory. These problems include look-up and retrieval of rows and columns; evaluation of birth/death times and extraction of generators in persistent (co)homology; and calculation of bases for boundary and cycle subspaces of filtered chain complexes. Such bases are key to unlocking a range of other topological techniques for use in TDA, and U-match factorization is designed to make such calculations broadly accessible to practitioners. As an application, we show that individual cycle representatives in persistent homology can be retrieved, via global duality, at time and memory costs orders of magnitude below the current state of the art. Moreover, the algebraic machinery needed to achieve this computation already exists in many modern solvers.
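For readers unfamiliar with how birth/death pairs arise from linear algebra, the following is a minimal sketch of the textbook left-to-right column reduction of a filtered boundary matrix. It is not the U-match scheme itself, and it makes no attempt at the compressed storage described above. It assumes coefficients in Z/2 and simplices indexed in filtration order; the function name `reduce_boundary_matrix` and the sparse set-of-rows column format are illustrative choices, not taken from the paper.

```python
def reduce_boundary_matrix(columns):
    """Textbook column reduction over Z/2 (illustrative, not U-match).

    columns[j] holds the row indices of the nonzero entries in column j
    of a filtered boundary matrix, with simplices indexed in filtration
    order. Returns the (birth, death) index pairs and the reduced columns.
    """
    reduced = [set(col) for col in columns]
    pivot_owner = {}  # lowest nonzero row index -> column owning that pivot
    pairs = []
    for j, col in enumerate(reduced):
        # Add earlier columns until the lowest entry is new or the column is zero.
        while col and max(col) in pivot_owner:
            col ^= reduced[pivot_owner[max(col)]]  # addition mod 2, in place
        if col:
            low = max(col)
            pivot_owner[low] = j
            pairs.append((low, j))  # class born at simplex `low` dies at simplex j
    return pairs, reduced


# Filled triangle, filtration order: a, b, c, ab, bc, ac, abc (indices 0..6).
triangle = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}, {3, 4, 5}]
pairs, reduced = reduce_boundary_matrix(triangle)
print(pairs)
```

In this small example, vertices b and c are paired with the edges that merge them into the component of a, the edge ac creates a 1-cycle (its column reduces to zero), and the 2-simplex abc kills that cycle. Vertex a remains unpaired, corresponding to the single connected component that never dies.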