Topological Data Analysis is a growing area of data science that aims to compute and characterize the geometry and topology of data sets, in order to produce useful descriptors for subsequent statistical and machine learning tasks. Its main computational tool is persistent homology, which amounts to tracking the topological changes in growing families of subsets of the data set itself, called filtrations, and encoding them in an algebraic object called a persistence module. Even though algorithms and theoretical properties of modules are now well known in the single-parameter case, that is, when there is only one filtration to study, much less is known in the multi-parameter case, where several filtrations are given at once. Though more complicated, the resulting persistence modules are usually richer and encode more information, making them better descriptors for data science. In this article, we present the first approximation scheme for computing and decomposing general multi-parameter persistence modules. It is based on fibered barcodes and exact matchings, two constructions that stem from the theory of single-parameter persistence. Our algorithm has controlled complexity and running time, and works in arbitrary dimension, i.e., with an arbitrary number of filtrations. Moreover, when restricting to specific classes of multi-parameter persistence modules, namely the ones that can be decomposed into intervals, we establish theoretical results about the approximation error between our estimate and the true module in terms of the interleaving distance. Finally, we present empirical evidence validating output quality and speed-up on several data sets.
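A fibered barcode is the family of single-parameter barcodes obtained by restricting a multi-parameter filtration to lines. The sketch below illustrates only that restriction step and is not the authors' approximation scheme: it does not implement exact matchings or the decomposition. It assumes a one-critical 2-parameter filtration of a graph (vertices and edges only, so only 0-dimensional homology is tracked) and a line `{base + t * direction}` with strictly positive direction. The helper names `line_time`, `zero_dim_barcode` and `fibered_barcode` are hypothetical. Each simplex enters the line filtration at the first `t` where the line dominates its grade, and the resulting 0-dimensional barcode is computed with union-find and the elder rule.

```python
import math


def line_time(grade, base, direction):
    """First parameter t at which the point base + t * direction dominates
    `grade` componentwise, i.e. when the simplex enters the line filtration."""
    return max((g - b) / d for g, b, d in zip(grade, base, direction))


def zero_dim_barcode(vertex_times, edges):
    """0-dimensional persistence of a filtered graph (union-find + elder rule).

    vertex_times: birth time of each vertex.
    edges: (u, v, t) triples, with t no smaller than both endpoint births.
    Returns sorted (birth, death) pairs; death is math.inf for components
    that never merge. Zero-length bars are dropped.
    """
    parent = list(range(len(vertex_times)))
    birth = list(vertex_times)  # birth of the oldest vertex in each root's component

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    bars = []
    for u, v, t in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # the edge closes a cycle: no 0-dimensional event
        if birth[ru] > birth[rv]:
            ru, rv = rv, ru  # make rv the younger component
        if t > birth[rv]:
            bars.append((birth[rv], t))  # elder rule: the younger component dies at t
        parent[rv] = ru
    roots = {find(i) for i in range(len(vertex_times))}
    bars.extend((birth[r], math.inf) for r in roots)
    return sorted(bars)


def fibered_barcode(vertex_grades, edge_grades, base, direction):
    """0-dimensional barcode of the restriction of a one-critical 2-parameter
    filtration of a graph to the line {base + t * direction : t real}."""
    assert all(d > 0 for d in direction), "direction must have positive entries"
    vt = [line_time(g, base, direction) for g in vertex_grades]
    es = [(u, v, line_time(g, base, direction)) for u, v, g in edge_grades]
    return zero_dim_barcode(vt, es)


# Toy bifiltration: vertex grades (x, y) and edges (u, v, grade),
# where each edge grade dominates the grades of its endpoints.
V = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
E = [(0, 1, (1.0, 1.0)), (0, 2, (2.0, 2.0))]
print(fibered_barcode(V, E, (0.0, 0.0), (1.0, 1.0)))  # [(0.0, inf)]
print(fibered_barcode(V, E, (0.0, 0.0), (1.0, 2.0)))  # [(0.0, inf), (1.0, 2.0)]
```

On this toy example the diagonal line yields a single infinite bar, while the line of slope 2 also detects a finite bar from 1 to 2. This illustrates why a multi-parameter module is studied through many lines at once rather than through a single restriction.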