Euler Characteristic Curves and Profiles: a stable shape invariant for big data problems

Tools of Topological Data Analysis provide stable summaries that encapsulate the shape of the data under consideration. Persistent homology, the most standard and well-studied data summary, suffers from a number of limitations: its computations are hard to distribute, it is hard to generalize to multifiltrations, and it is computationally prohibitive for big data sets. In this paper we study the concepts of Euler Characteristic Curves, for one-parameter filtrations, and Euler Characteristic Profiles, for multi-parameter filtrations. Although they are a weaker invariant in one dimension, we show that Euler Characteristic based approaches do not share some of the handicaps of persistent homology: we present efficient algorithms to compute them in a distributed way, their generalization to multifiltrations, and their practical applicability to big data problems. In addition, we show that Euler Curves and Profiles enjoy a certain type of stability, which makes them a robust tool in data analysis. Lastly, to show their practical applicability, multiple use cases are considered.
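As a rough illustration of why an Euler Characteristic Curve lends itself to distributed computation, the sketch below uses the standard definition: at filtration value t, the Euler characteristic is the alternating sum, over dimensions, of the number of simplices that have appeared by t. Each simplex contributes independently, so partial results computed on separate chunks can simply be added. This is a minimal sketch and not the algorithm from the paper; the function names (`local_jumps`, `merge_jumps`, `curve_from_jumps`) and the `(dimension, filtration_value)` input format are assumptions made for the example.

```python
from collections import Counter


def local_jumps(simplices):
    """Signed contribution of one chunk of simplices (hypothetical helper).

    Each simplex is a (dimension, filtration_value) pair and adds
    (-1)**dimension to the Euler characteristic from its value onward.
    """
    jumps = Counter()
    for dim, value in simplices:
        jumps[value] += 1 if dim % 2 == 0 else -1
    return jumps


def merge_jumps(parts):
    """Add per-chunk contributions; the order of the chunks does not matter."""
    total = Counter()
    for part in parts:
        # Counter.update adds counts and keeps negative totals,
        # unlike the + operator, which would drop them.
        total.update(part)
    return total


def curve_from_jumps(jumps):
    """Accumulate the jumps into a list of (filtration_value, chi) pairs."""
    curve, chi = [], 0
    for value in sorted(jumps):
        chi += jumps[value]
        curve.append((value, chi))
    return curve


# Toy filtration of a filled triangle: three vertices enter at 0.0,
# three edges at 1.0, and the 2-simplex at 2.0.
triangle = [
    (0, 0.0), (0, 0.0), (0, 0.0),
    (1, 1.0), (1, 1.0), (1, 1.0),
    (2, 2.0),
]

# Pretend the simplices are split across two workers.
chunks = [triangle[:4], triangle[4:]]
ecc = curve_from_jumps(merge_jumps(local_jumps(chunk) for chunk in chunks))
print(ecc)
```

On this toy input the curve steps from 3 (three isolated vertices) to 0 (a hollow triangle) to 1 (a filled triangle), and the chunked computation agrees with processing all simplices at once.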
