Topological Data Analysis (TDA), a relatively new field of data analysis, has proved useful in a variety of applications. The main tool of TDA is persistent homology, in which the structure of data is examined at many scales. Representations of persistent homology include persistence barcodes and persistence diagrams. Neither is straightforward to reconcile with traditional machine learning algorithms, because they are sets of intervals or multisets rather than vectors. The problem of faithfully representing barcodes and persistence diagrams has been pursued along two main avenues: kernel methods and vectorizations. One vectorization is the Betti sequence, or Betti curve, derived from the persistence barcode. While the Betti sequence has been used in classification problems in various applications, to our knowledge the stability of the sequence has not previously been discussed. In this paper we show that the Betti sequence is unstable under the 1-Wasserstein metric with respect to small perturbations in the barcode from which it is calculated. In addition, we propose a novel stabilized version of the Betti sequence based on the Gaussian smoothing seen in the Stable Persistence Bag of Words for persistent homology. We then introduce the normalized cumulative Betti sequence and provide numerical examples that support the main statement of the paper.
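As an illustration of the objects named above, the following is a minimal sketch of a Betti sequence computed from a barcode. It assumes the common convention that each bar is a half-open interval [birth, death) and that the sequence is sampled on a fixed grid; the function name `betti_sequence`, the grid, and the example bars are hypothetical and are not taken from the paper. The last lines show one simple way a sampled sequence can react to a tiny change in a bar endpoint; this is not the paper's construction or proof.

```python
import numpy as np


def betti_sequence(barcode, grid):
    """Count, for each grid value t, the bars [birth, death) that contain t.

    Assumption: bars are half-open intervals and the grid is fixed in advance.
    """
    bars = np.asarray(barcode, dtype=float).reshape(-1, 2)
    births = bars[:, 0][:, None]   # shape (n_bars, 1)
    deaths = bars[:, 1][:, None]   # shape (n_bars, 1)
    t = np.asarray(grid, dtype=float)[None, :]  # shape (1, n_grid)
    # A bar contributes 1 at t exactly when birth <= t < death.
    return np.sum((births <= t) & (t < deaths), axis=0)


grid = [0, 1, 2, 3, 4]
original = [(0, 2), (1, 4), (1, 3)]
# Hypothetical perturbation: lengthen the first bar by a tiny amount.
perturbed = [(0, 2 + 1e-9), (1, 4), (1, 3)]

seq_original = betti_sequence(original, grid)
seq_perturbed = betti_sequence(perturbed, grid)

print(seq_original)
print(seq_perturbed)
# The entry at t = 2 changes by a full unit although the endpoint moved by 1e-9.
print(np.abs(seq_perturbed - seq_original).sum())
```

In this sketch the jump occurs because a bar endpoint crosses a grid value, so the size of the change in the sequence does not shrink with the size of the perturbation.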