In Topological Data Analysis, a common way of quantifying the shape of data is to use a persistence diagram (PD). PDs are multisets of points in $\mathbb{R}^2$ computed using tools of algebraic topology. However, this multiset structure limits the utility of PDs in applications, since most machine learning methods expect fixed-length vector inputs. Therefore, in recent years, efforts have been directed towards extracting informative and efficient summaries from PDs to broaden the scope of their use for machine learning tasks. We propose a computationally efficient framework to convert a PD into a vector in $\mathbb{R}^n$, called a vectorized persistence block (VPB). We show that our representation possesses many of the desired properties of vector-based summaries, such as stability with respect to input noise, low computational cost, and flexibility. Through simulation studies, we demonstrate the effectiveness of VPBs in terms of performance and computational cost within various learning tasks, namely clustering, classification, and change point detection.
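To make the idea of turning a multiset of points into a fixed-length vector concrete, the following is a minimal sketch of a generic grid-based vectorization. It is not the VPB construction, which the abstract does not specify. The function name `vectorize_diagram`, the grid size, and the coordinate ranges are all assumptions chosen for illustration. Each point is mapped to (birth, persistence) coordinates and binned on a regular grid, weighted by its persistence.

```python
import numpy as np

def vectorize_diagram(diagram, n_bins=4, birth_range=(0.0, 1.0), pers_range=(0.0, 1.0)):
    """Map a persistence diagram to a vector in R^(n_bins**2).

    Illustrative stand-in only (hypothetical helper, not the VPB definition):
    a persistence-weighted 2D histogram over (birth, persistence) coordinates.
    """
    pts = np.asarray(diagram, dtype=float).reshape(-1, 2)
    birth = pts[:, 0]
    persistence = pts[:, 1] - pts[:, 0]  # death minus birth
    hist, _, _ = np.histogram2d(
        birth,
        persistence,
        bins=n_bins,
        range=[birth_range, pers_range],
        weights=persistence,  # long-lived features contribute more
    )
    return hist.ravel()  # flatten the grid into a single vector

# A small diagram with a repeated point (multisets allow duplicates).
diagram = [(0.1, 0.4), (0.1, 0.4), (0.6, 1.0)]
vec = vectorize_diagram(diagram)
print(vec.shape)
```

Because the histogram depends only on which points are present and how often, the output does not change when the points are reordered, and diagrams with different numbers of points all map to vectors of the same length.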