Topological data analysis (TDA) studies shape patterns in data. Persistent homology (PH) is a widely used TDA method that summarizes the homological features of data at multiple scales and stores them in persistence diagrams (PDs). Because TDA is commonly applied to high-dimensional data sets, a sufficiently large number of PDs to support statistical analysis is typically unavailable or requires inordinate computational resources to obtain. In this paper, we propose random persistence diagram generation (RPDG), a method that generates a sequence of random PDs from the ones produced by the data. RPDG is underpinned (i) by a parametric model based on pairwise interacting point processes for inference of persistence diagrams and (ii) by a reversible jump Markov chain Monte Carlo (RJ-MCMC) algorithm for generating samples of PDs. The parametric model combines a Dirichlet partition, which captures the spatial homogeneity of the locations of points in a PD, with a step function, which captures the pairwise interaction between them. The RJ-MCMC algorithm incorporates trans-dimensional addition and removal of points and same-dimensional relocation of points across samples of PDs. The efficacy of RPDG is demonstrated via an example, and a detailed comparison with other existing methods is presented.
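To make the three RJ-MCMC move types concrete, the following is a minimal Python sketch of a birth-death-move Metropolis-Hastings sampler for a pairwise interaction point process. It is not the RPDG implementation: it assumes a constant intensity `beta` in place of the Dirichlet partition, a single-step interaction function (value `gamma` below distance `radius`, 1 otherwise), and PD points rescaled to the unit square. All function and parameter names are hypothetical.

```python
import numpy as np


def close_pairs(point, others, radius):
    """Count points in `others` lying within `radius` of `point`."""
    if len(others) == 0:
        return 0
    return int(np.sum(np.linalg.norm(others - point, axis=1) < radius))


def accept(log_ratio, rng):
    """Metropolis-Hastings accept/reject for a log acceptance ratio."""
    return rng.random() < np.exp(min(0.0, log_ratio))


def rjmcmc_step(pts, beta, gamma, radius, rng, p_move=0.5):
    """One update: relocate a point, add a point, or remove a point.

    Assumed target density on the unit square (with respect to a unit-rate
    Poisson process): beta**n * gamma**(number of pairs closer than radius),
    with beta > 0 and 0 < gamma <= 1.
    """
    n = len(pts)
    u = rng.random()
    if u < p_move:
        # Same-dimensional move: relocate one point uniformly at random.
        if n == 0:
            return pts
        i = rng.integers(n)
        rest = np.delete(pts, i, axis=0)
        new = rng.random(2)
        diff = close_pairs(new, rest, radius) - close_pairs(pts[i], rest, radius)
        if accept(diff * np.log(gamma), rng):
            pts = np.vstack([rest, new])
    elif u < p_move + (1.0 - p_move) / 2.0:
        # Trans-dimensional move: propose adding a uniformly placed point.
        new = rng.random(2)
        log_ratio = (np.log(beta)
                     + close_pairs(new, pts, radius) * np.log(gamma)
                     - np.log(n + 1))
        if accept(log_ratio, rng):
            pts = np.vstack([pts, new])
    else:
        # Trans-dimensional move: propose removing a randomly chosen point.
        if n == 0:
            return pts
        i = rng.integers(n)
        rest = np.delete(pts, i, axis=0)
        log_ratio = (np.log(n) - np.log(beta)
                     - close_pairs(pts[i], rest, radius) * np.log(gamma))
        if accept(log_ratio, rng):
            pts = rest
    return pts


def sample_diagrams(init, n_samples, beta=20.0, gamma=0.3, radius=0.1,
                    thin=200, seed=0):
    """Run the chain from `init` and keep every `thin`-th state."""
    rng = np.random.default_rng(seed)
    pts = np.asarray(init, dtype=float).reshape(-1, 2)
    samples = []
    for _ in range(n_samples):
        for _ in range(thin):
            pts = rjmcmc_step(pts, beta, gamma, radius, rng)
        samples.append(pts.copy())
    return samples
```

Calling `sample_diagrams(observed_pd, n_samples=100)` with an `(n, 2)` array of points returns a list of arrays whose row counts may differ from sample to sample, because the addition and removal moves change the number of points. In this sketch the parameters are fixed by hand, whereas the abstract describes a parametric model used for inference from the observed PDs.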