We address the problem of estimating topological features from data in high-dimensional Euclidean spaces under the manifold assumption. Our approach is based on computing the persistent homology of the set of data points endowed with a sample metric known as the Fermat distance. We prove that this metric space converges almost surely to the manifold itself, endowed with an intrinsic metric that accounts for both the geometry of the manifold and the density that generates the sample. This implies the convergence of the associated persistence diagrams. Using this intrinsic distance to compute persistent homology has advantageous properties, such as robustness to outliers in the input data and lower sensitivity to the particular embedding of the underlying manifold in the ambient space. We use these ideas to propose and implement a method for pattern recognition and anomaly detection in time series, which we evaluate on real data.
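The abstract does not spell out how the sample Fermat distance or its persistence diagram is computed. The following is a minimal sketch that assumes the standard definition of the sample Fermat distance. For sample points and α ≥ 1, D_α(x, y) is the minimum over chains x = q_0, ..., q_k = y of sample points of Σ |q_{i+1} - q_i|^α.

Under this definition, D_α is the shortest-path distance in the complete graph on the sample whose edge weights are |p - q|^α. For α > 1, long jumps become expensive, so optimal paths tend to pass through dense regions. The sketch then computes only the 0-dimensional persistence diagram, assuming a Vietoris-Rips filtration in which an edge enters at its length. Under that convention, the death times are the weights of a minimum spanning tree.

The function names `fermat_distances` and `h0_persistence` are hypothetical. The code omits any rescaling that the convergence result may require, and it uses an O(n³) Floyd-Warshall pass that suits only small samples.

```python
import math
from typing import List, Sequence, Tuple


def fermat_distances(points: List[Sequence[float]], alpha: float = 2.0) -> List[List[float]]:
    """Sample Fermat distance (assumed definition): shortest-path distance in
    the complete graph on `points`, where edge (p, q) costs |p - q|**alpha."""
    if alpha < 1:
        raise ValueError("alpha must be >= 1")
    n = len(points)
    # Start from the direct edge costs |p - q|^alpha (0 on the diagonal).
    d = [[math.dist(points[i], points[j]) ** alpha for j in range(n)] for i in range(n)]
    # Floyd-Warshall: allow every sample point k as an intermediate stop.
    for k in range(n):
        dk = d[k]
        for i in range(n):
            dik = d[i][k]
            row = d[i]
            for j in range(n):
                via = dik + dk[j]
                if via < row[j]:
                    row[j] = via
    return d


def h0_persistence(d: List[List[float]]) -> List[Tuple[float, float]]:
    """0-dimensional persistence of a Rips filtration built from the distance
    matrix `d`: every point is born at 0, and a component dies when Kruskal's
    algorithm merges it into another one."""
    n = len(d)
    if n == 0:
        return []
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted((d[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    bars: List[Tuple[float, float]] = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components; one of them dies at w
            parent[ri] = rj
            bars.append((0.0, w))
    bars.append((0.0, math.inf))  # the last surviving component never dies
    return bars
```

The following example uses four points on a line: (0,), (1,), (2,) and (10,). With α = 2, the Fermat distance from (0,) to (2,) is 2, obtained by stepping through (1,). The direct edge would cost 4. The isolated point (10,) only joins at 64, so the gap is amplified relative to the Euclidean value of 8. With α = 1, the triangle inequality makes D_α coincide with the Euclidean distance.

Higher-dimensional diagrams would require a persistent homology library that accepts a precomputed distance matrix, such as GUDHI's `RipsComplex(distance_matrix=...)` or `ripser(D, distance_matrix=True)`.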