The distance function to a compact set plays a central role in topological data analysis. In particular, the sublevel sets of the distance function are used to compute persistent homology, a backbone of the topological data analysis pipeline. Although persistent homology is stable under perturbations in the Hausdorff distance, it is highly sensitive to outliers. In this work, we develop a framework of statistical inference for persistent homology in the presence of outliers. Drawing inspiration from recent developments in robust statistics, we propose a $\textit{median-of-means}$ variant of the distance function ($\textsf{MoM Dist}$) and establish its statistical properties. In particular, we show that, even in the presence of outliers, the sublevel filtrations and weighted filtrations induced by $\textsf{MoM Dist}$ are both consistent estimators of their true population counterparts, and that their rates of convergence in the bottleneck metric are controlled by the fraction of outliers in the data. Finally, we demonstrate the advantages of the proposed methodology through simulations and applications.
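To make the median-of-means idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the usual median-of-means construction: the sample is split at random into disjoint blocks, the distance from a query point to each block is computed, and the median of those blockwise distances is returned. The abstract does not spell out this definition, and the names `mom_dist`, `n_blocks`, and `rng` are hypothetical.

```python
import numpy as np


def mom_dist(query, sample, n_blocks, rng=None):
    """Median over blocks of the distance from each query point to a block.

    Assumed construction (not taken from the abstract): the sample is
    randomly partitioned into `n_blocks` disjoint blocks, and the estimate
    at a query point is the median of its distances to the blocks.
    """
    rng = np.random.default_rng(rng)
    query = np.atleast_2d(np.asarray(query, dtype=float))
    sample = np.asarray(sample, dtype=float)

    # Random partition of the sample indices into disjoint blocks.
    blocks = np.array_split(rng.permutation(len(sample)), n_blocks)

    per_block = []
    for idx in blocks:
        # Pairwise differences between query points and this block's points.
        diffs = query[:, None, :] - sample[idx][None, :, :]
        # Distance from each query point to its nearest point in the block.
        per_block.append(np.linalg.norm(diffs, axis=2).min(axis=1))

    # Median across blocks, one value per query point.
    return np.median(np.stack(per_block), axis=0)


# Toy data: 200 points on the unit circle plus 3 outliers at the origin.
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
data = np.vstack([circle, np.zeros((3, 2))])

origin = np.zeros((1, 2))
plain = np.linalg.norm(data - origin, axis=1).min()      # ordinary distance
robust = mom_dist(origin, data, n_blocks=11, rng=0)[0]   # blockwise median
```

In this toy example the ordinary distance from the origin to the sample collapses to zero because of the outliers, while the blockwise median stays near the circle's radius: with 11 blocks and only 3 outliers, at most 3 blocks can be contaminated, so the median is taken over a majority of clean blocks. How many blocks to use relative to the number of outliers is a tuning choice that this sketch does not address.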