Persistence diagrams (PDs) are the most common descriptors used to encode the topology of structured data appearing in challenging learning tasks, for example graphs, time series, or point clouds sampled close to a manifold. Given random objects and the corresponding distribution of PDs, one may want to build a statistical summary, such as a mean, of these random PDs. This is not a trivial task, however, because the natural geometry of the space of PDs is not linear. In this article, we study two such summaries: the Expected Persistence Diagram (EPD) and its quantization. The EPD is a measure supported on $\mathbb{R}^2$, which may be approximated by its empirical counterpart. We prove that this estimator is minimax optimal over a large class of models, with a parametric rate of convergence. The empirical EPD is simple and efficient to compute, but its support may be very large, which hinders its use in practice. To overcome this issue, we propose an algorithm to compute a quantization of the empirical EPD, a measure with small support that is shown to approximate a quantization of the theoretical EPD at near-optimal rates.
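To make the two summaries concrete, here is a minimal sketch in Python. It assumes each diagram is given as a NumPy array of (birth, death) pairs, and it treats the empirical EPD as the average of the diagrams viewed as point measures, so each point of each of the n diagrams carries mass 1/n. The quantization step is a plain weighted Lloyd (k-means) iteration used as a stand-in: it is not the algorithm proposed in the article, and in particular it gives no special role to the diagonal. The function names `empirical_epd` and `quantize` are hypothetical.

```python
import numpy as np


def empirical_epd(diagrams):
    """Empirical EPD of n diagrams, returned as (points, weights).

    Each diagram is an array of shape (m_i, 2) holding (birth, death) pairs.
    The measure is (1/n) * sum of the diagrams seen as sums of Dirac masses.
    """
    n = len(diagrams)
    points = np.vstack(diagrams)
    weights = np.full(len(points), 1.0 / n)
    return points, weights


def quantize(points, weights, k, n_iter=50, seed=0):
    """Weighted Lloyd iterations (illustrative stand-in, not the paper's method).

    Returns k centroids in the plane and the mass assigned to each of them.
    """
    rng = np.random.default_rng(seed)
    # Initialise the centroids with k distinct support points.
    start = rng.choice(len(points), size=k, replace=False)
    centroids = points[start].astype(float)

    def assign(cents):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((points[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        for j in range(k):
            mask = labels == j
            if mask.any():
                # Move the centroid to the weighted mean of its cell.
                centroids[j] = np.average(
                    points[mask], axis=0, weights=weights[mask]
                )

    labels = assign(centroids)
    masses = np.bincount(labels, weights=weights, minlength=k)
    return centroids, masses


# Synthetic example: 20 random diagrams with 5 to 14 points each.
rng = np.random.default_rng(1)
diagrams = []
for _ in range(20):
    m = rng.integers(5, 15)
    birth = rng.uniform(0.0, 1.0, size=m)
    persistence = rng.exponential(0.3, size=m)
    diagrams.append(np.column_stack([birth, birth + persistence]))

points, weights = empirical_epd(diagrams)
centroids, masses = quantize(points, weights, k=4)
```

The support of the empirical EPD grows with the number of diagrams, while the quantized summary keeps only `k` weighted points. The total mass is preserved by construction and equals the average number of points per diagram.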