While there has been much interest in adapting conventional clustering procedures---and in higher dimensions, persistent homology methods---to directed networks, little is known about the convergence of such methods. In order to even formulate the problem of convergence for such methods, one needs to stipulate a reasonable model for a directed network together with a flexible sampling theory for such a model. In this paper we propose and study a particular model of directed networks, and use this model to study the convergence of certain hierarchical clustering and persistent homology methods that accept any matrix of (possibly asymmetric) pairwise relations as input and produce dendrograms and persistence barcodes as outputs. We show that as points are sampled from some probability distribution, the output of each method converges almost surely to a dendrogram/barcode depending on the structure of the distribution.
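As a rough illustration of the kind of method described above, the following Python sketch takes a (possibly asymmetric) matrix of pairwise relations and returns a dendrogram encoded as an ultrametric. It assumes one simple scheme: symmetrize by taking the larger of the two directed weights, then apply single linkage. The abstract does not say which clustering methods the paper studies, so this is not presented as the authors' construction. The names `reciprocal_ultrametric` and `sample_network`, and the toy weight function on [0, 1], are hypothetical.

```python
import random


def reciprocal_ultrametric(weights):
    """Single-linkage ultrametric of the max-symmetrized weight matrix.

    Entry [i][j] of the result is the smallest threshold at which i and j
    fall into the same cluster, which is one way to encode a dendrogram.
    """
    n = len(weights)
    # Symmetrize: i and j are linked at scale t only if BOTH directed
    # weights are at most t (an assumed rule, not taken from the source).
    u = [[0.0 if i == j else max(weights[i][j], weights[j][i])
          for j in range(n)] for i in range(n)]
    # Minimax-path closure (a Floyd-Warshall variant): replace each entry
    # by the cheapest "largest hop" over all chains joining i and j.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(u[i][k], u[k][j])
                if via_k < u[i][j]:
                    u[i][j] = via_k
    return u


def sample_network(n, rng):
    """Hypothetical directed network: n points drawn uniformly from [0, 1],
    where moving to the right costs the distance and moving left costs double."""
    xs = [rng.random() for _ in range(n)]
    return [[(y - x) if y >= x else 2.0 * (x - y) for y in xs] for x in xs]


# Small deterministic example with asymmetric weights.
w = [[0, 1, 5],
     [2, 0, 3],
     [4, 1, 0]]
print(reciprocal_ultrametric(w))

# Empirical look at sampling: height of the final merge for growing samples.
rng = random.Random(0)
for n in (10, 40, 80):
    u = reciprocal_ultrametric(sample_network(n, rng))
    print(n, max(max(row) for row in u))
```

In this toy model the support of the distribution is connected, so one would expect the height of the final merge to shrink as the sample grows. That only loosely mirrors the convergence statement in the abstract and proves nothing about it.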