Network-valued data are encountered in a wide range of applications and pose challenges in learning due to their complex structure and absence of vertex correspondence. Typical examples of such problems include classification or grouping of protein structures and social networks. Various methods, ranging from graph kernels to graph neural networks, have been proposed that achieve some success in graph classification problems. However, most methods have limited theoretical justification, and their applicability beyond classification remains unexplored. In this work, we propose methods for clustering multiple graphs, without vertex correspondence, that are inspired by the recent literature on estimating graphons -- symmetric functions corresponding to infinite vertex limit of graphs. We propose a novel graph distance based on sorting-and-smoothing graphon estimators. Using the proposed graph distance, we present two clustering algorithms and show that they achieve state-of-the-art results. We prove the statistical consistency of both algorithms under Lipschitz assumptions on the graph degrees. We further study the applicability of the proposed distance for graph two-sample testing problems.
翻译:网络价值数据在广泛的应用中遇到,并因其结构复杂和没有顶端对应物而在学习上构成挑战。这类问题的典型例子包括蛋白质结构和社交网络的分类或分组。提出了从图形内核到图形神经网络的各种方法,在图形分类问题上取得了一定的成功。然而,大多数方法理论上的理由有限,其超出分类范围的适用性仍未探讨。在这项工作中,我们建议了将多个图集分组的方法,而没有顶端对应物,这些方法受最近关于估算图形文献的启发 -- -- 与无限的顶端限制相对应的对称函数。我们建议了基于排序和移动图形估计数的新的图形距离。我们使用拟议的图形距离,提出了两种组合算法,并表明它们达到了最新的结果。我们证明了Lipschitz假设下的两种算法在图形度上的统计一致性。我们进一步研究了拟议的图式二模测试问题距离的适用性。