Network complexity has been studied for over half a century and has found a wide range of applications. Many methods have been developed to characterize and estimate the complexity of networks. However, there has been little research with statistical guarantees. In this paper, we develop a statistical theory of graph complexity in a general model of random graphs, the so-called graphon model. Given a graphon, we endow the latent space of the nodes with the neighborhood distance that measures the propensity of two nodes to be connected with similar nodes. Our complexity index is then based on the covering number and the Minkowski dimension of (a purified version of) this metric space. Although the latent space is not identifiable, these indices turn out to be identifiable. This notion of complexity has simple interpretations on popular examples of random graphs: it matches the number of communities in stochastic block models; the dimension of the Euclidean space in random geometric graphs; the regularity of the link function in H\"older graphon models. From a single observation of the graph, we construct an estimator of the neighborhood-distance and show universal non-asymptotic bounds for its risk, matching minimax lower bounds. Based on this estimated distance, we compute the corresponding covering number and Minkowski dimension and we provide optimal non-asymptotic error bounds for these two plug-in estimators.
翻译:半个多世纪以来对网络的复杂性进行了研究,并发现了广泛的应用。许多方法已经开发出来,对网络的复杂性进行了定性和估计。然而,在统计保障方面研究很少。在本文中,我们在随机图形的一般模型中开发了图形复杂性的统计理论。在所谓的图形模型中,我们开发了一个普通的随机图形模型(所谓的图形模型)中,我们开发了一个图形复杂性的统计理论。在一个图形中,我们将节点的潜伏空间与测量两个节点与类似节点相连接的倾向的相距缩小。然后,我们复杂的指数基于这个计量空间的覆盖数和 Minkowski 的维度(一个纯化版) 。虽然潜伏空间无法识别,但这些指数却可以被识别出来。这个复杂的概念对随机图形的流行例子有简单的解释:它与随机的区块模型中的社区数量相匹配;Euclidean空间的尺寸;测量 H\"older 图形模型中的链接功能的规律性。我们通过对图形的单一观察,我们为这些相邻端距离和直径的最小基数据库和直径直径直径对等的图像提供了两个估计。