Detecting the dimensionality of graphs is a central topic in machine learning. While the problem has been tackled empirically as well as theoretically, existing methods have several drawbacks. On the one hand, empirical tools are computationally heavy and lack theoretical foundation. On the other hand, theoretical approaches do not apply to graphs with heterogeneous degree distributions, which is often the case for complex real-world networks. To address these drawbacks, we consider geometric inhomogeneous random graphs (GIRGs) as a random graph model, which captures a variety of properties observed in practice. These include a heterogeneous degree distribution and non-vanishing clustering coefficient, which is the probability that two random neighbours of a vertex are adjacent. In GIRGs, $n$ vertices are distributed on a $d$-dimensional torus and weights are assigned to the vertices according to a power-law distribution. Two vertices are then connected with a probability that depends on their distance and their weights. Our first result shows that the clustering coefficient of GIRGs scales inverse exponentially with respect to the number of dimensions, when the latter is at most logarithmic in $n$. This gives a first theoretical explanation for the low dimensionality of real-world networks observed by Almagro et. al. [Nature '22]. Our second result is a linear-time algorithm for determining the dimensionality of a given GIRG. We prove that our algorithm returns the correct number of dimensions with high probability when the input is a GIRG. As a result, our algorithm bridges the gap between theory and practice, as it not only comes with a rigorous proof of correctness but also yields results comparable to that of prior empirical approaches, as indicated by our experiments on real-world instances.
翻译:检测图形的维度是机器学习的一个中心主题。 虽然问题已经从经验上和理论上解决了, 但现有的方法有一些缺点。 一方面, 实验工具在计算上很重, 缺乏理论基础。 另一方面, 理论方法并不适用于具有不同度分布的图表, 这在复杂的现实世界网络中通常是这样的情况。 为了解决这些缺陷, 我们将几何不均匀的随机图( Girgs) 视为随机图模型, 它捕捉了实践中观察到的各种属性。 其中包括不同度分布和非损耗组集系数。 一方面, 实验工具在计算上非常重, 缺乏理论基础基础基础基础。 在GIRGs 中, 美元垂直值分布在以美元为维度分布上, 重则根据权力法分布, 将两面的随机随机随机随机随机图( Girgs) 作为随机数, 这取决于它们的距离和重量。 我们的第一个结果显示, GIRGs 的基值比例在数值上, 不以指数为指数值的数值值, 也以正值为正值的数值 。