Sampling random nodes is a fundamental algorithmic primitive in the analysis of massive networks, and many modern graph mining algorithms rely on it critically. We consider the task of generating a large collection of random nodes in a network under limited query access, where querying a node reveals its set of neighbors. In current approaches, which are based on long random walks, the number of queries per sample scales linearly with the mixing time of the network, which can be prohibitive for large real-world networks. We propose a new method for sampling multiple nodes that bypasses the dependence on the mixing time by explicitly searching for less accessible components in the network. We test our approach on a variety of real-world and synthetic networks with up to tens of millions of nodes, demonstrating a query complexity improvement of up to $20\times$ compared to the state of the art.
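To make the query model and the random-walk baseline concrete, the following is a minimal sketch, not the method proposed here. It assumes a Metropolis-Hastings walk as the baseline (a standard way to target the uniform distribution over nodes); the names `QueryGraph`, `two_cliques`, `random_walk_sample`, and `sample_nodes` are hypothetical, and the toy graph is purely illustrative.

```python
import random


class QueryGraph:
    """Hypothetical query-access wrapper: neighbors are revealed only via query()."""

    def __init__(self, adjacency):
        self._adj = adjacency   # hidden graph: node -> set of neighbors
        self._cache = {}        # nodes whose neighbor lists were already revealed

    def query(self, node):
        # Each distinct node costs one query; repeated lookups are served from cache.
        if node not in self._cache:
            self._cache[node] = sorted(self._adj[node])
        return self._cache[node]

    @property
    def num_queries(self):
        return len(self._cache)


def two_cliques(k):
    """Toy slow-mixing graph (assumption): two k-cliques joined by one bridge edge."""
    adj = {i: set() for i in range(2 * k)}
    for offset in (0, k):
        for i in range(offset, offset + k):
            for j in range(offset, offset + k):
                if i != j:
                    adj[i].add(j)
    adj[k - 1].add(k)
    adj[k].add(k - 1)
    return adj


def random_walk_sample(graph, start, walk_length, rng):
    """Run one Metropolis-Hastings walk and return its end node.

    The stationary distribution is uniform over nodes of a connected graph,
    but walk_length must be on the order of the mixing time for the end node
    to be close to uniform.
    """
    current = start
    for _ in range(walk_length):
        nbrs = graph.query(current)
        candidate = rng.choice(nbrs)
        cand_deg = len(graph.query(candidate))
        # Accept with probability min(1, deg(current) / deg(candidate)).
        if rng.random() < min(1.0, len(nbrs) / cand_deg):
            current = candidate
    return current


def sample_nodes(graph, start, num_samples, walk_length, seed=0):
    """Draw num_samples nodes, one independent walk per sample."""
    rng = random.Random(seed)
    return [random_walk_sample(graph, start, walk_length, rng)
            for _ in range(num_samples)]


if __name__ == "__main__":
    g = QueryGraph(two_cliques(5))
    samples = sample_nodes(g, start=0, num_samples=10, walk_length=50, seed=1)
    print(samples, "distinct nodes queried:", g.num_queries)
```

In this sketch every sample pays for a full walk of `walk_length` steps, which is the cost that grows with the mixing time. On a tiny graph the cache hides this, since the number of distinct queried nodes cannot exceed the number of nodes; on a large network most steps of a long walk reveal new nodes.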