Network embedding has attracted considerable research attention recently. However, the existing methods are incapable of handling billion-scale networks, because they are computationally expensive and, at the same time, difficult to be accelerated by distributed computing schemes. To address these problems, we propose RandNE, a novel and simple billion-scale network embedding method. Specifically, we propose a Gaussian random projection approach to map the network into a low-dimensional embedding space while preserving the high-order proximities between nodes. To reduce the time complexity, we design an iterative projection procedure to avoid the explicit calculation of the high-order proximities. Theoretical analysis shows that our method is extremely efficient, and friendly to distributed computing schemes without any communication cost in the calculation. We demonstrate the efficacy of RandNE over state-of-the-art methods in network reconstruction and link prediction tasks on multiple datasets with different scales, ranging from thousands to billions of nodes and edges.
翻译:最近,网络嵌入引起了相当大的研究关注。然而,现有方法无法处理10亿规模的网络,因为它们计算费用昂贵,同时难以通过分布式计算计划加速。为了解决这些问题,我们提议采用新型和简单的10亿规模的网络嵌入方法RandNE。具体地说,我们提议采用高斯随机投影法,将网络绘制成一个低维嵌入空间,同时保持节点之间的高度一致性。为减少时间复杂性,我们设计了一个迭代投影程序,以避免明确计算高阶近似。理论分析表明,我们的方法非常高效,在计算过程中,在没有任何通信成本的情况下,对分布式计算计划十分友好。我们展示了兰德尼在网络重建中相对于最新技术方法的功效,并将多个数据集的预测任务与不同尺度联系起来,范围从数千至数十亿个节点和边缘不等。