Random walks are a fundamental primitive used in many machine learning algorithms, with several applications in clustering and semi-supervised learning. Despite their relevance, the first efficient parallel algorithm to compute random walks was introduced only very recently (Lacki et al.). Unfortunately, their method has a fundamental shortcoming: their algorithm is non-local, in that it heavily relies on computing random walks out of all nodes in the input graph, even though in many practical applications one is interested in computing random walks only from a small subset of nodes. In this paper, we present a new algorithm that overcomes this limitation by building random walks efficiently and locally at the same time. We show that our technique is both memory- and round-efficient, and in particular yields an efficient parallel local clustering algorithm. Finally, we complement our theoretical analysis with experimental results showing that our algorithm is significantly more scalable than previous approaches.
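To make the local setting concrete, the sketch below samples simple random walks starting only from a given seed set, rather than from every node in the graph. This is a minimal sequential illustration of the local primitive the abstract describes (all names are hypothetical), not the parallel algorithm of the paper:

```python
import random


def local_random_walks(adj, seeds, walk_len, walks_per_seed, seed=0):
    """Sample `walks_per_seed` simple random walks of length `walk_len`
    starting from each node in `seeds`, using adjacency lists `adj`.

    Only the seed nodes initiate walks, in contrast to a non-local
    approach that would start walks from every node in the graph.
    """
    rng = random.Random(seed)
    walks = []
    for s in seeds:
        for _ in range(walks_per_seed):
            walk = [s]
            for _ in range(walk_len):
                nbrs = adj[walk[-1]]
                if not nbrs:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks


# Small example: a path graph 0 - 1 - 2 - 3, walks from seeds {0, 2} only.
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
walks = local_random_walks(graph, seeds=[0, 2], walk_len=4, walks_per_seed=3)
```

Walk statistics from seed nodes (e.g. visit frequencies) can then feed a local clustering heuristic; the point of a local algorithm is that work scales with the seed set and walk length, not with the size of the whole graph.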