Distributed Computation of Large-scale Graph Problems

Motivated by the increasing need for fast distributed processing of large-scale graphs such as the Web graph and various social networks, we study a message-passing distributed computing model for graph processing and present lower bounds and algorithms for several graph problems. This work is inspired by recent large-scale graph processing systems (e.g., Pregel and Giraph) that are designed around the message-passing model of distributed computing. Our model consists of a point-to-point communication network of $k$ machines interconnected by bandwidth-restricted links. Communication between machines is the costly operation; local computation is not. The network is used to process an arbitrary $n$-node input graph (typically $n \gg k > 1$) that is randomly partitioned among the $k$ machines (a common implementation in many real-world systems). Our goal is to study fundamental complexity bounds for solving graph problems in this model. We present techniques for obtaining lower bounds on the distributed time complexity. Our lower bounds develop and use new bounds in random-partition communication complexity. We first show a lower bound of $\Omega(n/k)$ rounds for computing a spanning tree (ST) of the input graph. This result also implies the same bound for other fundamental problems such as computing a minimum spanning tree (MST). We also show an $\Omega(n/k^2)$ lower bound for connectivity, ST verification, and other related problems. We then give algorithms for various fundamental graph problems in our model. We show that problems such as PageRank, MST, connectivity, and graph covering can be solved in $\tilde{O}(n/k)$ time, whereas for shortest paths we present algorithms that run in $\tilde{O}(n/\sqrt{k})$ time for a $(1+\epsilon)$-factor approximation and in $\tilde{O}(n/k)$ time for an $O(\log n)$-factor approximation.
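To make the input distribution concrete, the following is a minimal sketch of a random partition of an $n$-node graph among $k$ machines. It assumes that each node is assigned independently and uniformly at random to one machine; the abstract only says the graph is "randomly partitioned", so this exact scheme is an assumption. The function names `random_vertex_partition` and `partition_stats` and the toy cycle input are hypothetical and used only for illustration.

```python
import random


def random_vertex_partition(nodes, k, seed=None):
    """Map each node to a machine index in [0, k).

    Assumption: nodes are placed independently and uniformly at random.
    """
    rng = random.Random(seed)
    return {v: rng.randrange(k) for v in nodes}


def partition_stats(edges, home, k):
    """Return (nodes per machine, number of edges crossing machines)."""
    load = [0] * k
    for machine in home.values():
        load[machine] += 1
    # An edge is a cross edge when its endpoints live on different machines;
    # only such edges require communication over the network links.
    cross = sum(1 for u, v in edges if home[u] != home[v])
    return load, cross


if __name__ == "__main__":
    n, k = 1000, 8
    # Toy input (hypothetical): a cycle on n nodes.
    edges = [(i, (i + 1) % n) for i in range(n)]
    home = random_vertex_partition(range(n), k, seed=0)
    load, cross = partition_stats(edges, home, k)
    print("nodes per machine:", load)
    print("cross-machine edges:", cross, "of", len(edges))
```

Under this assumed scheme each machine holds about $n/k$ nodes in expectation, and an edge between two distinct nodes crosses machines with probability $1 - 1/k$, which illustrates why communication rather than local computation is treated as the costly operation.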
