In this paper, we consider the problem of clustering graph nodes and sparsifying graph edges over distributed graphs, where edges, possibly with duplicates, are observed at physically remote sites. Although edge duplicates across different sites appear beneficial at first glance, they can in fact complicate clustering and sparsification, since processing them may require extra computation and communication. We propose the first communication-optimal algorithms for two well-established communication models, namely the message passing and the blackboard models. Specifically, given a graph on $n$ nodes with edges observed at $s$ sites, our algorithms achieve communication costs $\tilde{O}(ns)$ and $\tilde{O}(n+s)$ (where $\tilde{O}$ hides a polylogarithmic factor), which almost match the lower bounds $\Omega(ns)$ and $\Omega(n+s)$ in the message passing and the blackboard models, respectively. Under an assumption on the edge distribution, these communication costs are asymptotically the same as those under non-duplication models. Our algorithms also guarantee clustering quality nearly as good as that obtained by centralizing all edges and then applying any standard clustering algorithm. Moreover, we perform the first investigation of distributed constructions of graph spanners in the blackboard model. We provide almost matching communication lower and upper bounds for both multiplicative and additive spanners. For example, the communication lower bounds for constructing a $(2k-1)$-spanner in the blackboard model with and without duplication are $\Omega(s+n^{1+1/k}\log s)$ and $\Omega(s+n^{1+1/k}\max\{1,s^{-1/2-1/(2k)}\log s\})$, respectively, which almost match the upper bound $\tilde{O}(s+n^{1+1/k})$ for both models.