We study fundamental graph problems such as graph connectivity, minimum spanning forest (MSF), and approximate maximum (weight) matching in a distributed setting. In particular, we focus on the Adaptive Massively Parallel Computation (AMPC) model, a theoretical model that captures MapReduce-like computation augmented with a distributed hash table. We give the first AMPC algorithms for all of the studied problems that run in a constant number of rounds and use only $O(n^\epsilon)$ space per machine, where $0 < \epsilon < 1$. Our results improve upon both the previous results in the AMPC model and the best-known results in the MPC model, the theoretical model underpinning many popular distributed computation frameworks, such as MapReduce, Hadoop, Beam, Pregel, and Giraph. Finally, we provide an empirical comparison of the algorithms in the MPC and AMPC models in a fault-tolerant distributed computation environment. We empirically evaluate our algorithms on a set of large real-world graphs and show that our AMPC algorithms can achieve improvements in both running time and round complexity over optimized MPC baselines.