We present a constant-round algorithm in the massively parallel computation (MPC) model for evaluating a natural join where every input relation has two attributes. Our algorithm achieves a load of $\tilde{O}(m/p^{1/\rho})$, where $m$ is the total size of the input relations, $p$ is the number of machines, $\rho$ is the join's fractional edge covering number, and $\tilde{O}(\cdot)$ hides a polylogarithmic factor. This load matches a known lower bound up to a polylogarithmic factor. At the core of the proposed algorithm is a new theorem (which we name {\em the isolated Cartesian product theorem}) that provides fresh insight into the problem's mathematical structure. Our result implies that the {\em subgraph enumeration problem}, where the goal is to report all occurrences of a constant-sized subgraph pattern, can be settled optimally (up to a polylogarithmic factor) in the MPC model.
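To make the bound concrete, the following minimal sketch computes the fractional edge covering number $\rho$ of a small join graph and evaluates $m/p^{1/\rho}$ with the polylogarithmic factor ignored. It is an illustration only, not the paper's algorithm. The function names `fractional_edge_cover_number` and `load_bound` are hypothetical, and the brute-force search assumes the standard fact that, for ordinary graphs, the fractional edge cover linear program has an optimal solution with all weights in $\{0, 1/2, 1\}$.

```python
from fractions import Fraction
from itertools import product


def fractional_edge_cover_number(edges):
    """Brute-force rho for a small simple graph given as (u, v) pairs.

    Each edge models a binary relation; each vertex models an attribute.
    Assumption: an optimal weighting with values in {0, 1/2, 1} exists,
    so trying all 3^|E| combinations is enough for tiny patterns.
    """
    vertices = sorted({v for e in edges for v in e})
    choices = (Fraction(0), Fraction(1, 2), Fraction(1))
    best = None
    for weights in product(choices, repeat=len(edges)):
        # A weighting is feasible if every vertex receives total weight >= 1
        # from its incident edges.
        feasible = all(
            sum(w for w, e in zip(weights, edges) if v in e) >= 1
            for v in vertices
        )
        if feasible:
            total = sum(weights)
            if best is None or total < best:
                best = total
    return best


def load_bound(m, p, rho):
    """Evaluate m / p^(1/rho), dropping the polylogarithmic factor."""
    return m / p ** (1 / float(rho))


# Triangle join R(A,B), S(B,C), T(A,C): weight 1/2 on each edge is optimal.
triangle = [("A", "B"), ("B", "C"), ("A", "C")]
rho_triangle = fractional_edge_cover_number(triangle)
triangle_load = load_bound(10**6, 64, rho_triangle)
```

For the triangle pattern this gives $\rho = 3/2$, so the load scales as $m/p^{2/3}$; with $m = 10^6$ and $p = 64$ that is roughly $62{,}500$ tuples per machine before the hidden polylogarithmic factor.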