We present a constant-round algorithm in the massively parallel computation (MPC) model for evaluating a natural join where every input relation has two attributes. Our algorithm achieves a load of $\tilde{O}(m/p^{1/\rho})$, where $m$ is the total size of the input relations, $p$ is the number of machines, $\rho$ is the join's fractional edge covering number, and $\tilde{O}(\cdot)$ hides a polylogarithmic factor. The load matches a known lower bound up to a polylogarithmic factor. At the core of the proposed algorithm is a new theorem (which we name the "isolated Cartesian product theorem") that provides fresh insight into the problem's mathematical structure. Our result implies that the subgraph enumeration problem, where the goal is to report all occurrences of a constant-sized subgraph pattern, can be settled optimally (up to a polylogarithmic factor) in the MPC model.
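To make the load bound concrete, the following is a minimal sketch that computes the fractional edge covering number $\rho$ of a small pattern graph and plugs it into $m/p^{1/\rho}$. It is not the algorithm of this paper. The function names `fractional_edge_cover_number` and `load_bound` are hypothetical, the brute-force search assumes the standard fact that the fractional edge cover problem on a graph has a half-integral optimal solution, and the computed load ignores the polylogarithmic factor hidden by $\tilde{O}(\cdot)$.

```python
from fractions import Fraction
from itertools import product


def fractional_edge_cover_number(edges):
    """Brute-force rho for a small graph given as a list of 2-tuples.

    Assumption: an optimal fractional edge cover with weights in
    {0, 1/2, 1} exists, so it suffices to enumerate those assignments.
    Runs in 3^|E| time, so it is only meant for constant-sized patterns.
    """
    vertices = {v for e in edges for v in e}
    choices = (Fraction(0), Fraction(1, 2), Fraction(1))
    best = None
    for weights in product(choices, repeat=len(edges)):
        # Every vertex must receive total weight at least 1
        # from the edges incident to it.
        covered = all(
            sum(w for w, e in zip(weights, edges) if v in e) >= 1
            for v in vertices
        )
        if covered:
            total = sum(weights)
            if best is None or total < best:
                best = total
    return best


def load_bound(m, p, rho):
    """Evaluate m / p^(1/rho), dropping polylogarithmic factors."""
    return m / p ** (1 / float(rho))


# Triangle join R(A,B), S(B,C), T(A,C): each edge gets weight 1/2.
triangle = [("A", "B"), ("B", "C"), ("A", "C")]
rho_triangle = fractional_edge_cover_number(triangle)

# 4-cycle pattern: two opposite edges with weight 1 cover all vertices.
four_cycle = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
rho_cycle = fractional_edge_cover_number(four_cycle)

print(rho_triangle, rho_cycle)
print(load_bound(10**6, 64, rho_triangle))
```

For the triangle pattern, $\rho = 3/2$, so the bound becomes $m/p^{2/3}$; with the illustrative values $m = 10^6$ and $p = 64$ this is roughly $10^6/16$ tuples per machine.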