Over the last two decades, frameworks for distributed-memory parallel computation, such as MapReduce, Hadoop, Spark and Dryad, have gained significant popularity with the growing prevalence of large network datasets. The Massively Parallel Computation (MPC) model is the de facto standard for the theoretical study of graph algorithms in these frameworks. Subgraph counting is a fundamental problem in the analysis of massive graphs, and the main algorithmic challenge is to design methods that are both scalable and accurate. Given a graph $G=(V, E)$ with $n$ vertices, $m$ edges and $T$ triangles, our first result is an algorithm that outputs a $(1+\varepsilon)$-approximation to $T$ with asymptotically \emph{optimal round and total space complexity}, given any $S \geq \max(\sqrt{m}, n^2/m)$ space per machine and assuming $T=\Omega(\sqrt{m/n})$. This quadratically improves the lower bound on $T$ required by previous works. We also provide a simple extension of this result to counting \emph{any} subgraph of size $k$, for any constant $k \geq 1$. Our second result is an $O_{\varepsilon}(\log \log n)$-round algorithm for exactly counting the number of triangles, whose total space usage is parametrized by the \emph{arboricity} $\alpha$ of the input graph. We extend this result to exactly counting $k$-cliques for any constant $k$. Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri (ITCS 2020) for exactly counting all subgraphs of size at most $5$ can be implemented in the MPC model in total space.
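To make the link between exact triangle counting and arboricity concrete, here is a minimal sequential sketch. It is not the MPC algorithm of this work. It illustrates the classic degree-ordering technique of Chiba and Nishizeki: each edge is oriented from its lower-ranked endpoint to its higher-ranked one, ranking by (degree, vertex id). Each triangle is then counted exactly once, at the oriented edge between its two lowest-ranked vertices. The standard Chiba–Nishizeki argument bounds the total intersection work by $O(m\alpha)$. The function name `count_triangles` and the edge-list input format are hypothetical choices for illustration, and vertex ids are assumed to be integers so that ties can be broken consistently.

```python
from collections import defaultdict


def count_triangles(edges):
    """Exactly count triangles in an undirected graph given as an edge list.

    Hypothetical sequential sketch of degree-ordered counting (not the
    paper's MPC algorithm). Assumes integer vertex ids.
    """
    # Build symmetric adjacency sets, dropping self-loops and duplicate edges.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    # Total order on vertices: by degree, ties broken by vertex id.
    def rank(x):
        return (len(adj[x]), x)

    # Orient every edge from the lower-ranked endpoint to the higher-ranked one.
    out = {x: {y for y in nbrs if rank(y) > rank(x)} for x, nbrs in adj.items()}

    count = 0
    for x, succ in out.items():
        for y in succ:
            # A triangle x < y < z (in rank order) is found exactly once,
            # here on the oriented edge x -> y, as z in out(x) & out(y).
            count += len(succ & out[y])
    return count
```

For example, `count_triangles([(0, 1), (1, 2), (2, 0)])` counts the single triangle, and the complete graph on four vertices yields four.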