Given a graph stream, how can we estimate the number of triangles in it using multiple machines with limited storage? Specifically, how should edges be processed and sampled across the machines for rapid and accurate estimation? The count of triangles (i.e., cliques of size three) has proven useful in numerous applications, including anomaly detection, community detection, and link recommendation. For triangle counting in large and dynamic graphs, recent work has focused largely on streaming algorithms and distributed algorithms but little on their combinations for "the best of both worlds". In this work, we propose CoCoS, a fast and accurate distributed streaming algorithm for estimating the counts of global triangles (i.e., all triangles) and local triangles incident to each node. Making one pass over the input stream, COCOS carefully processes and stores the edges across multiple machines so that the redundant use of computational and storage resources is minimized. Compared to baselines, CoCoS is (a) Accurate: giving up to 39X smaller estimation error, (b) Fast: up to 10.4X faster, scaling linearly with the size of the input stream, and (c) Theoretically sound: yielding unbiased estimates.
翻译:根据图表流,我们如何使用存储量有限的多个机器来估计三角形的数量? 具体地说, 边缘应该如何在机器之间进行处理和取样, 以便快速和准确地估计? 三角形的计数( 大小为三的三角形) 已证明在许多应用中有用, 包括异常检测、 社区检测 和链接建议。 对于大型和动态图表中的三角形计数, 最近的工作主要集中在流算法和分布式算法上, 但对于“ 最佳世界” 来说, 它们的组合很少。 在这项工作中, 我们提议 CoCOS, 一个快速和准确的分布式流算法, 用于估算全球三角形( 即所有三角形) 和本地三角形对每个节点的计数。 在输入流中, COCOS 仔细处理并存储多个机器的边缘, 以便尽可能减少对计算和储存资源的多余使用。 与基线相比, CoCOS 是 (a) 准确的: 给予39X 较小的估计错误, (b) 快速: 快速到 10.44.X, 直线缩缩缩的计算结果( ) 和 。