并查集的可证有效率和内部确定性并行算法 (Provably-Efficient and Internally-Deterministic Parallel Union-Find)

Determining the degree of inherent parallelism in classical sequential algorithms and leveraging it for fast parallel execution is a key topic in parallel computing, and detailed analyses are known for a wide range of classical algorithms. In this paper, we perform the first such analysis for the fundamental Union-Find problem, in which we are given a graph as a sequence of edges, and must maintain its connectivity structure under edge additions. We prove that classic sequential algorithms for this problem are well-parallelizable under reasonable assumptions, addressing a conjecture by [Blelloch, 2017]. More precisely, we show via a new potential argument that, under uniform random edge ordering, parallel union-find operations are unlikely to interfere: $T$ concurrent threads processing the graph in parallel will encounter memory contention $O(T^2 \cdot \log |V| \cdot \log |E|)$ times in expectation, where $|E|$ and $|V|$ are the number of edges and nodes in the graph, respectively. We leverage this result to design a new parallel Union-Find algorithm that is both internally deterministic, i.e., its results are guaranteed to match those of a sequential execution, but also work-efficient and scalable, as long as the number of threads $T$ is $O(|E|^{\frac{1}{3} - \varepsilon})$, for an arbitrarily small constant $\varepsilon > 0$, which holds for most large real-world graphs. We present lower bounds which show that our analysis is close to optimal, and experimental results suggesting that the performance cost of internal determinism is limited.

翻译：在经典的顺序算法中确定内在并行性的程度并利用其进行快速并行执行是并行计算的一个关键领域，对于各种经典算法已经有了详细的分析。在本文中，我们对基本的并查集问题进行了第一个这样的分析，其中给定一个图形作为边的序列，并且必须在添加边时维护其连通性结构。我们证明了这个问题的经典顺序算法在合理的假设条件下是易于并行化的，从而解决了 [Blelloch，2017] 的一项猜测。更准确地说，我们通过一个新的潜力论证表明，在统一随机边排序下，并查集并行操作不太可能互相干扰：$T$ 并发线程并行处理图形将在期望中遇到内存争用 $O(T^2 \cdot \log |V| \cdot \log |E|)$ 次，其中 $|E|$ 和 $|V|$ 分别是图形中的 edge 和 node 数量。我们利用这个结果设计了一种新的并行并查集算法，既具有内部确定性，即其结果保证与顺序执行相匹配，又具有工作效率和可伸缩性，在线程数 $T$ 为 $O(|E|^{\frac{1}{3} - \varepsilon})$，对于任意小的常数 $\varepsilon > 0$ 情况都是成立的，这适用于大多数大型实际图形。我们提供了证明表明我们的分析接近于最优，并提供了实验结果，表明内部确定性的性能成本是有限的。