This paper investigates scalable parallelisation of the state-of-the-art cycle enumeration algorithms by Johnson and Read-Tarjan, along with their applications to temporal graphs. We provide a comprehensive theoretical analysis of various parallel versions of these algorithms and evaluate their performance on multi-core processors. We show that a straightforward coarse-grained parallelisation approach is not scalable and suffers from load imbalance. To eliminate the load imbalance, we modify the Johnson and the Read-Tarjan algorithms to exploit finer-grained parallelism. We show that our fine-grained parallel Read-Tarjan algorithm is theoretically work-efficient, i.e., it does no more work than its serial version. Our fine-grained parallel Johnson algorithm does not share this property, yet in practice it outperforms our fine-grained parallel Read-Tarjan algorithm. Both of our fine-grained parallel algorithms are scalable, meaning that they can effectively utilise an increasing number of software threads, which we prove theoretically and demonstrate through extensive experiments. On a cluster of multi-core CPUs with $256$ physical cores that can execute $1024$ simultaneous threads, our fine-grained parallel Johnson and Read-Tarjan algorithms are respectively up to $435\times$ and $470\times$ faster than their single-threaded versions. On the same compute cluster, our fine-grained parallel algorithms are on average an order of magnitude faster than their coarse-grained parallel counterparts.
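To illustrate why the coarse-grained approach can suffer from load imbalance, the following is a minimal sketch, not the implementation evaluated in this paper. It assumes that "coarse-grained" means one independent task per starting vertex, and it swaps in a plain depth-first search for the Johnson and Read-Tarjan algorithms. The names `cycles_from` and `enumerate_cycles_coarse` and the example graph are hypothetical. Python threads are used only to show the task decomposition; they are not expected to yield a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def cycles_from(graph, start):
    """Enumerate simple cycles whose smallest vertex is `start`.

    Plain DFS for illustration only; this is NOT the Johnson or the
    Read-Tarjan algorithm, and it does no pruning of fruitless paths.
    """
    found = []
    path = [start]
    on_path = {start}

    def dfs(v):
        for w in graph.get(v, ()):
            if w == start:
                # Closing edge back to the root: record the current path.
                found.append(list(path))
            elif w > start and w not in on_path:
                # Only visit larger vertices so each cycle is reported once,
                # rooted at its smallest vertex.
                path.append(w)
                on_path.add(w)
                dfs(w)
                on_path.remove(w)
                path.pop()

    dfs(start)
    return found


def enumerate_cycles_coarse(graph, workers=4):
    """Coarse-grained decomposition (assumed): one task per starting vertex."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: cycles_from(graph, s), sorted(graph)))


# Hypothetical directed graph, given as adjacency lists.
graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 3], 3: [0], 4: [5], 5: [4]}
per_start = enumerate_cycles_coarse(graph)
counts = [len(cycles) for cycles in per_start]
# counts == [6, 0, 0, 0, 1, 0]: the task rooted at vertex 0 finds 6 of 7 cycles.
```

In this toy graph a single task accounts for almost all of the cycles, so the thread that receives it keeps working while the others sit idle. Splitting the search inside such a task is the kind of finer-grained parallelism the abstract refers to.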