Community detection is the problem of identifying tightly connected clusters of nodes within a network. Efficient parallel algorithms for this play a crucial role in various applications, especially as datasets expand to significant sizes. The Label Propagation Algorithm (LPA) is commonly employed for this purpose due to its ease of parallelization, rapid execution, and scalability - however, it may yield internally disconnected communities. This technical report introduces GSL-LPA, derived from our parallelization of LPA, namely GVE-LPA. Our experiments on a system with two 16-core Intel Xeon Gold 6226R processors show that GSL-LPA not only mitigates this issue but also surpasses FLPA, igraph LPA, and NetworKit LPA by 55x, 10, 300x, and 5.8x, respectively, achieving a processing rate of 844M edges/s on a 3.8B edge graph. Additionally, GSL-LPA scales at a rate of 1.6x for every doubling of threads.
翻译:暂无翻译