Community detection is a fundamental problem in computational sciences with extensive applications in various fields. The most commonly used methods are the algorithms designed to maximize modularity over different partitions of the network nodes. Using 80 real and random networks from a wide range of contexts, we investigate the extent to which current heuristic modularity maximization algorithms succeed in returning maximum-modularity (optimal) partitions. We evaluate (1) the ratio of the algorithms' output modularity to the maximum modularity for each input graph, and (2) the maximum similarity between their output partition and any optimal partition of that graph. We compare eight existing heuristic algorithms against an exact integer programming method that globally maximizes modularity. The average modularity-based heuristic algorithm returns optimal partitions for only 16.9% of the 80 graphs considered. Additionally, results on adjusted mutual information reveal substantial dissimilarity between the sub-optimal partitions and any optimal partition of the networks in our experiments. More importantly, our results show that near-optimal partitions are often disproportionately dissimilar to any optimal partition. Taken together, our analysis points to a crucial limitation of commonly used modularity-based heuristics for discovering communities: they rarely produce an optimal partition or a partition resembling an optimal partition. If modularity is to be used for detecting communities, exact or approximate optimization algorithms are recommendable for a more methodologically sound usage of modularity within its applicability limits.
翻译:社区检测是计算科学中的一个基本问题,具有广泛的应用于各种领域。最常用的方法是根据网络节点的不同分区设计的最大化模块度算法。使用来自广泛背景的80个真实和随机网络,我们研究当前启发式模块度最大化算法在返回最大模块度(最优)分区方面成功的程度。我们评估了(1)每个输入图形的算法输出模块度与最大模块度之间的比率以及(2)它们的输出分区和该图形的任何最优分区之间的最大相似度。我们将八种现有的启发式算法与通过全局最大化模块度实现的精确整数编程方法进行比较。平均基于模块度的启发式算法仅返回80个考虑的图形中的16.9%的最优分区。此外,关于调整后的互信息的结果显示,次优分区与我们实验中任何优化分区之间存在实质性的差异。更重要的是,我们的结果表明,接近最优分区往往与任何最优分区都不成比例不同。综上所述,我们的分析指出常用的基于模块度的启发式算法出现了一个关键限制,即他们很少生成一个最优的分区或类似最优的分区。如果模块度用于检测社区,请推荐使用精确或近似优化算法,以在模块度的适用限制范围内更加方法论地使用模块度。