Community detection is a classic problem in network science with extensive applications in various fields. The most commonly used methods are the algorithms designed to maximize modularity over different partitions of the network nodes into communities. Using 80 real and random networks from a wide range of contexts, we investigate the extent to which current heuristic modularity maximization algorithms succeed in returning modularity-maximum (optimal) partitions. We evaluate (1) the ratio of their output modularity to the maximum modularity for each input graph and (2) the maximum similarity between their output partition and any optimal partition of that graph. Our computational experiments involve eight existing heuristic algorithms which we compare against an exact integer programming method that globally maximizes modularity. The average modularity-based heuristic algorithm returns optimal partitions for only 16.9% of the 80 graphs considered. Results on adjusted mutual information show considerable dissimilarity between the sub-optimal partitions and any optimal partitions of the graphs in our experiments. More importantly, our results show that near-optimal partitions tend to be disproportionally dissimilar to any optimal partition. Taken together, our analysis points to a crucial limitation of commonly used modularity-based algorithms for discovering communities: they rarely return an optimal partition or a partition resembling an optimal partition. Given this finding, developing an exact or approximate algorithm for modularity maximization is recommendable for a more methodologically sound usage of modularity in community detection.
翻译:社区检测是网络科学中一个典型的问题,在各个领域都有广泛的应用。 最常用的方法是设计各种算法, 以最大限度地扩大网络节点不同分区的模块化模块化模式; 使用来自广泛背景的80个真实和随机网络, 我们调查当前超光化模块化最大化算法在返回模块化最大( 最佳) 分区方面成功的程度。 我们评估 (1) 它们的输出模块化与每个输入图形的最大模块化比例的比例, (2) 它们的输出共享和该图形的任何最佳分区之间的最大相似性。 我们的计算实验涉及现有的八种超饱和性算法, 我们比对它进行对比, 与全球最大模块化的精确整数编程方法比较。 平均模块化基于超纯度算法算法的算法只为所考虑的80个图表的16.9% 。 调整后的共同信息结果显示, 亚优度的组合化模块化分区与我们实验中任何最优化的组合。 更重要的是, 我们的结果显示, 近优化的分区化建议偏差与任何最优化的分区化的比对最佳分区的比。</s>