Community detection is a fundamental network-analysis primitive with a variety of applications in diverse domains. Although the modularity introduced by Newman and Girvan (2004) has widely been used as a quality function for community detection, it has some drawbacks. The modularity density introduced by Li et al. (2008) is known to be an effective alternative to the modularity, which mitigates one of the drawbacks called the resolution limit. A large body of work has been devoted to designing exact and heuristic methods for modularity density maximization, without any computational complexity analysis. In this study, we investigate modularity density maximization from both algorithmic and computational complexity aspects. Specifically, we first accelerate column generation for the modularity density maximization problem. To this end, we point out that the auxiliary problem appearing in column generation can be viewed as a dense subgraph discovery problem. Then we employ a well-known strategy for dense subgraph discovery, called the greedy peeling, for approximately solving the auxiliary problem. Moreover, we reformulate the auxiliary problem to a sequence of $0$--$1$ linear programming problems, enabling us to compute its optimal value more efficiently and to get more diverse columns. Computational experiments using a variety of real-world networks demonstrate the effectiveness of our proposed algorithm. Finally, we show the NP-hardness of a slight variant of the modularity density maximization problem, where the output partition has to have two or more clusters, as well as showing the NP-hardness of the auxiliary problem.
翻译:社区检测是一个基本的网络分析原始,在不同领域的应用多种多样。尽管Newman和Girvan(2004年)引入的模块化密度已被广泛用作社区检测的一种质量功能,但它有一些缺点。Li等人(2008年)引入的模块密度被认为是模块性的一种有效替代,它缓解了一个称为解析极限的缺陷。大量工作致力于设计模块性密度最大化的精确和超常方法,而没有任何计算复杂性分析。在这项研究中,我们从算法和计算复杂性两个方面对模块性密度最大化进行了调查。具体地说,我们首先加快模块性密度最大化问题的柱子生成。为此,我们指出,列生成中的辅助问题可被视为一个密集的子成份发现问题。然后,我们采用了一个众所周知的密集子谱发现策略,要求贪婪剥离,以大致解决辅助性问题。此外,我们将辅助问题改写成一个费用-1美元线性编程问题序列,使我们能够更高效地配置其优化的硬性值,并最后用更多样化的轨迹显示我们的拟议的细度矩阵的精度。