Community detection is one of the most important methodological fields of network science, and one that has attracted a significant amount of attention over the past decades. This area deals with the automated division of a network into fundamental building blocks, with the objective of providing a summary of its large-scale structure. Despite its importance and widespread adoption, there is a noticeable gap between what is arguably the state of the art and the methods that are actually used in practice in a variety of fields. Here we attempt to address this discrepancy by dividing existing methods according to whether they have a "descriptive" or an "inferential" goal. While descriptive methods find patterns in networks based on context-dependent notions of community structure, inferential methods articulate generative models and attempt to fit them to data. In this way, inferential methods are able to provide insights into the mechanisms of network formation and to separate structure from randomness in a manner supported by statistical evidence. We review how employing descriptive methods with inferential aims is riddled with pitfalls and misleading answers, and thus should in general be avoided. We argue that inferential methods are more typically aligned with clearer scientific questions, yield more robust results, and should in many cases be preferred. We attempt to dispel some myths and half-truths often believed when community detection is employed in practice, in an effort to improve both the use of such methods and the interpretation of their results.