Community discovery is the general process of extracting assortative communities from a network: collections of nodes that are densely connected to one another yet sparsely connected to the rest of the network. While community discovery has been well studied, few such techniques exist for heterogeneous networks, which contain different types of nodes and possibly different connectivity patterns between the node types. In this paper, we introduce a framework called ECoHeN, which \textbf{e}xtracts \textbf{co}mmunities from a \textbf{he}terogeneous \textbf{n}etwork in a statistically meaningful way. Using a heterogeneous configuration model as a reference distribution, ECoHeN identifies communities that are significantly more densely connected than expected given the node types and connectivity of their members. Specifically, the ECoHeN algorithm extracts communities one at a time through a dynamic set of iterative updating rules, is guaranteed to converge, and imposes no constraints on the type composition of extracted communities. To our knowledge, this is the first community discovery method that distinguishes and identifies both homogeneous and heterogeneous, possibly overlapping, community structure in a network. We demonstrate the performance of ECoHeN through simulation and in an application to a political blogs network, where it identifies collections of blogs that reference one another more than expected given the ideology of their members.
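The abstract describes extracting one community at a time by iterating updating rules against a configuration-model reference until the set stops changing. The sketch below illustrates that general pattern under loudly stated simplifications, and it is not the ECoHeN update rule set. It ignores node types entirely (so it uses an ordinary, homogeneous configuration model rather than the heterogeneous one). It assumes each node's edges into the current set B behave like Binomial(deg(u), d(B)/2m), and it keeps the nodes selected by a Benjamini-Hochberg step. The function names (`extract_community`, `binom_upper_tail`, `benjamini_hochberg`) and the `alpha`/`max_iter` parameters are hypothetical. Unlike the guarantee claimed for ECoHeN, this sketch is not shown to converge, hence the iteration cap.

```python
import math
from collections import defaultdict


def binom_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def benjamini_hochberg(pvals, alpha):
    """Return the nodes whose p-values are rejected by the BH step-up rule."""
    ranked = sorted(pvals.items(), key=lambda kv: kv[1])
    m = len(ranked)
    k = 0
    for i, (_, p) in enumerate(ranked, start=1):
        if p <= alpha * i / m:
            k = i  # largest rank whose p-value passes its BH threshold
    return {node for node, _ in ranked[:k]}


def extract_community(edges, seed, alpha=0.05, max_iter=50):
    """Hypothetical, type-agnostic extraction of one community from a seed set."""
    # Undirected simple-graph adjacency; self-loops and duplicate edges are dropped.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    deg = {u: len(nbrs) for u, nbrs in adj.items()}
    total = sum(deg.values())  # 2m, the total number of edge stubs
    if total == 0:
        return set()
    B = set(seed)
    for _ in range(max_iter):
        # Configuration-model approximation: a stub of u lands in B w.p. d(B) / 2m.
        p = sum(deg[w] for w in B if w in deg) / total
        # One-sided p-value: is u more connected to B than the null predicts?
        pvals = {u: binom_upper_tail(len(adj[u] & B), deg[u], p) for u in adj}
        new_B = benjamini_hochberg(pvals, alpha)
        if new_B == B:  # fixed point: the update no longer changes the set
            break
        B = new_B
    return B
```

For example, on two 10-node cliques joined by a single edge, seeding with one clique, or with that clique plus the bridge endpoint from the other clique, returns exactly the first clique. The exact binomial tail is used instead of a normal approximation because small-degree nodes are common and the approximation is poor there.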