Reliably learning group structure among nodes in network data is challenging in modern applications. We are motivated by covert networks encoding relationships among criminals. These data are subject to measurement errors and exhibit a complex combination of an unknown number of core-periphery, assortative and disassortative structures that may unveil the internal architecture of the criminal organization. The coexistence of such noisy block structures limits the reliability of community detection algorithms routinely applied to criminal networks, and requires extensions of model-based solutions to realistically characterize the node partition process, incorporate information from node attributes, and provide improved strategies for estimation, uncertainty quantification, model selection and prediction. To address these goals, we develop a novel class of extended stochastic block models (ESBM) that infer groups of nodes having common connectivity patterns via Gibbs-type priors on the partition process. This choice encompasses several realistic priors for criminal networks, covering solutions with fixed, random and infinite number of possible groups, and facilitates inclusion of node attributes in a principled manner. Among the new alternatives in our class, we focus on the Gnedin process as a realistic prior that allows the number of groups to be finite, random and subject to a reinforcement process coherent with the modular structures in organized crime. A collapsed Gibbs sampler is proposed for the whole ESBM class, and refined strategies for estimation, prediction, uncertainty quantification and model selection are outlined. ESBM performance is illustrated in realistic simulations and in an application to an Italian Mafia network, where we learn key block patterns revealing a complex hierarchical structure of the organization, mostly hidden from state-of-the-art alternative solutions.
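To illustrate how a Gnedin process prior can yield a finite but random number of groups, the following minimal sketch draws a node partition sequentially from its urn scheme. It assumes the standard predictive form of the Gnedin process, which the abstract does not spell out: with H occupied groups of sizes n_h among n nodes and a parameter gamma in (0, 1), the next node joins group h with weight (n_h + 1)(n - H + gamma) and opens a new group with weight H^2 - H*gamma. The names `gnedin_weights`, `sample_partition` and `gamma` are illustrative choices. The sketch samples from the prior only and is not the collapsed Gibbs sampler proposed for the ESBM class.

```python
import random


def gnedin_weights(sizes, gamma):
    """Unnormalized urn-scheme weights for the next node (assumed Gnedin form).

    sizes: current group sizes n_1, ..., n_H; gamma: parameter in (0, 1).
    Returns (weights for joining each existing group, weight for a new group).
    """
    n = sum(sizes)   # nodes allocated so far
    h = len(sizes)   # number of occupied groups
    existing = [(n_h + 1) * (n - h + gamma) for n_h in sizes]
    new = h * h - h * gamma
    return existing, new


def sample_partition(num_nodes, gamma, rng):
    """Draw group labels for num_nodes nodes, one node at a time."""
    labels = [0]  # the first node always opens group 0
    sizes = [1]
    for _ in range(1, num_nodes):
        existing, new = gnedin_weights(sizes, gamma)
        # Index len(sizes) stands for "open a new group".
        k = rng.choices(range(len(sizes) + 1), weights=existing + [new])[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
        labels.append(k)
    return labels, sizes


if __name__ == "__main__":
    labels, sizes = sample_partition(100, gamma=0.5, rng=random.Random(1))
    print("number of groups:", len(sizes))
    print("group sizes:", sizes)
```

The weight for joining an existing group grows with the group size, which gives the reinforcement behaviour, while the weight for a new group depends only on the current number of groups and not on n, so new groups become increasingly unlikely as the network grows.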