Network homophily, the tendency of similar nodes to be connected, and transitivity, the tendency of two nodes to be connected if they share a common neighbor, are conflated properties in network analysis, since one mechanism can drive the other. Here we present a generative model and a corresponding inference procedure that can distinguish between the two mechanisms. Our approach is based on a variation of the stochastic block model (SBM) with the addition of triadic closure edges, and its inference can identify the most plausible mechanism responsible for the existence of every edge in the network, in addition to the underlying community structure itself. We show how the method avoids detecting spurious communities caused solely by the formation of triangles in the network, and how it improves link prediction performance compared to the pure version of the SBM without triadic closure.
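
To make the generative idea concrete, the following is a minimal sketch of a two-step process in the spirit of the model described above: seed edges are first drawn from a plain SBM, and closure edges are then added between pairs of nodes that share a seed neighbor. The function name `sbm_with_triadic_closure`, the parameters `p_in`, `p_out` and `p_close`, and the use of a single global closure probability are all assumptions made for illustration; the model in the paper may be parametrized differently, and this sketch does not cover the inference procedure.

```python
import itertools
import random


def sbm_with_triadic_closure(blocks, p_in, p_out, p_close, seed=0):
    """Sample a toy graph: SBM seed edges, then triadic closure edges.

    All names and parameters here are hypothetical, chosen for illustration.
    blocks  -- list giving the group label of each node
    p_in    -- seed edge probability for nodes in the same group
    p_out   -- seed edge probability for nodes in different groups
    p_close -- probability of closing each open triad of the seed graph
    Returns a dict mapping each edge (u, v), u < v, to "sbm" or "closure".
    """
    rng = random.Random(seed)
    n = len(blocks)
    edges = {}
    neighbors = {u: set() for u in range(n)}  # seed-graph neighbors only

    # Step 1: seed edges from a plain SBM (community structure / homophily).
    for u, v in itertools.combinations(range(n), 2):
        p = p_in if blocks[u] == blocks[v] else p_out
        if rng.random() < p:
            edges[(u, v)] = "sbm"
            neighbors[u].add(v)
            neighbors[v].add(u)

    # Step 2: closure edges between pairs that share a seed neighbor w.
    # Each open triad (u, w, v) gets one independent closure attempt.
    for w in range(n):
        for u, v in itertools.combinations(sorted(neighbors[w]), 2):
            if (u, v) not in edges and rng.random() < p_close:
                edges[(u, v)] = "closure"
    return edges


if __name__ == "__main__":
    graph = sbm_with_triadic_closure([0] * 10 + [1] * 10, 0.3, 0.02, 0.5)
    n_closure = sum(1 for kind in graph.values() if kind == "closure")
    print(len(graph), "edges,", n_closure, "from triadic closure")
```

Because every edge carries the label of the step that produced it, a sampled graph comes with the ground truth that an inference procedure of the kind described above would try to recover. Closure edges can also link nodes of the same group without any group affinity being involved, which is the conflation the abstract refers to.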