We present a new method for assessing and measuring homophily in networks whose nodes have categorical attributes, namely when the nodes of a network are partitioned into classes (colors). We test this method on two different classes of networks: i) protein-protein interaction (PPI) networks, where nodes correspond to proteins, partitioned according to their functional role, and edges represent functional interactions between proteins; ii) the Pokec online social network, where nodes correspond to users, partitioned according to their age, and edges represent friendships between users. Like other classical, well-established approaches, our method compares the relative edge density of the subgraph induced by each class with the corresponding expected relative edge density under a null model. The novelty of our approach lies in prescribing an endogenous null model, namely one whose sample space is built on the input network itself. This allows us to give exact, explicit expressions for the z-score of the relative edge density of each class, as well as for other related statistics. The z-scores directly quantify the statistical significance of the observed homophily via the Tchebycheff (Chebyshev) inequality. The network structure enters the expression of each z-score through basic combinatorial invariants, such as the number of subgraphs with two spanning edges. Each z-score is computed in O(n + m) time for a network with n nodes and m edges, which yields an overall efficient computational method for assessing homophily. We complement the analysis of homophily/heterophily by considering z-scores of the number of isolated nodes in the subgraphs induced by each class, which are computed in O(nm) time. The theoretical results are then used to show that, as expected, both classes of networks analyzed are significantly homophilic with respect to the node properties considered.