Network homophily via multi-dimensional extensions of Cantelli's inequality

Homophily is the principle whereby "similarity breeds connections". We give a quantitative formulation of this principle within networks. We say that a network is homophilic with respect to a given labeled partition of its vertices when the classes of the partition induce subgraphs that are significantly denser than expected under a random labeled partition into classes of the same cardinalities (type). This is the recently introduced \emph{random coloring model} for network homophily. In this perspective, the vector whose entries are the sizes of the subgraphs induced by the corresponding classes is viewed as the observed outcome of the random vector obtained by picking labeled partitions at random among partitions of the same type. Consequently, the input network is homophilic at the significance level $\alpha$ whenever the one-sided tail probability of observing an outcome at least as extreme as the observed one is smaller than $\alpha$. Clearly, $\alpha$ can also be thought of as a quantifier of homophily on the scale $[0,1]$. Since, as we show, even approximating this tail probability is an NP-hard problem, we resort to multidimensional extensions of the classical Cantelli inequality to bound $\alpha$ from above. This upper bound is the homophily index we propose. It requires knowledge of the covariance matrix of the random vector, which was not previously known within the random coloring model. In this paper we close this gap by computing the covariance matrix of subgraph sizes under the random coloring model. Interestingly, the matrix depends on the input partition only through its type and on the network only through its degrees. Furthermore, all the covariances have the same sign, and this sign is a graph invariant. Plugging this structure into Cantelli's bound yields a meaningful, easy-to-compute index for measuring network homophily.
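
To make the random coloring model concrete, the following Python sketch applies the classical one-dimensional Cantelli inequality, $P(X - \mu \ge t) \le \sigma^2/(\sigma^2 + t^2)$ for $t > 0$, to the total number of intra-class edges. It is a simplified illustration under stated assumptions, not the multi-dimensional index proposed in the paper: the vector of subgraph sizes is collapsed to its sum, and the mean and variance are estimated by sampling random colorings rather than taken from the closed-form covariance matrix. All function names (`induced_sizes`, `cantelli_upper_bound`, `sampled_homophily_bound`) and the toy graph are hypothetical.

```python
import random
from statistics import fmean, pvariance


def induced_sizes(edges, labels, classes):
    """Return, for each class, the number of edges with both endpoints in it."""
    sizes = {c: 0 for c in classes}
    for u, v in edges:
        if labels[u] == labels[v]:
            sizes[labels[u]] += 1
    return [sizes[c] for c in classes]


def cantelli_upper_bound(mean, var, observed):
    """Classical one-sided Cantelli bound on P(X >= observed)."""
    t = observed - mean
    if t <= 0:
        return 1.0  # the bound is uninformative at or below the mean
    return var / (var + t * t)


def sampled_homophily_bound(edges, labels, n_samples=2000, seed=0):
    """Scalar simplification (an assumption of this sketch): bound the tail
    of the TOTAL intra-class edge count, with moments estimated by sampling."""
    rng = random.Random(seed)
    vertices = sorted(labels)
    classes = sorted(set(labels.values()))
    observed = sum(induced_sizes(edges, labels, classes))
    colors = [labels[v] for v in vertices]
    totals = []
    for _ in range(n_samples):
        # Permuting the labels keeps every class cardinality (the type) fixed.
        rng.shuffle(colors)
        random_labels = dict(zip(vertices, colors))
        totals.append(sum(induced_sizes(edges, random_labels, classes)))
    return cantelli_upper_bound(fmean(totals), pvariance(totals), observed)


# Toy input: two triangles joined by a single edge, labeled by triangle.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
LABELS = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

if __name__ == "__main__":
    print(sampled_homophily_bound(EDGES, LABELS))
```

On this toy graph, direct enumeration of the 20 labeled partitions of type (3, 3) gives a mean of 2.8 and a variance of 1.36 for the total intra-class edge count, so the exact-moment Cantelli bound is about 0.117, against an exact tail probability of 0.1. The sampled estimate should land close to the former. The bound is always an upper estimate, which is why a small value can be read as evidence of homophily.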

