Many popular models from the networks literature can be viewed through a common lens of contingency tables on network dyads, resulting in \emph{log-linear ERGMs}: exponential family models for random graphs whose sufficient statistics are linear on the dyads. We propose a new model in this family, the \emph{$p_1$-SBM}, which combines the node and group effects common in network formation mechanisms. It generalizes several well-known ERGMs, including the stochastic blockmodel for undirected graphs, its degree-corrected version, and the directed $p_1$ model without group structure. We frame the problem of testing model fit for the log-linear ERGM class as an exact conditional test whose $p$-value can be approximated efficiently in networks of both small and moderately large sizes. We use fast estimation algorithms adapted from the contingency table literature, and the sampling methods we build, rooted in graph theory and algebraic statistics, rely on a dynamic adaptation of Markov bases. The performance and scalability of the method are demonstrated on two data sets from biology: the connectome of \emph{C. elegans} (a neuronal network) and the interactome of \emph{Arabidopsis thaliana} (a protein-protein interaction network). Both networks have been popular examples in the network science literature, and our work provides a model-based approach to studying them.