Many existing statistical models for networks overlook the fact that many real-world networks are formed through a growth process. To address this, we introduce the PAPER (Preferential Attachment Plus Erd\H{o}s--R\'{e}nyi) model for random networks, in which a random network G is the union of a preferential attachment (PA) tree T and additional Erd\H{o}s--R\'{e}nyi (ER) random edges. The PA tree component captures the underlying growth or recruitment process of a network, in which vertices and edges are added sequentially, while the ER component can be regarded as random noise. Given only a single snapshot of the final network G, we study the problem of constructing confidence sets for the early history of the unobserved growth process, in particular for its root node; the root node can be, for example, patient zero in a disease infection network or the source of fake news in a social media network. We propose an inference algorithm based on Gibbs sampling that scales to networks with millions of nodes, and we provide theoretical analysis showing that the expected size of the confidence set is small so long as the noise level of the ER edges is not too large. We also propose variations of the model in which multiple growth processes occur simultaneously, reflecting the growth of multiple communities, and we use these models to provide a new approach to community detection.
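To make the generative model concrete, the following is a minimal Python sketch of sampling a graph as the union of a PA tree and ER noise edges. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the attachment rule, so the sketch assumes linear preferential attachment (a new node attaches to an existing node with probability proportional to that node's current tree degree) and assumes each non-tree pair becomes a noise edge independently with probability `p`. The function name `sample_paper_graph` and its parameters are hypothetical.

```python
import random
from itertools import combinations


def sample_paper_graph(n, p, seed=None):
    """Sample (tree_edges, noise_edges) on nodes 0..n-1; node 0 is the root.

    Assumptions (not taken from the abstract): linear preferential
    attachment for the tree, and independent noise edges with
    probability p on every pair that is not already a tree edge.
    """
    rng = random.Random(seed)
    tree_edges = set()
    # Each node appears in `endpoints` once per incident tree edge, so a
    # uniform draw from this list selects a node with probability
    # proportional to its current degree in the tree.
    endpoints = []
    for v in range(1, n):
        parent = 0 if not endpoints else rng.choice(endpoints)
        tree_edges.add((parent, v))  # parent arrived earlier, so parent < v
        endpoints.extend((parent, v))
    # Erdos-Renyi noise: scan all pairs (quadratic, fine for small n).
    noise_edges = {
        (u, v)
        for u, v in combinations(range(n), 2)
        if (u, v) not in tree_edges and rng.random() < p
    }
    return tree_edges, noise_edges


if __name__ == "__main__":
    tree, noise = sample_paper_graph(n=200, p=0.01, seed=0)
    observed = tree | noise  # the snapshot G; arrival order is hidden
    print(len(tree), len(noise), len(observed))
```

In the inference problem described above, only the unlabeled edge set `observed` would be available; the node labels used here encode arrival order and would have to be randomly permuted before being handed to any root-inference procedure.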