Projections of bipartite or two-mode networks capture co-occurrences, and are used in diverse fields (e.g., ecology, economics, bibliometrics, politics) to represent unipartite networks. A key challenge in analyzing such networks is determining whether an observed number of co-occurrences between two nodes is statistically significant, and therefore whether an edge exists between them. One approach, the fixed degree sequence model (FDSM), evaluates the significance of an edge's weight by comparison to a null model in which the degree sequences of the original bipartite network are fixed. Although the FDSM is an intuitive null model, it is computationally expensive because it requires Monte Carlo simulation to estimate each edge's $p$-value, and is therefore impractical for large projections. In this paper, we explore four potential alternatives to the FDSM: the fixed fill model (FFM), fixed row model (FRM), fixed column model (FCM), and stochastic degree sequence model (SDSM). We compare these models to the FDSM in terms of accuracy, speed, statistical power, similarity, and ability to recover known communities. We find that the computationally fast SDSM offers a statistically conservative but close approximation of the computationally impractical FDSM under a wide range of conditions, and that it correctly recovers a known community structure even when the signal is weak. Therefore, although each backbone model may have particular applications, we recommend the SDSM for extracting the backbone of bipartite projections when the FDSM is impractical.
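To make the cost of the FDSM concrete, the following is a minimal sketch of a Monte Carlo edge test under fixed degree sequences. It is not the authors' implementation: the function names `checkerboard_swaps` and `fdsm_pvalues`, the use of checkerboard swaps as the randomization step, the swap budget, and the toy matrix `B` are all assumptions made for illustration.

```python
import numpy as np


def checkerboard_swaps(B, n_swaps, rng):
    """Randomize a binary matrix while preserving every row and column sum.

    Assumption: simple checkerboard swaps are used as the randomizer;
    other fixed-degree samplers exist and may mix faster.
    """
    B = B.copy()
    n_rows, n_cols = B.shape
    for _ in range(n_swaps):
        r1, r2 = rng.integers(n_rows, size=2)
        c1, c2 = rng.integers(n_cols, size=2)
        # A 2x2 submatrix of the form [[1, 0], [0, 1]] or [[0, 1], [1, 0]]
        # can be flipped without changing any row or column sum.
        # Draws with r1 == r2 or c1 == c2 can never satisfy this test.
        if (B[r1, c1] == B[r2, c2] and B[r1, c2] == B[r2, c1]
                and B[r1, c1] != B[r1, c2]):
            B[r1, c1], B[r1, c2] = B[r1, c2], B[r1, c1]
            B[r2, c1], B[r2, c2] = B[r2, c2], B[r2, c1]
    return B


def fdsm_pvalues(B, n_samples=200, swaps_per_sample=None, seed=0):
    """Estimate upper-tail p-values for each edge weight in the row projection."""
    rng = np.random.default_rng(seed)
    B = np.asarray(B, dtype=int)
    observed = B @ B.T  # co-occurrence counts between row nodes
    if swaps_per_sample is None:
        swaps_per_sample = 5 * B.size  # heuristic budget, an assumption
    at_least = np.zeros(observed.shape, dtype=float)
    current = B
    for _ in range(n_samples):
        current = checkerboard_swaps(current, swaps_per_sample, rng)
        # Count null samples whose weight is at least the observed weight.
        at_least += (current @ current.T) >= observed
    return observed, at_least / n_samples


# Hypothetical toy data: 4 row nodes (agents) by 6 column nodes (artifacts).
B = np.array([[1, 1, 1, 0, 0, 0],
              [1, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 1],
              [0, 0, 1, 1, 1, 0]])
observed, pvals = fdsm_pvalues(B, n_samples=200)
```

An edge between rows `i` and `j` would be retained in the backbone when `pvals[i, j]` falls below a chosen significance level; diagonal entries are not meaningful. Every null sample requires many swaps and a full projection, which is why this approach scales poorly to large networks.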