With network data becoming ubiquitous in many applications, many models and algorithms for network analysis have been proposed. Yet methods for providing uncertainty estimates in addition to point estimates of network parameters are much less common. While bootstrap and other resampling procedures have been an effective general tool for estimating uncertainty from i.i.d. samples, adapting them to networks is highly nontrivial. In this work, we study three different network resampling procedures for uncertainty estimation, and propose a general algorithm to construct confidence intervals for network parameters through network resampling. We also propose an algorithm for selecting the sampling fraction, which has a substantial effect on performance. We find that, unsurprisingly, no one procedure is empirically best for all tasks, but that selecting an appropriate sampling fraction substantially improves performance in many cases. We illustrate this on simulated networks and on Facebook data.
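To make the idea of resampling-based confidence intervals concrete, the following is a minimal sketch under stated assumptions, not the algorithm proposed in this work. It assumes an undirected network stored as a symmetric 0/1 adjacency matrix, uses node subsampling without replacement as one illustrative resampling scheme (which may or may not coincide with any of the three procedures studied), and forms a naive percentile interval with no rescaling for the reduced subsample size. The names `edge_density`, `node_subsample_ci`, and the parameter `fraction` are hypothetical and chosen only for illustration.

```python
import numpy as np


def edge_density(adj):
    """Fraction of node pairs joined by an edge (undirected, no self-loops)."""
    n = adj.shape[0]
    # Use the strict upper triangle so each undirected pair is counted once.
    return adj[np.triu_indices(n, k=1)].mean()


def node_subsample_ci(adj, statistic, fraction=0.5, n_rep=500, alpha=0.05, seed=0):
    """Naive percentile interval from node subsampling (illustrative assumption).

    Each replicate draws round(fraction * n) nodes without replacement and
    evaluates `statistic` on the induced subgraph.
    """
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    m = max(2, int(round(fraction * n)))
    stats = np.empty(n_rep)
    for b in range(n_rep):
        idx = rng.choice(n, size=m, replace=False)
        # np.ix_ selects the rows and columns of the sampled nodes together.
        stats[b] = statistic(adj[np.ix_(idx, idx)])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi


# Usage on a simulated Erdos-Renyi graph (simulated data, for illustration only).
rng = np.random.default_rng(1)
n, p = 200, 0.1
upper = np.triu(rng.random((n, n)) < p, k=1)
adj = (upper | upper.T).astype(int)
lo, hi = node_subsample_ci(adj, edge_density, fraction=0.5)
print(f"edge density: {edge_density(adj):.4f}, interval: ({lo:.4f}, {hi:.4f})")
```

The `fraction` argument plays the role of the sampling fraction discussed above. In this sketch it is fixed by hand rather than selected by a data-driven rule, and the width of the resulting interval depends on that choice.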