Diffusion source identification on networks is a problem of fundamental importance in a broad class of applications, including rumor control and virus identification. Though this problem has received significant recent attention, most studies have focused only on very restrictive settings and lack theoretical guarantees for more realistic networks. We introduce a statistical framework for the study of diffusion source identification and develop a confidence set inference approach inspired by hypothesis testing. Our method efficiently produces a small subset of nodes that provably contains the source node at any pre-specified confidence level, without restrictive assumptions on network structure. Moreover, we propose multiple Monte Carlo strategies for the inference procedure, based on the network topology and probabilistic properties, that significantly improve scalability. To our knowledge, this is the first diffusion source identification method with a practically useful theoretical guarantee on general networks. We demonstrate our approach via extensive synthetic experiments on well-known random network models and on a mobility network between cities related to the spread of COVID-19.
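The idea of building a confidence set by inverting hypothesis tests can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a susceptible-infected (SI) model in which each infected-susceptible edge is equally likely to transmit next, a connected graph, and a hypothetical test statistic (the sum of hop distances from the candidate source to the infected nodes). The function names `bfs_distances`, `simulate_si`, and `confidence_set` are illustrative. Each infected node is tested as a candidate source with a Monte Carlo p-value, and the candidates that are not rejected at level `alpha` form the confidence set.

```python
import random
from collections import deque


def bfs_distances(adj, start):
    """Hop distances from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def simulate_si(adj, source, size, rng):
    """Assumed SI model: grow the infected set one node at a time by
    picking a uniformly random infected-susceptible edge."""
    infected = {source}
    while len(infected) < size:
        boundary = [(u, v) for u in sorted(infected)
                    for v in adj[u] if v not in infected]
        if not boundary:  # the connected component is exhausted
            break
        infected.add(rng.choice(boundary)[1])
    return infected


def confidence_set(adj, infected, alpha=0.1, n_sims=200, seed=0):
    """Keep every candidate source whose Monte Carlo p-value exceeds alpha."""
    rng = random.Random(seed)
    kept = set()
    for s in sorted(infected):
        dist = bfs_distances(adj, s)
        # Hypothetical statistic: total hop distance from s to infected nodes.
        observed = sum(dist[v] for v in infected)
        extreme = 0
        for _ in range(n_sims):
            sim = simulate_si(adj, s, len(infected), rng)
            if sum(dist[v] for v in sim) >= observed:
                extreme += 1
        # Adding 1 to both counts keeps the Monte Carlo test valid.
        p_value = (1 + extreme) / (1 + n_sims)
        if p_value > alpha:
            kept.add(s)
    return kept


# Toy example: a path graph 0-1-...-10 with a contiguous infected block.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 10] for i in range(11)}
snapshot = {3, 4, 5, 6, 7}
result = confidence_set(path, snapshot, alpha=0.1)
```

In this toy example, node 5 is always retained: its statistic attains the smallest value any simulated outbreak from node 5 can produce, so its p-value is 1. The sketch simulates every candidate from scratch, which is the naive baseline that more scalable Monte Carlo strategies would aim to improve on.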