Two-sample hypothesis testing for comparing two networks is an important yet difficult problem. Major challenges include: potentially different sizes and sparsity levels; non-repeated observations of adjacency matrices; computational scalability; and theoretical investigations, especially on finite-sample accuracy and minimax optimality. In this article, we propose the first provably higher-order accurate two-sample inference method by comparing network moments. Our method extends the classical two-sample t-test to the network setting. We make weak modeling assumptions and can effectively handle networks of different sizes and sparsity levels. We establish strong finite-sample theoretical guarantees, including rate-optimality properties. Our method is easy to implement and computes fast. We also devise a novel nonparametric framework of offline hashing and fast querying particularly effective for maintaining and querying very large network databases. We demonstrate the effectiveness of our method by comprehensive simulations. We apply our method to two real-world data sets and discover interesting novel structures.
翻译:对两个网络进行比较的两样假设测试是一个重要而困难的问题。主要的挑战包括:潜在的不同大小和宽度水平;不重复对相邻矩阵的观测;可计算性;理论调查,特别是有限抽样精确度和微度最佳度的理论调查。在本篇文章中,我们建议了第一个可以比较网络时点的可察觉较高顺序准确的两样抽样推断法。我们的方法将传统的两样抽样测试扩展至网络设置。我们做了薄弱的模型假设,能够有效地处理不同大小和宽度水平的网络。我们建立了强大的有限抽样理论保证,包括比率最佳性能。我们的方法很容易执行和快速编译。我们还设计了一个新的非参数框架,对维护和查询非常庞大的网络数据库特别有效。我们通过全面模拟来展示我们的方法的有效性。我们把我们的方法应用于两个真实世界的数据集,并发现有趣的新结构。