One of the fundamental task in graph data mining is to find a planted community(dense subgraph), which has wide application in biology, finance, spam detection and so on. For a real network data, the existence of a dense subgraph is generally unknown. Statistical tests have been devised to testing the existence of dense subgraph in a homogeneous random graph. However, many networks present extreme heterogeneity, that is, the degrees of nodes or vertexes don't concentrate on a typical value. The existing tests designed for homogeneous random graph are not straightforwardly applicable to the heterogeneous case. Recently, scan test was proposed for detecting a dense subgraph in heterogeneous(inhomogeneous) graph(\cite{BCHV19}). However, the computational complexity of the scan test is generally not polynomial in the graph size, which makes the test impractical for large or moderate networks. In this paper, we propose a polynomial-time test that has the standard normal distribution as the null limiting distribution. The power of the test is theoretically investigated and we evaluate the performance of the test by simulation and real data example.
翻译:图形数据挖掘的基本任务之一是找到在生物学、金融、垃圾检测等方面广泛应用的植入社区(高级子系统),这种社区在生物学、金融、垃圾检测等方面有着广泛的应用。对于真正的网络数据来说,一般不知道是否存在密集的子系统。已经设计了统计测试,以测试在同质随机图中是否存在密度的子系统。然而,许多网络呈现出极端异质性,即结点或脊椎的程度并不集中于一个典型的值。为同质随机图设计的现有测试并不直接适用于多元性案例。最近,提议进行扫描测试,以探测混杂(不相形)图(\cite{BCHV19})中的密集子系统。然而,扫描测试的计算复杂性一般不是图形大小的多元性,这使得测试对大型或中度网络来说不切实际。在本文中,我们建议采用以标准正常分布为无效的多时试验。测试的力量是理论上的调查,我们通过模拟和真实数据来评估测试的性。