随机子图检测：基于查询方法 (Random Subgraph Detection Using Queries)

The planted densest subgraph detection problem refers to the task of testing whether in a given (random) graph there is a subgraph that is unusually dense. Specifically, we observe an undirected and unweighted graph on $n$ nodes. Under the null hypothesis, the graph is a realization of an Erd\H{o}s-R\'{e}nyi graph with edge probability (or, density) $q$. Under the alternative, there is a subgraph on $k$ vertices with edge probability $p>q$. The statistical as well as the computational barriers of this problem are well-understood for a wide range of the edge parameters $p$ and $q$. In this paper, we consider a natural variant of the above problem, where one can only observe a small part of the graph using adaptive edge queries. For this model, we determine the number of queries necessary and sufficient for detecting the presence of the planted subgraph. Specifically, we show that any (possibly randomized) algorithm must make $\mathsf{Q} = \Omega(\frac{n^2}{k^2\chi^4(p||q)}\log^2n)$ adaptive queries (on expectation) to the adjacency matrix of the graph to detect the planted subgraph with probability more than $1/2$, where $\chi^2(p||q)$ is the Chi-Square distance. On the other hand, we devise a quasi-polynomial-time algorithm that detects the planted subgraph with high probability by making $\mathsf{Q} = O(\frac{n^2}{k^2\chi^4(p||q)}\log^2n)$ non-adaptive queries. We then propose a polynomial-time algorithm which is able to detect the planted subgraph using $\mathsf{Q} = O(\frac{n^3}{k^3\chi^2(p||q)}\log^3 n)$ queries. We conjecture that in the leftover regime, where $\frac{n^2}{k^2}\ll\mathsf{Q}\ll \frac{n^3}{k^3}$, no polynomial-time algorithms exist. Our results resolve two questions posed in \cite{racz2020finding}, where the special case of adaptive detection and recovery of a planted clique was considered.

翻译：本文研究种植密度最大子图检测问题，即对于给定（随机的）图，测试是否存在一个密度异常高的子图。具体来说，我们观察一个$n$个节点的无向无权图。在零假设下，该图是一个Erdős-Rényi图，其边概率（或密度）为$q$。在备择假设下，存在一个$k$个顶点的子图，其边概率为$p > q$。针对不同范围的边参数$p$和$q$，本文已对该问题的统计和计算壁垒进行了深入理解。在本文中，我们考虑了上述问题的一种自然变体，即只能使用自适应边查询来观测图的一个小部分。对于这种模型，我们确定了必要查询次数和检测到种植子图的充分条件。具体而言，我们展示了做出$\mathsf{Q}=\Omega(\frac{n^2}{k^2\chi^4(p||q)}\log^2n)$个查询（期望数量）来检测在随机图中种植的子图的存在性的算法（可以是随机化的），其中$\chi^2(p||q)$是卡方距离。此外，借助$\mathsf{Q} = O(\frac{n^2}{k^2\chi^4(p||q)}\log^2n)$个非自适应查询，我们设计了一种准多项式时间算法来高概率地检测种植子图。接着，我们提出了一种多项式时间算法，能够利用$\mathsf{Q} = O(\frac{n^3}{k^3\chi^2(p||q)}\log^3 n)$的查询次数来检测种植子图。我们猜测，在$\frac{n^2}{k^2}\ll\mathsf{Q}\ll \frac{n^3}{k^3}$的整体范围内，不存在任何多项式时间算法。我们的研究在检测和恢复被种植的团体的自适应问题中提出的两个问题得到了解决。