We present a randomized algorithm that computes a constant approximation of a graph's arboricity, using $\tilde{O}(n/λ)$ queries to adjacency lists and in the same time bound. Here, $n$ and $λ$ denote the number of nodes and the graph's arboricity, respectively. The $\tilde{O}(n/λ)$ query complexity of our algorithm is nearly optimal. Our constant approximation settles a question of Eden, Mossel, and Ron [SODA'22], who achieved an $O(\log^2 n)$ approximation with the same query and time complexity and asked whether a better approximation can be achieved using near-optimal query complexity. A key technical challenge in the problem is due to recursive algorithms based on probabilistic samplings, each with a non-negligible error probability. In our case, many of the recursions invoked could have bad probabilistic samples and result in high query complexities. The particular difficulty is that those bad recursions are not easy or cheap to detect and discard. Our approach runs multiple recursions in parallel, to attenuate the error probability, using a careful \textit{scheduling mechanism} that manages the speed at which each of them progresses and makes our overall query complexity competitive with the single good recursion. We find this usage of parallelism and scheduling in a sublinear algorithm remarkable, and we are hopeful that similar ideas may find applications in a wider range of sublinear algorithms that rely on probabilistic recursions.
翻译:我们提出一种随机算法,可在$\tilde{O}(n/λ)$的邻接表查询次数及相同时间界内,计算图树覆盖度的常数逼近。其中$n$和$λ$分别表示节点数和图的树覆盖度。该算法的$\tilde{O}(n/λ)$查询复杂度近乎最优。我们的常数逼近解决了Eden、Mossel和Ron[SODA'22]提出的问题——他们曾以相同查询和时间复杂度实现$O(\log^2 n)$逼近,并质疑是否能在保持近最优查询复杂度的前提下获得更好的逼近精度。该问题的关键技术挑战源于基于概率采样的递归算法,每个递归步骤都存在不可忽略的错误概率。在我们的场景中,许多被调用的递归可能因概率采样不佳而导致高查询复杂度。特殊困难在于这些不良递归既不易检测也难以低成本剔除。我们的方法通过并行运行多个递归来降低错误概率,采用精妙的\textit{调度机制}控制每个递归的推进速度,从而使总体查询复杂度与单一优质递归相当。我们认为这种在亚线性算法中运用并行与调度技术的方法具有显著价值,并期待类似思想能在更广泛依赖概率递归的亚线性算法中得到应用。