We consider the problem of counting the copies of a length-$k$ pattern $\sigma$ in a sequence $f \colon [n] \to \mathbb{R}$, where a copy is a subset of indices $i_1 < \ldots < i_k \in [n]$ such that $f(i_j) < f(i_\ell)$ if and only if $\sigma(j) < \sigma(\ell)$. This problem is motivated by a range of connections and applications in ranking, nonparametric statistics, combinatorics, and fine-grained complexity, especially when $k$ is a small fixed constant. Recent advances have significantly improved our understanding of counting and detecting patterns. Guillemot and Marx [2014] obtained an $O(n)$ time algorithm for the detection variant for any fixed $k$. Their proof has laid the foundations for the discovery of the twin-width, a concept that has notably advanced parameterized complexity in recent years. Counting, in contrast, is harder: it has a conditional lower bound of $n^{\Omega(k / \log k)}$ [Berendsohn, Kozma, and Marx, 2019] and is expected to be polynomially harder than detection as early as $k = 4$, given its equivalence to counting $4$-cycles in graphs [Dudek and Gawrychowski, 2020]. In this work, we design a deterministic near-linear time $(1+\varepsilon)$-approximation algorithm for counting $\sigma$-copies in $f$ for all $k \leq 5$. Combined with the conditional lower bound for $k=4$, this establishes the first known separation between approximate and exact pattern counting. Interestingly, while neither the sequence $f$ nor the pattern $\sigma$ are monotone, our algorithm makes extensive use of coresets for monotone functions [Har-Peled, 2006]. Along the way, we develop a near-optimal data structure for $(1+\varepsilon)$-approximate increasing pair range queries in the plane, which exhibits a conditional separation from the exact case and may be of independent interest.
翻译:我们考虑在序列$f \colon [n] \to \mathbb{R}$中计数长度为$k$的模式$\sigma$的副本问题,其中副本是指满足以下条件的指标子集$i_1 < \ldots < i_k \in [n]$:当且仅当$\sigma(j) < \sigma(\ell)$时,有$f(i_j) < f(i_\ell)$。该问题受到排序、非参数统计、组合数学及精细复杂度等领域的一系列关联与应用推动,尤其当$k$为较小的固定常数时。近年来的研究显著提升了我们对模式计数与检测问题的理解。Guillemot与Marx [2014]针对任意固定$k$的检测变体提出了$O(n)$时间算法。其证明为后续发现孪生宽度概念奠定了基础,该概念近年来显著推动了参数化复杂度领域的发展。相比之下,计数问题更为困难:存在$n^{\Omega(k / \log k)}$的条件性下界[Berendsohn, Kozma, and Marx, 2019],且由于其在$k=4$时等价于图中$4$-环计数问题[Dudek and Gawrychowski, 2020],预期从$k=4$开始即比检测问题具有多项式级难度。本研究中,我们针对所有$k \leq 5$的情况,设计了一种确定性近线性时间$(1+\varepsilon)$-近似算法,用于计数$f$中的$\sigma$副本。结合$k=4$时的条件性下界,这首次建立了近似与精确模式计数之间的分离关系。值得注意的是,尽管序列$f$与模式$\sigma$均非单调,我们的算法却广泛利用了单调函数的核心集技术[Har-Peled, 2006]。在此过程中,我们开发了一种用于平面$(1+\varepsilon)$-近似递增对范围查询的近最优数据结构,该结构展现出与精确情况的条件性分离特性,可能具有独立研究价值。