We consider the problem of counting the copies of a length-$k$ pattern $\sigma$ in a sequence $f \colon [n] \to \mathbb{R}$, where a copy is a subset of indices $i_1 < \ldots < i_k \in [n]$ such that $f(i_j) < f(i_\ell)$ if and only if $\sigma(j) < \sigma(\ell)$. This problem is motivated by a range of connections and applications in ranking, nonparametric statistics, combinatorics, and fine-grained complexity, especially when $k$ is a small fixed constant. Recent advances have significantly improved our understanding of counting and detecting patterns. Guillemot and Marx [2014] obtained an $O(n)$ time algorithm for the detection variant for any fixed $k$. Their proof has laid the foundations for the discovery of the twin-width, a concept that has notably advanced parameterized complexity in recent years. Counting, in contrast, is harder: it has a conditional lower bound of $n^{\Omega(k / \log k)}$ [Berendsohn, Kozma, and Marx, 2019] and is expected to be polynomially harder than detection as early as $k = 4$, given its equivalence to counting $4$-cycles in graphs [Dudek and Gawrychowski, 2020]. In this work, we design a deterministic near-linear time $(1+\varepsilon)$-approximation algorithm for counting $\sigma$-copies in $f$ for all $k \leq 5$. Combined with the conditional lower bound for $k=4$, this establishes the first known separation between approximate and exact pattern counting. Interestingly, while neither the sequence $f$ nor the pattern $\sigma$ are monotone, our algorithm makes extensive use of coresets for monotone functions [Har-Peled, 2006]. Along the way, we develop a near-optimal data structure for $(1+\varepsilon)$-approximate increasing pair range queries in the plane, which exhibits a conditional separation from the exact case and may be of independent interest.
翻译:暂无翻译