In this work, we study the limits of compressed data structures, i.e., structures that support various queries on an input text $T\in\Sigma^n$ using space proportional to the size of $T$ in compressed form. Nearly all fundamental queries can currently be efficiently supported in $O(\delta(T)\log^{O(1)}n)$ space, where $\delta(T)$ is the substring complexity, a strong compressibility measure that lower-bounds the optimal space to represent the text [Kociumaka, Navarro, Prezza, IEEE Trans. Inf. Theory 2023]. However, optimal query time has been characterized only for random access. We address this gap by developing tight lower bounds for nearly all other fundamental queries: (1) We prove that suffix array (SA), inverse suffix array (SA$^{-1}$), longest common prefix (LCP) array, and longest common extension (LCE) queries all require $\Omega(\log n/\log\log n)$ time within $O(\delta(T)\log^{O(1)}n)$ space, matching known upper bounds. (2) We further show that other common queries, currently supported in $O(\log\log n)$ time and $O(\delta(T)\log^{O(1)}n)$ space, including the Burrows-Wheeler Transform (BWT), permuted longest common prefix (PLCP) array, Last-to-First (LF), inverse LF, lexicographic predecessor ($\Phi$), and inverse $\Phi$ queries, all require $\Omega(\log\log n)$ time, yielding another set of tight bounds. Our lower bounds hold even for texts over a binary alphabet. This work establishes a clean dichotomy: the optimal time complexity to support central string queries in compressed space is either $\Theta(\log n/\log\log n)$ or $\Theta(\log\log n)$. This completes the theoretical foundation of compressed indexing, closing a crucial gap between upper and lower bounds and providing a clear target for future data structures: seeking either the optimal time in the smallest space or the fastest time in the optimal space, both of which are now known for central string queries.
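
For readers unfamiliar with the queries named above, the following is a minimal, uncompressed reference sketch of $\delta(T)$ and of the SA, SA$^{-1}$, LCE, LCP, and PLCP queries. It is not the data structure studied in this work: it uses space and time polynomial in $n$ and serves only to pin down what each query returns. It assumes 0-based positions, the standard definition $\delta(T)=\max_{k\in[1..n]} d_k(T)/k$ (with $d_k(T)$ the number of distinct length-$k$ substrings of $T$), and the common convention that LCP at rank 0 is 0. All function names are hypothetical.

```python
def substring_complexity(text):
    """delta(T) = max over k of (number of distinct length-k substrings) / k."""
    n = len(text)
    best = 0.0
    for k in range(1, n + 1):
        distinct = len({text[i:i + k] for i in range(n - k + 1)})
        best = max(best, distinct / k)
    return best


def suffix_array(text):
    """SA[r] = starting position of the r-th lexicographically smallest suffix."""
    # Naive construction by sorting suffixes; fine for illustration only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def inverse_suffix_array(sa):
    """SA^{-1}[j] = lexicographic rank of the suffix starting at position j."""
    isa = [0] * len(sa)
    for rank, pos in enumerate(sa):
        isa[pos] = rank
    return isa


def lce(text, i, j):
    """LCE(i, j) = length of the longest common prefix of text[i:] and text[j:]."""
    n = len(text)
    length = 0
    while i + length < n and j + length < n and text[i + length] == text[j + length]:
        length += 1
    return length


def lcp_array(text, sa):
    """LCP[r] = LCE(SA[r-1], SA[r]) for r >= 1; LCP[0] = 0 by the assumed convention."""
    return [0] + [lce(text, sa[r - 1], sa[r]) for r in range(1, len(sa))]


def plcp_array(lcp, isa):
    """PLCP[j] = LCP[SA^{-1}[j]], i.e., the LCP array permuted into text order."""
    return [lcp[isa[j]] for j in range(len(isa))]


if __name__ == "__main__":
    t = "banana"
    sa = suffix_array(t)
    isa = inverse_suffix_array(sa)
    lcp = lcp_array(t, sa)
    print(sa, isa, lcp, plcp_array(lcp, isa), substring_complexity(t))
```

The lower bounds in this work concern how fast such queries can be answered when the structure is restricted to $O(\delta(T)\log^{O(1)}n)$ space; the sketch above makes no attempt to meet that space budget.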