A self-index is a compressed data structure that supports locate queries--reporting all positions where a given pattern occurs in a string while maintaining the string in compressed form. While many self-indexes have been proposed, developing dynamically updatable ones supporting string insertions and deletions remains a challenge. The r-index (Gagie et al., JACM'20) is a representative static self-index based on the run-length Burrows-Wheeler transform (RLBWT), designed for highly repetitive strings. We present the dynamic r-index, a dynamic extension of the r-index that achieves updates in LCP-bounded time. The dynamic r-index supports count queries in $\mathcal{O}(m \log r / \log \log r)$ time and locate queries in $\mathcal{O}(m \log r / \log \log r + \mathsf{occ} \log r)$ time, using $\mathcal{O}(r)$ words of space, where $m$ is the length of a query with $\mathsf{occ}$ occurrences and $r$ is the number of runs in the RLBWT. Crucially, update operations are supported in $\mathcal{O}((m + L_{\mathsf{max}}) \log n)$ time for a substring of length $m$, where $L_{\mathsf{max}}$ is the maximum LCP value; the average running time is $\mathcal{O}((m + L_{\mathsf{avg}}) \log n)$, where $L_{\mathsf{avg}}$ is the average LCP value. This LCP-bounded complexity is particularly advantageous for highly repetitive strings where LCP values are typically small. We experimentally demonstrate the practical efficiency of the dynamic r-index on various highly repetitive datasets.
翻译:暂无翻译