Until recently, most experts would probably have agreed that we cannot backward-step in constant time with a run-length compressed Burrows-Wheeler Transform (RLBWT), since doing so relies on rank queries on sparse bitvectors, and those inherit lower bounds from predecessor queries. At ICALP '21, however, Nishimoto and Tabei described a new, simple, constant-time implementation. For a permutation $\pi$, it stores an $O(r)$-space table, where $r$ is the number of positions $i$ such that either $i = 0$ or $\pi(i + 1) \neq \pi(i) + 1$, that allows successive values of $\pi(i)$ to be computed by table look-ups and linear scans. Nishimoto and Tabei showed how to add rows to the table so as to bound the length of the linear scans, making the query time for computing $\pi(i)$ constant while keeping the space at $O(r)$. In this paper we refine Nishimoto and Tabei's approach, including a time-space tradeoff, and experimentally evaluate different implementations, demonstrating the practicality of part of their result. We show that even without adding rows to the table, in practice we almost always scan only a few entries during queries. We propose a decomposition scheme for the permutation $\pi$ corresponding to the LF-mapping that allows improved compression of the data structure while limiting the query time. We tested our implementation on real-world genomic datasets and found that, without compression of the table, backward-stepping is drastically faster than with sparse-bitvector implementations but, unfortunately, also uses drastically more space. After compression, backward-stepping is competitive in both time and space with the best existing implementations.
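To make the table look-up and linear scan concrete, here is a minimal Python sketch for an arbitrary permutation. It is an illustration under stated assumptions, not the authors' implementation: the function names `build_move_table` and `move` are hypothetical, a run is taken to start at $i = 0$ or wherever $\pi(i) \neq \pi(i - 1) + 1$, and no rows are added to bound the scan length, so a single query may scan more than a constant number of rows.

```python
from bisect import bisect_right


def build_move_table(pi):
    """Build one row per maximal run on which pi increases by exactly 1.

    Assumption (illustrative convention): a run starts at i == 0 or
    wherever pi[i] != pi[i - 1] + 1.
    """
    n = len(pi)
    starts = [i for i in range(n) if i == 0 or pi[i] != pi[i - 1] + 1]
    # Image of each run's first position under pi.
    images = [pi[p] for p in starts]
    # Index of the row whose run contains that image (computed once, at build time).
    dest_rows = [bisect_right(starts, q) - 1 for q in images]
    return starts, images, dest_rows


def move(table, i, x):
    """Given position i and the index x of the row whose run contains i,
    return (pi(i), index of the row containing pi(i), rows scanned)."""
    starts, images, dest_rows = table
    # Table look-up: inside a run, pi preserves offsets from the run start.
    j = images[x] + (i - starts[x])
    # Linear scan: start at the row containing the image of the run's first
    # position and walk forward until the row containing j is reached.
    y = dest_rows[x]
    scanned = 0
    while y + 1 < len(starts) and starts[y + 1] <= j:
        y += 1
        scanned += 1
    return j, y, scanned
```

Because each call returns both $\pi(i)$ and the row that contains it, successive values can be computed by feeding the output pair back into `move`, with no rank or predecessor query after the first step. The `scanned` counter is included only so that scan lengths can be inspected.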