Indexing highly repetitive strings (i.e., strings with many repetitions) for fast queries has become a central research topic in string processing because of its wide variety of applications in bioinformatics and natural language processing. Although a substantial number of indexes for highly repetitive strings have been proposed thus far, developing compressed indexes that support various queries remains a challenge. The run-length Burrows-Wheeler transform (RLBWT) is a lossless data compression method that applies a reversible permutation to an input string and then run-length encodes the result, and it has received interest for indexing highly repetitive strings. LF and $\phi^{-1}$ are two key functions for building indexes on RLBWT, and the best previous result computes LF and $\phi^{-1}$ in $O(\log \log n)$ time with $O(r)$ words of space, where $n$ is the string length and $r$ is the number of runs in RLBWT. In this paper, we improve LF and $\phi^{-1}$ so that they can be computed in constant time with $O(r)$ words of space. Subsequently, we present OptBWTR (optimal-time queries on BWT-runs compressed indexes), the first string index that supports various queries, including locate, count, and extract queries, in optimal time and $O(r)$ words of space.
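
To make the objects named above concrete, the following is a minimal Python sketch of the BWT, its run-length encoding, and the LF function. It is a naive illustration only and is not the constant-time, $O(r)$-space data structure proposed in the paper. The function names (`bwt`, `run_length_encode`, `lf_mapping`, `invert_bwt`) and the sentinel `$` are assumptions made for this example, and the input is assumed to contain only characters larger than `$`.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT via sorted rotations (quadratic space; illustration only).

    Assumes `sentinel` does not occur in `text` and sorts before every
    character of `text`.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse maximal runs of equal characters into (char, length) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def lf_mapping(last: str) -> list[int]:
    """LF[i] is the row whose first character is the same occurrence as last[i].

    Computed here by a stable sort of positions in O(n log n) time and
    O(n) space, not by the run-based structure discussed in the text.
    """
    order = sorted(range(len(last)), key=lambda i: (last[i], i))
    lf = [0] * len(last)
    for first_pos, last_pos in enumerate(order):
        lf[last_pos] = first_pos
    return lf


def invert_bwt(last: str) -> str:
    """Recover the original text by repeatedly applying LF from row 0."""
    lf = lf_mapping(last)
    out = []
    i = 0  # row 0 starts with the sentinel, so last[0] is the final text char
    for _ in range(len(last) - 1):
        out.append(last[i])
        i = lf[i]
    return "".join(reversed(out))


if __name__ == "__main__":
    L = bwt("banana")
    runs = run_length_encode(L)
    print(L, runs, len(runs))  # len(runs) is r, the number of BWT runs
    print(invert_bwt(L))
```

In this sketch, the number of pairs returned by `run_length_encode` corresponds to $r$, and `invert_bwt` shows why LF makes the permutation reversible: each application steps one character backward in the original string.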