We show how to merge run-length compressed Burrows-Wheeler Transforms (RLBWTs) quickly and in $O(R)$ space, where $R$ is the total number of runs in them, when a certain parameter is small. Specifically, we consider the boundaries in their combined extended Burrows-Wheeler Transform (eBWT) between blocks of characters from the same original RLBWT, and denote by $L$ the sum of the longest common prefix (LCP) values at those boundaries. We show how to merge the RLBWTs in $\tilde{O}(L + \sigma + R)$ time, where $\sigma$ is the alphabet size. We conjecture that $L$ tends to be small when the strings (or sets of strings) underlying the original RLBWTs are repetitive but dissimilar.
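To make the parameter $L$ concrete, the following is a minimal brute-force sketch, not the merging algorithm itself. It assumes the eBWT is built by sorting all rotations of all input strings in $\omega$-order (comparing their infinite repetitions) and that LCP values are measured on those repetitions. The function name `boundary_lcp_sum` and the representation of each input as a list of plain strings are hypothetical choices made for illustration only.

```python
from itertools import cycle, islice


def boundary_lcp_sum(collections):
    """Naively compute the combined eBWT and L (illustration only).

    `collections` is a list of string collections, each standing in for the
    strings underlying one original RLBWT (an assumption of this sketch).
    """
    strings = [(s, cid) for cid, coll in enumerate(collections) for s in coll]
    # Prefixes of length 2 * max_len of the infinite repetitions suffice to
    # decide omega-order between any two rotations (Fine and Wilf).
    n = 2 * max(len(s) for s, _ in strings)

    rows = []
    for s, cid in strings:
        for i in range(len(s)):
            rot = s[i:] + s[:i]
            key = "".join(islice(cycle(rot), n))
            rows.append((key, cid, rot[-1]))  # rot[-1] is the eBWT character
    rows.sort(key=lambda r: r[0])

    total = 0
    for (k1, c1, _), (k2, c2, _) in zip(rows, rows[1:]):
        if c1 != c2:  # boundary between blocks from different inputs
            lcp = 0
            # Capped at n: identical rotations across inputs would otherwise
            # have unbounded LCP (a simplification of this sketch).
            while lcp < n and k1[lcp] == k2[lcp]:
                lcp += 1
            total += lcp

    ebwt = "".join(ch for _, _, ch in rows)
    return ebwt, total
```

For example, `boundary_lcp_sum([["ab"], ["cd"]])` finds a single boundary with LCP 0, while `boundary_lcp_sum([["aab"], ["ab"]])` finds three boundaries whose LCP values sum to 5, which matches the intuition that similar inputs interleave more and yield a larger $L$.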