配有四向量的快速波列树 (Faster Wavelet Trees with Quad Vectors)

Given a text, rank and select queries return the number of occurrences of a character up to a position (rank) or the position of a character with a given rank (select). These queries have applications in, e.g., compression, computational geometry, and pattern matching in the form of the backwards search -- the backbone of many compressed full-text indices. A wavelet tree is a compact data structure that for a text of length $n$ over an alphabet of size $\sigma$ requires only $n\lceil\log\sigma\rceil(1+o(1))$ bits of space and can answer rank and select queries in $\Theta(\log \sigma)$ time. Wavelet trees are used in the applications described above. In this paper, we show how to improve query performance of wavelet trees by using a 4-ary tree instead of a binary tree as basis of the wavelet tree. To this end, we present a space-efficient rank and select data structure for quad vectors. The 4-ary tree layout of a wavelet tree helps to halve the number of cache misses during queries and thus reduces the query latency. Our experimental evaluation shows that our 4-ary wavelet tree can improve the latency of rank and select queries by a factor of $\approx 2$ compared to the wavelet tree implementations contained in the widely used Succinct Data Structure Library (SDSL).

翻译：根据文本、级别和选择查询, 返回字符到位置( 级别) 或字符位置( 级别) 的发生次数。这些查询具有应用, 例如, 压缩、计算几何和模式匹配, 其形式为后向搜索( 许多压缩全文索引的主干) 。波盘树是一个紧凑的数据结构, 其长度为 $\ sgma$ 的字母, 它只需要 $\ lcil\ log\ sigma\ rceil(1+1, 1+1) 美元的空间位数, 并且可以在 $\ Theta ( log\\ sigma) 的时间中回答和选择查询。上面描述的应用程序使用了波盘树。在本文中, 我们展示了如何通过使用4 棵树而不是双树作为波树的基础来改进波谱树的性能。为此, 我们展示了一个空间高效的等级, 并选择了四位矢矢量的数据结构布局, 有助于在实验性查询期间将缓存数减半, 并以此减少我们所使用的树质 4 树底查询。