Given a text, rank and select queries return the number of occurrences of a character up to a position (rank) or the position of a character with a given rank (select). These queries have applications in, e.g., compression, computational geometry, and pattern matching in the form of the backwards search -- the backbone of many compressed full-text indices. A wavelet tree is a compact data structure that for a text of length $n$ over an alphabet of size $\sigma$ requires only $n\lceil\log\sigma\rceil(1+o(1))$ bits of space and can answer rank and select queries in $\Theta(\log \sigma)$ time. Wavelet trees are used in the applications described above. In this paper, we show how to improve query performance of wavelet trees by using a 4-ary tree instead of a binary tree as basis of the wavelet tree. To this end, we present a space-efficient rank and select data structure for quad vectors. The 4-ary tree layout of a wavelet tree helps to halve the number of cache misses during queries and thus reduces the query latency. Our experimental evaluation shows that our 4-ary wavelet tree can improve the latency of rank and select queries by a factor of $\approx 2$ compared to the wavelet tree implementations contained in the widely used Succinct Data Structure Library (SDSL).
翻译:根据文本、 级别和选择查询, 返回字符到位置( 级别) 或字符位置( 级别) 的发生次数。 这些查询具有应用, 例如, 压缩、 计算几何和模式匹配, 其形式为后向搜索( 许多压缩全文索引的主干) 。 波盘树是一个紧凑的数据结构, 其长度为 $\ sgma$ 的字母, 它只需要 $\ lcil\ log\ sigma\ rceil(1+1, 1+1) 美元的空间位数, 并且可以在 $\ Theta ( log\\ sigma) 的时间 中回答和选择查询 。 上面描述的应用程序使用了波盘树 。 在本文中, 我们展示了如何通过使用4 棵树而不是 双树作为波树的基础来改进波谱树的性能。 为此, 我们展示了一个空间高效的等级, 并选择了四位矢矢量的数据结构布局, 有助于在实验性查询期间将缓存数减半, 并以此减少我们所使用的 树质 4 树底查询 。