A retrieval data structure for a static function $f:S\rightarrow \{0,1\}^r$ supports queries that return $f(x)$ for any $x \in S$. Retrieval data structures can be used to implement a static approximate membership query data structure (AMQ), i.e., a Bloom filter alternative, with false positive rate $2^{-r}$. The information-theoretic lower bound for both tasks is $r|S|$ bits. While succinct theoretical constructions using $(1+o(1))r|S|$ bits were known, these could not achieve very small overheads in practice because they have an unfavorable space--time tradeoff hidden in the asymptotic costs or because small overheads would only be reached for physically impossible input sizes. With bumped ribbon retrieval (BuRR), we present the first practical succinct retrieval data structure. In an extensive experimental evaluation BuRR achieves space overheads well below 1\,\% while being faster than most previously used retrieval data structures (typically with space overheads at least an order of magnitude larger) and faster than classical Bloom filters (with space overhead $\geq 44\,\%$). This efficiency, including favorable constants, stems from a combination of simplicity, word parallelism, and high locality. We additionally describe homogeneous ribbon filter AMQs, which are even simpler and faster at the price of slightly larger space overhead.
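To make the link between retrieval and approximate membership concrete, the following is a minimal sketch, not the authors' BuRR implementation. It builds a plain ribbon-style retrieval structure by solving a banded linear system over GF(2) with no bumping, then stores an $r$-bit fingerprint per key, so that a non-member matches with probability about $2^{-r}$. The names `build`, `retrieve`, `fingerprint`, and `RetrievalFilter`, the ribbon width `W = 64`, and the 25% slack are illustrative assumptions. That slack is far above the sub-1% overhead reported for BuRR.

```python
import hashlib

W = 64  # ribbon width in bits (illustrative assumption)

def _hash(key: str, seed: int) -> int:
    """Deterministic 128-bit hash of (seed, key)."""
    data = seed.to_bytes(8, "little") + key.encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=16).digest(), "little")

def _row(key: str, seed: int, m: int):
    """Map a key to a start slot and a W-bit coefficient vector with bit 0 set."""
    h = _hash(key, seed)
    start = (h >> 64) % (m - W + 1)
    coeffs = (h & ((1 << W) - 1)) | 1
    return start, coeffs

def fingerprint(key: str, r: int) -> int:
    """r-bit fingerprint, hashed with a fixed seed separate from the row seeds."""
    return _hash(key, 0xF1A9) & ((1 << r) - 1)

def build(items: dict, seed: int, slack: float = 0.25):
    """Solve the banded GF(2) system; return the slot table, or None on failure."""
    m = int(len(items) * (1 + slack)) + W
    coeff = [0] * m   # pivot rows; 0 means the slot is still free
    value = [0] * m   # right-hand sides (r-bit integers)
    for key, val in items.items():
        s, c = _row(key, seed, m)
        b = val
        while True:
            if coeff[s] == 0:          # free pivot slot: store the row here
                coeff[s], value[s] = c, b
                break
            c ^= coeff[s]              # eliminate the leading coefficient
            b ^= value[s]
            if c == 0:
                if b != 0:
                    return None        # inconsistent equation: retry with a new seed
                break                  # redundant equation: nothing to store
            tz = (c & -c).bit_length() - 1   # shift so that bit 0 is set again
            c >>= tz
            s += tz
    z = [0] * m
    for i in range(m - 1, -1, -1):     # back substitution, last slot first
        c = coeff[i]
        if c:
            acc = value[i]
            c >>= 1
            j = i + 1
            while c:
                if c & 1:
                    acc ^= z[j]
                c >>= 1
                j += 1
            z[i] = acc
    return z

def retrieve(z: list, seed: int, key: str) -> int:
    """XOR the slots selected by the key's coefficient vector."""
    s, c = _row(key, seed, len(z))
    acc = 0
    while c:
        if c & 1:
            acc ^= z[s]
        c >>= 1
        s += 1
    return acc

class RetrievalFilter:
    """Static AMQ: stores fingerprint(x) for every x in S via retrieval."""
    def __init__(self, keys, r: int = 8):
        self.r = r
        items = {k: fingerprint(k, r) for k in keys}
        for seed in range(100):
            z = build(items, seed)
            if z is not None:
                self.seed, self.z = seed, z
                return
        raise RuntimeError("construction failed for all seeds")

    def __contains__(self, key: str) -> bool:
        return retrieve(self.z, self.seed, key) == fingerprint(key, self.r)
```

A query for a key outside $S$ returns an essentially arbitrary $r$-bit value, which is why comparing it against the key's fingerprint yields a false positive rate near $2^{-r}$. The table occupies $r$ bits per slot, so in this sketch the space overhead over the $r|S|$ lower bound is set by the slack and the extra `W` slots.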