Ribbon filter: practically smaller than Bloom and Xor

Filter data structures over-approximate a set of hashable keys, i.e. set membership queries may incorrectly come out positive. A filter with false positive rate $f \in (0,1]$ is known to require $\ge \log_2(1/f)$ bits per key. At least for larger $f \ge 2^{-4}$, existing practical filters require a space overhead of at least 20% with respect to this information-theoretic bound. We introduce the Ribbon filter: a new filter for static sets with a broad range of configurable space overheads and false positive rates with competitive speed over that range, especially for larger $f \ge 2^{-7}$. In many cases, Ribbon is faster than existing filters for the same space overhead, or can achieve space overhead below 10% with some additional CPU time. An experimental Ribbon design with load balancing can even achieve space overheads below 1%. A Ribbon filter resembles an Xor filter modified to maximize locality and is constructed by solving a band-like linear system over Boolean variables. In previous work, Dietzfelbinger and Walzer describe this linear system and an efficient Gaussian solver. We present and analyze a faster, more adaptable solving process we call "Rapid Incremental Boolean Banding ON the fly," which resembles hash table construction. We also present and analyze an attractive Ribbon variant based on making the linear system homogeneous, and describe several more practical enhancements.
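The construction the abstract describes, solving a band-like Boolean linear system with "Rapid Incremental Boolean Banding ON the fly," can be illustrated with a small Python sketch of a standard (non-homogeneous) Ribbon filter. This is a sketch under stated assumptions, not the authors' implementation. The BLAKE2b-based hashing, the hypothetical `RibbonFilter` class and its parameters (`r` result bits, ribbon width `w`, `overhead`, `max_seeds`), the retry-with-new-seed policy, and setting free variables to 0 are all choices made for this example. Each key maps to a start position, a `w`-bit coefficient vector whose first bit is forced to 1, and an `r`-bit fingerprint. Insertion either claims an empty row or XORs with the occupied row and shifts forward, much like probing in a hash table. Back substitution then solves the banded system, and a query XORs the solution rows selected by the key's coefficients.

```python
import hashlib
import math


def _hash(key: bytes, seed: int, num_starts: int, w: int, r: int):
    """Map a key to (start, coefficient vector, fingerprint). BLAKE2b is an assumed choice."""
    d = hashlib.blake2b(key, digest_size=24, salt=seed.to_bytes(16, "little")).digest()
    start = int.from_bytes(d[0:8], "little") % num_starts
    coeff = (int.from_bytes(d[8:16], "little") & ((1 << w) - 1)) | 1  # first coefficient forced to 1
    fp = int.from_bytes(d[16:24], "little") & ((1 << r) - 1)
    return start, coeff, fp


class RibbonFilter:
    """Hypothetical sketch of a standard Ribbon filter with r-bit results and ribbon width w."""

    def __init__(self, keys, r=7, w=64, overhead=0.10, max_seeds=100):
        keys = list(dict.fromkeys(keys))  # drop duplicate keys
        self.r, self.w = r, w
        self.m = math.ceil(len(keys) * (1 + overhead)) + w  # number of solution rows
        self.num_starts = self.m - w + 1  # a start s keeps columns s..s+w-1 inside the table
        for seed in range(max_seeds):
            if self._build(keys, seed):
                self.seed = seed
                return
        raise RuntimeError("construction failed for every seed tried")

    def _build(self, keys, seed):
        coeff = [0] * self.m  # row i covers columns i..i+w-1; 0 means the row is empty
        result = [0] * self.m
        for key in keys:
            i, c, b = _hash(key, seed, self.num_starts, self.w, self.r)
            while True:
                if coeff[i] == 0:  # empty row: store the equation here, like a hash slot
                    coeff[i], result[i] = c, b
                    break
                c ^= coeff[i]  # eliminate column i using the occupied row
                b ^= result[i]
                if c == 0:
                    if b != 0:
                        return False  # inconsistent system: caller retries with a new seed
                    break  # redundant equation, already implied by stored rows
                shift = (c & -c).bit_length() - 1  # count trailing zero bits
                c >>= shift
                i += shift
        # Back substitution: solve rows from last to first; free variables stay 0.
        sol = [0] * self.m
        for i in range(self.m - 1, -1, -1):
            if coeff[i] == 0:
                continue
            val, c, j = result[i], coeff[i] >> 1, i + 1
            while c:
                if c & 1:
                    val ^= sol[j]
                c >>= 1
                j += 1
            sol[i] = val
        self.sol = sol
        return True

    def __contains__(self, key):
        i, c, fp = _hash(key, self.seed, self.num_starts, self.w, self.r)
        acc = 0
        while c:  # XOR the solution rows selected by the coefficient bits
            if c & 1:
                acc ^= self.sol[i]
            c >>= 1
            i += 1
        return acc == fp


if __name__ == "__main__":
    keys = [f"key{i}".encode() for i in range(1000)]
    filt = RibbonFilter(keys, r=7)
    assert all(k in filt for k in keys)  # no false negatives
    fp_rate = sum(f"other{i}".encode() in filt for i in range(10000)) / 10000
    print(f"measured FP rate {fp_rate:.4f}, target 2^-7 = {2**-7:.4f}")
    print(f"space overhead vs. n*log2(1/f) bits: {filt.m / len(keys) - 1:.1%}")
```

For a key outside the set, the fingerprint is independent of the XOR it is compared against, so the false positive rate is about $2^{-r}$ and `r` plays the role of $\log_2(1/f)$. The space used is `m * r` bits against the bound of `n * r`, so the overhead is `m / n - 1`. In this sketch, the `+ w` padding inflates that figure noticeably for small sets such as the 1000 keys above.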

翻译：超过一套散列密钥的过滤器数据结构, 即设定会籍询问可能错误地显示为正。已知一个错误正率的过滤器, 每按键需要$\ge\log_ 2( 1/ f) 美元。至少对于更大的 $\ ge 2 ⁇ -4} 美元, 现有的实用过滤器需要至少20%的空间管理费, 与这个信息- 理论约束有关的负负比值。我们引入了ribbon 过滤器: 用于静态装置的新过滤器, 其可配置空间管理器的范围很广, 以及具有竞争性速度的虚假正率, 特别是对于更大的 $f\ ge 2 ⁇ 7} 。在许多情况下, Ribbon 的过滤器比现有的空间管理器要快得多, 或者可以达到10%以下的空间管理器。 ribon 过滤器类似于一个基于最大程度可配置空间管理器的自动过滤器, 并且通过在Boolean 系统上找到一个类似条式的线性系统, 。在以往的工作、底色分析系统上, 我们描述一个更快速的系统, 和直线性分析系统, 正在描述一个更快速的系统, 一种高效的系统, 我们的系统, 正在描述一个更快速的平流式分析系统, 正在使用一个更快速的计算, 。