Many applications of approximate membership query data structures, or filters, require only an incremental filter that supports insertions but not deletions. However, the design space of incremental filters is missing a "sweet spot" filter that combines space efficiency, fast queries, and fast insertions. Incremental filters, such as the Bloom and blocked Bloom filter, are not space efficient. Dynamic filters (i.e., supporting deletions), such as the cuckoo or vector quotient filter, are space efficient but do not exhibit consistently fast insertions and queries. In this paper, we propose the prefix filter, an incremental filter that addresses the above challenge: (1) its space (in bits) is similar to state-of-the-art dynamic filters; (2) query throughput is high and is comparable to that of the cuckoo filter; and (3) insert throughput is high with overall build times faster than those of the vector quotient filter and cuckoo filter by $1.39\times$-$1.46\times$ and $3.2\times$-$3.5\times$, respectively. We present a rigorous analysis of the prefix filter that holds also for practical set sizes (i.e., $n=2^{25}$). The analysis deals with the probability of failure, false positive rate, and probability that an operation requires accessing more than a single cache line.
翻译:近似会籍查询数据结构或过滤器的许多应用只需要一个支持插入而不是删除的递增过滤器。 但是, 递增过滤器的设计空间缺少一个“ 甜点” 过滤器, 将空间效率、 快速查询和快速插入结合起来。 递增过滤器, 如Bloom 和屏蔽 Bloom 过滤器等递增过滤器, 没有空间效率。 动态过滤器( 支持删除), 如 cuckoo 或矢量过滤器等, 空间效率很高, 但没有一贯快速插入和查询。 在本文中, 我们提议使用前缀过滤器, 是一个应对上述挑战的递增过滤器:(1) 其空间( 位数) 类似于最先进的动态过滤器; (2) 查询管道高, 与 cuckoo 过滤器相似; (3) 插入通量比矢量过滤器和库过滤器总体建设速度快1.39\ time, $1. 46\time, 和 3.2\time $ 递增 33.5\ time 美元。 我们使用一个精确度的精确度分析, 和精确度的精确度, 。