Filters (such as Bloom Filters) are data structures that speed up network routing and measurement operations by storing a compressed representation of a set. Filters are space efficient, but can make bounded one-sided errors: with tunable probability epsilon, they may report that a query element is stored in the filter when it is not. This is called a false positive. Recent research has focused on designing methods for dynamically adapting filters to false positives, reducing the number of false positives when some elements are queried repeatedly. Ideally, an adaptive filter would incur a false positive with bounded probability epsilon for each new query element, and would incur o(epsilon) total false positives over all repeated queries to that element. We call such a filter support optimal. In this paper we design a new Adaptive Cuckoo Filter and show that it is support optimal (up to additive logarithmic terms) over any n queries when storing a set of size n. Our filter is simple: fixing previous false positives requires a simple cuckoo operation, and the filter does not need to store any additional metadata. This data structure is the first practical data structure that is support optimal, and the first filter that does not require additional space to fix false positives. We complement these bounds with experiments showing that our data structure is effective at fixing false positives on network traces, outperforming previous Adaptive Cuckoo Filters. Finally, we investigate adversarial adaptivity, a stronger notion of adaptivity in which an adaptive adversary repeatedly queries the filter, using the result of previous queries to drive the false positive rate as high as possible. We prove a lower bound showing that a broad family of filters, including all known Adaptive Cuckoo Filters, can be forced by such an adversary to incur a large number of false positives.
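To make the idea of fixing a false positive with a cuckoo operation concrete, the following is a minimal Python sketch. It is not the construction analyzed in the paper; it only illustrates the general mechanism under stated assumptions. The class name `AdaptiveCuckooSketch`, the helper `run_queries`, the parameter defaults, the side-dependent fingerprint hash, and the mirrored table of full keys (standing in for the remote copy of the set) are all hypothetical choices made for this illustration. The filter itself holds only fingerprints; when a false positive is detected, the colliding stored element is moved to its alternate bucket, where it gets a different fingerprint.

```python
import hashlib
import random


def _h(key, salt, mod):
    """Deterministic hash of (salt, key), reduced modulo `mod`."""
    digest = hashlib.blake2b(f"{salt}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % mod


class AdaptiveCuckooSketch:
    """Illustrative adaptive cuckoo filter (hypothetical, not the paper's design)."""

    def __init__(self, num_buckets=64, slots=4, fp_bits=8, max_kicks=500, seed=0):
        self.nb, self.slots, self.fp_mod = num_buckets, slots, 1 << fp_bits
        self.max_kicks = max_kicks
        self.rng = random.Random(seed)
        # fps[b] is the filter proper: fingerprints only, no extra metadata.
        # keys[b] mirrors it with (key, side) pairs and plays the role of the
        # remote set; it is consulted only by insert and adapt, never by query.
        self.fps = [[] for _ in range(num_buckets)]
        self.keys = [[] for _ in range(num_buckets)]

    def _bucket(self, key, side):
        return _h(key, f"bucket{side}", self.nb)

    def _fp(self, key, side):
        # Assumption: the fingerprint depends on the side, so moving an element
        # to its other bucket also changes the fingerprint it stores.
        return _h(key, f"fp{side}", self.fp_mod)

    def _place(self, key, side):
        """Store key on `side`; if the bucket is full, evict a random resident."""
        for _ in range(self.max_kicks):
            b = self._bucket(key, side)
            if len(self.fps[b]) < self.slots:
                self.fps[b].append(self._fp(key, side))
                self.keys[b].append((key, side))
                return
            i = self.rng.randrange(self.slots)
            victim, vside = self.keys[b][i]
            self.fps[b][i] = self._fp(key, side)
            self.keys[b][i] = (key, side)
            key, side = victim, 1 - vside  # the victim moves to its other bucket
        # A real implementation would rebuild with new hashes; this sketch gives up.
        raise RuntimeError("cuckoo insertion failed; table too full")

    def insert(self, key):
        self._place(key, 0)

    def query(self, key):
        """True if either candidate bucket holds a matching fingerprint."""
        return any(self._fp(key, s) in self.fps[self._bucket(key, s)] for s in (0, 1))

    def adapt(self, key):
        """Respond to a detected false positive on `key` by cuckooing each
        colliding resident to its alternate bucket (bounded number of moves)."""
        for s in (0, 1):
            b, f = self._bucket(key, s), self._fp(key, s)
            for _ in range(self.slots):
                if f not in self.fps[b]:
                    break
                i = self.fps[b].index(f)
                self.fps[b].pop(i)
                victim, vside = self.keys[b].pop(i)
                self._place(victim, 1 - vside)


def run_queries(filt, members, queries):
    """Query each element once and adapt on every detected false positive."""
    false_positives = 0
    for q in queries:
        if filt.query(q) and q not in members:
            false_positives += 1
            filt.adapt(q)
    return false_positives


if __name__ == "__main__":
    members = {f"s{i}" for i in range(128)}
    filt = AdaptiveCuckooSketch()
    for m in sorted(members):
        filt.insert(m)
    queries = [f"q{i}" for i in range(2000)]
    print("false positives, first pass: ", run_queries(filt, members, queries))
    print("false positives, second pass:", run_queries(filt, members, queries))
```

In this sketch a repeated query usually stops being a false positive after one `adapt` call, although a move can occasionally create a new collision for some other query. The sketch makes no attempt to reproduce the paper's support-optimality guarantee or its adversarial lower bound.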