Popular approximate membership query structures such as Bloom filters and cuckoo filters are widely used in databases, security, and networking. These structures support two operations -- insert and lookup; lookup always returns true on elements inserted into the structure, while it returns true with some probability $\varepsilon \ll 1$ on elements not inserted into the structure. These latter elements are called false positives. Compensatory for these false positives, filters can be much smaller than hash tables that represent the same set. However, unlike hash tables, cuckoo filters and Bloom filters must be initialized with the intended number of inserts to be performed, and cannot grow larger -- inserts beyond this number may fail or increase the false positive probability. This paper presents designs and implementations of filters than can grow without inserts failing and without increasing the false positive probability, even if the filters are created with a small initial size. The resulting code is open source and available for general use.
翻译:Bloom 过滤器和 cuckoo 过滤器等广受欢迎的成员查询结构在数据库、安全和网络中广泛使用。这些结构支持两种操作 -- -- 插入和查看;查找总是对插入结构的元素真实返回,而对于未插入结构的元素则返回真实,但可能性为$\varepsilon\ll 1美元。这些元素被称为假正数。对这些假正数的补偿,过滤器可能比代表同一组的散列表要小得多。然而,与散列表格不同,必须先用预想的插入数来初始化 cuckoo 过滤器和 Bloom 过滤器,且不能扩大 -- -- 超过此数的插入可能失败或增加错误正概率。本文显示过滤器的设计和实施比在不插入失败和不增加假正概率的情况下增长的长得更多,即使过滤器创建的初始大小较小。因此生成的代码是开放源,可供一般使用。