Popular approximate membership query structures such as Bloom filters and cuckoo filters are widely used in databases, security, and networking. These structures support two operations -- insert and lookup; lookup always returns true on elements inserted into the structure, while it returns true with some probability $\varepsilon \ll 1$ on elements not inserted into the structure. These latter elements are called false positives. Compensatory for these false positives, filters can be much smaller than hash tables that represent the same set. However, unlike hash tables, cuckoo filters and Bloom filters must be initialized with the intended number of inserts to be performed, and cannot grow larger -- inserts beyond this number may fail or increase the false positive probability. This paper presents designs and implementations of filters than can grow without inserts failing and without significantly increasing the false positive probability, even if the filters are created with a small initial size. The resulting code is available on GitHub under a permissive open source license.
翻译:Bloom 过滤器和 cuckoo 过滤器等广受欢迎的成员查询结构在数据库、安全和网络中广泛使用。这些结构支持两种操作 -- -- 插入和查看;查找总是对插入结构的元素真实返回,而对于未插入结构的元素则返回真实,但可能性为$\varepsilon\ll 1美元。这些元素被称为假正数。对这些假正数的补偿,过滤器可能比代表同一组的散列表要小得多。然而,与散列表格不同,必须初始化 cuckoo 过滤器和闪烁过滤器,并启动要完成的插入数量,且不能扩大 -- -- 超过此数的插入可能失败或增加错误正概率。本文显示过滤器的设计和实施,但不能在不插入失败的情况下增长,而且不会大大增加假正概率,即使过滤器的初始大小较小。因此在 GitHub 上可以使用允许的开源许可。