Imagine handling collisions in a hash table by storing, in each cell, the bit-wise exclusive-or of the set of keys hashing there. This appears to be a terrible idea: For $\alpha n$ keys and $n$ buckets, where $\alpha$ is constant, we expect that a constant fraction of the keys will be unrecoverable due to collisions. We show that if this collision resolution strategy is repeated three times independently, the situation reverses: If $\alpha$ is below a threshold of $\approx 0.81$, then we can recover the set of all inserted keys in linear time with high probability. Even though the description of our data structure is simple, its analysis is nontrivial. Our approach can be seen as a variant of the Invertible Bloom Filter (IBF) of Eppstein and Goodrich. While IBFs involve an explicit checksum per bucket to decide whether the bucket stores a single key, we exploit the idea of quotienting, namely that some bits of the key are implicit in the location where it is stored. We let those serve as an implicit checksum. These bits are not quite enough to ensure that no errors occur, and the main technical challenge is to show that decoding can recover from these errors.
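To make the idea concrete, the following is a minimal Python sketch of an XOR-based table with three hash functions and a plain peeling decoder. It is an illustration under stated assumptions, not the construction analyzed in the paper: the names `XorSketch`, `make_hashes`, `toggle`, `looks_pure`, and `decode` are hypothetical, the BLAKE2-based hash functions are a stand-in for whatever hash family the analysis requires, keys are assumed to be nonzero integers below $2^{64}$, and the decoder does not attempt the recovery from erroneous peels that the abstract describes. A cell is treated as holding a single key when its content hashes back to the cell's own position, which plays the role of the implicit checksum.

```python
import hashlib


def make_hashes(m, seed=0):
    """Return three hypothetical hash functions mapping integer keys to range(m)."""
    def make(i):
        salt = bytes([seed % 256, i])  # a distinct salt per table
        def h(x):
            digest = hashlib.blake2b(
                x.to_bytes(8, "little"), digest_size=8, salt=salt
            ).digest()
            return int.from_bytes(digest, "little") % m
        return h
    return [make(i) for i in range(3)]


class XorSketch:
    """Three tables of m cells; each cell stores the XOR of the keys hashing there."""

    def __init__(self, m, hashes=None):
        self.m = m
        self.hashes = hashes if hashes is not None else make_hashes(m)
        self.cells = [[0] * m for _ in self.hashes]

    def toggle(self, key):
        """XOR the key into one cell per table; inserting and deleting coincide."""
        assert 0 < key < 2**64, "assumption of this sketch: nonzero 64-bit keys"
        for table, h in zip(self.cells, self.hashes):
            table[h(key)] ^= key

    def looks_pure(self, i, j):
        """Implicit checksum: a nonzero cell content must hash back to its own cell."""
        v = self.cells[i][j]
        return v != 0 and self.hashes[i](v) == j

    def decode(self):
        """Peel cells that look pure. Returns the recovered set, or None if stuck.

        Destructive: the table is emptied as keys are peeled. A cell holding
        several keys can pass the check by chance; this sketch does not
        repair such errors and only bounds the number of peels.
        """
        recovered = set()
        stack = [(i, j) for i in range(len(self.cells)) for j in range(self.m)]
        budget = 4 * len(stack)  # guard against endless re-peeling after an error
        while stack and budget > 0:
            i, j = stack.pop()
            if not self.looks_pure(i, j):
                continue
            budget -= 1
            key = self.cells[i][j]
            recovered.add(key)
            for t, h in enumerate(self.hashes):
                pos = h(key)
                self.cells[t][pos] ^= key  # remove the key from each of its cells
                stack.append((t, pos))     # those cells may have become pure
        if any(v for table in self.cells for v in table):
            return None
        return recovered
```

A typical use would be to call `toggle` once per key and then `decode()`; the hash functions can be replaced through the `hashes` argument, for instance by functions that read disjoint bit fields of the key. The sketch makes no claim about matching the $\approx 0.81$ load threshold stated above.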