Set reconciliation protocols typically make two critical assumptions: they are designed for fixed-sized elements and they are optimized for when the difference cardinality, d, is very small. When adapting to variable-sized elements, the current practice is to synchronize fixed-size element digests. However, when the number of differences is considerable, such as after a network partition, this approach can be inefficient. Our solution is a two-stage hybrid protocol that introduces a preliminary Bloom filter step, specifically designed for this regime. The novelty of this approach, however, is in solving a core technical challenge: determining the optimal Bloom filter size without knowing d. Our solution is the Rateless Bloom Filter (RBF), a dynamic filter that naturally adapts to arbitrary symmetric differences, closely matching the communication complexity of an optimally configured static filter without requiring any prior parametrization. Our evaluation in sets of variable-sized elements shows that for Jaccard indices below 85%, our RBF-IBLT hybrid protocol reduces the total communication cost by up to over 20% compared to the state-of-the-art.
翻译:集合协调协议通常基于两个关键假设设计:针对固定大小的元素,并在差异基数d极小时进行优化。当应用于可变大小元素时,当前实践通过同步固定大小的元素摘要实现。然而,当差异数量较大(例如网络分区后),该方法效率低下。我们提出一种两阶段混合协议,引入专为此场景设计的初步布隆过滤器步骤。该方法的创新性在于解决了一个核心技术挑战:在未知d的情况下确定最优布隆过滤器大小。我们的解决方案是无速率布隆过滤器(RBF),这是一种动态过滤器,能自然适应任意对称差异,其通信复杂度与最优配置的静态过滤器高度匹配,且无需任何先验参数设定。在可变大小元素集合中的评估表明,当Jaccard指数低于85%时,我们的RBF-IBLT混合协议相比现有最优技术可降低总通信成本达20%以上。