Count-Min sketch is a hash-based data structure to represent a dynamically changing associative array of counters. Here we analyse the counting version of Count-Min under a stronger update rule known as \textit{conservative update}, assuming the uniform distribution of input keys. We show that the accuracy of conservative update strategy undergoes a phase transition, depending on the number of distinct keys in the input as a fraction of the size of the Count-Min array. We prove that below the threshold, the relative error is asymptotically $o(1)$ (as opposed to the regular Count-Min strategy), whereas above the threshold, the relative error is $\Theta(1)$. The threshold corresponds to the peelability threshold of random $k$-uniform hypergraphs. We demonstrate that even for small number of keys, peelability of the underlying hypergraph is a crucial property to ensure the $o(1)$ error. Finally, we provide an experimental evidence that the phase transition does not extend to non-uniform distributions, in particular to the popular Zipf's distribution.
翻译:count- Min 草图是一个基于散列的数据结构, 代表着一个动态变化的组合式计数阵列。 我们在此根据一个称为\ textit{ 保守更新} 的更强更新规则来分析计数- Min 的计算版本, 假设输入键的分布一致。 我们显示, 保守更新战略的准确性将经历一个阶段过渡, 取决于输入中不同的密钥的数量, 与数- min 阵列的大小相比。 我们证明, 在阈值下, 相对的错误是暂时的1美元( 而不是常规的计数- Min 战略 ), 而高于阈值, 相对的错误是 $\ Theta(1)$。 阈值与随机的美元单式高光图的可切开阈值相对。 我们证明, 即使是少量的密钥, 基本高空图的可剥离性是保证$(1) 差的关键属性。 最后, 我们提供实验性证据表明, 阶段过渡不会延伸到非统一分布, 特别是流行的 Zipf 分布 。