We present HyperLogLogLog, a practical compression of the HyperLogLog sketch that reduces the sketch size from $O(m\log\log n)$ bits to $m \log_2\log_2\log_2 m + O(m+\log\log n)$ bits for estimating the number of distinct elements~$n$ using $m$~registers. The algorithm works as a drop-in replacement that preserves all estimation properties of the HyperLogLog sketch. It is possible to convert back and forth between the compressed and uncompressed representations, and the compressed sketch remains mergeable in the compressed domain. The compressed sketch can be updated in amortized constant time, assuming $n$ is sufficiently larger than $m$. We provide a C++ implementation of the sketch and show, by experimental evaluation against well-known implementations by Google and Apache, that our implementation produces small sketches while maintaining competitive update and merge times. Concretely, we observed a reduction of approximately 40% in sketch size. Furthermore, we obtain as a corollary a theoretical algorithm that compresses the sketch down to $m\log_2\log_2\log_2\log_2 m+O(m\log\log\log m/\log\log m+\log\log n)$ bits.
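
For context, the following is a minimal sketch of the uncompressed HyperLogLog baseline that the abstract refers to: $m$ registers, each holding a small rank value, merged by taking the register-wise maximum. It is not the paper's C++ implementation and it does not implement the compression itself. The class name `HyperLogLog`, the precision parameter `p` (with $m = 2^p$), and the choice of BLAKE2b as the 64-bit hash are assumptions made for illustration only.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative uncompressed HyperLogLog (hypothetical helper class)."""

    def __init__(self, p=10):
        # Assumption: m = 2**p registers, with p >= 7 so the alpha
        # approximation below is applicable.
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item):
        # Assumption: BLAKE2b truncated to 64 bits serves as the hash.
        digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, item):
        h = self._hash64(item)
        width = 64 - self.p
        idx = h >> width                      # top p bits select a register
        rest = h & ((1 << width) - 1)         # remaining bits determine the rank
        rank = width - rest.bit_length() + 1  # 1-based position of the leftmost 1 bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Merging two sketches is a register-wise maximum.
        if self.p != other.p:
            raise ValueError("sketches must use the same number of registers")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction (linear counting).
            return m * math.log(m / zeros)
        return raw
```

In this baseline every register value is at most $64 - p + 1$, so a fixed-width encoding needs only a handful of bits per register. Merging the sketches of two streams yields exactly the registers of the sketch of their union, which is the mergeability property the compressed representation is stated to preserve.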