In this paper, we present an advanced analysis of near-optimal algorithms that use limited space to solve the frequency estimation, heavy hitters, frequent items, and top-k approximation problems in the bounded deletion model. We define the family of SpaceSaving$\pm$ algorithms and explain why the original SpaceSaving$\pm$ algorithm works only when insertions and deletions are not interleaved. Next, we propose three new algorithms, Double SpaceSaving$\pm$, Unbiased Double SpaceSaving$\pm$, and Integrated SpaceSaving$\pm$, and prove their correctness. The three proposed algorithms represent different trade-offs: Double SpaceSaving$\pm$ can be extended to provide unbiased estimates, while Integrated SpaceSaving$\pm$ uses less space. Since data streams are often skewed, we present an improved analysis of these algorithms and show that their errors do not depend on the hot items. We also demonstrate how to achieve relative error guarantees under mild assumptions. Moreover, we establish that all three algorithms satisfy the important mergeability property, which is essential for running them in distributed settings.
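
To make the setting concrete, the following Python sketch illustrates one plausible reading of the two-summary idea suggested by the name Double SpaceSaving$\pm$. It is an assumption for illustration, not the construction defined in this paper: it keeps one classic insertion-only SpaceSaving summary for insertions and a second one for deletions, and reports their difference. The class names `SpaceSaving` and `DoubleSketch` and the parameters `k_ins` and `k_del` are hypothetical.

```python
class SpaceSaving:
    """Classic insertion-only SpaceSaving summary with at most k counters."""

    def __init__(self, k):
        self.k = k
        self.counts = {}  # item -> estimated count (may overestimate)

    def update(self, item):
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.k:
            self.counts[item] = 1
        else:
            # Evict a minimum-count item; the newcomer inherits its count + 1.
            victim = min(self.counts, key=self.counts.get)
            self.counts[item] = self.counts.pop(victim) + 1

    def estimate(self, item):
        # Unmonitored items are reported as 0 in this simplified sketch.
        return self.counts.get(item, 0)


class DoubleSketch:
    """Hypothetical wrapper (an assumption): separate summaries for inserts and deletes."""

    def __init__(self, k_ins, k_del):
        self.ins = SpaceSaving(k_ins)
        self.dels = SpaceSaving(k_del)

    def insert(self, item):
        self.ins.update(item)

    def delete(self, item):
        self.dels.update(item)

    def estimate(self, item):
        # Estimated net frequency: inserts seen minus deletes seen.
        return self.ins.estimate(item) - self.dels.estimate(item)


sketch = DoubleSketch(k_ins=2, k_del=2)
for x in ["a", "a", "a", "b", "a", "c"]:
    sketch.insert(x)
sketch.delete("a")
print(sketch.estimate("a"))
```

In this toy run the hot item `a` is tracked exactly, while the cold item `c` inherits the evicted counter and is overestimated, which is the kind of error the analysis in the paper bounds.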