Streaming computation plays an important role in large-scale data analysis. The sliding window model is a model of streaming computation which also captures the recency of the data. In this model, data arrives one item at a time, but only the latest $W$ data items are considered for a particular problem. The goal is to output a good solution at the end of the stream by maintaining a small summary during the stream. In this work, we propose a new algorithmic framework for designing efficient sliding window algorithms via bucketing-based sketches. Based on this new framework, we develop space-efficient sliding window algorithms for $k$-cover, $k$-clustering and diversity maximization problems. For each of the above problems, our algorithm achieves $(1\pm \varepsilon)$-approximation. Compared with the previous work, it improves both the approximation ratio and the space.
翻译:串流计算在大规模数据分析中起着重要作用 。 滑动窗口模型是一个流式计算模型, 也可以捕捉数据的耐受性 。 在这个模型中, 数据一次到达一个项目, 但只有最新的美元数据项目才考虑特定问题 。 目标是在流中保持一个小摘要, 在流中在流的尾端输出一个好的解决办法 。 在这项工作中, 我们提出一个新的算法框架, 用于通过以桶桶为基础的草图设计高效的滑动窗口算法 。 基于这个新框架, 我们为 $- 覆盖、 $- 组合和多样性最大化问题开发空间高效滑动窗口算法 。 对于上述每一个问题, 我们的算法都实现了$( 1\ pm \ varepsilon)- 匹配。 与先前的工作相比, 它改进了近似率和空间 。