Scaling global aggregations is a challenge for exactly-once stream processing systems. Current systems implement these either by computing the aggregation in a single task instance, or by static aggregation trees, which limits scalability and may become a bottleneck. Moreover, the end-to-end latency is determined by the slowest path in the tree, and failures and reconfiguration cause large latency spikes due to the centralized coordination. Towards these issues, we present Holon Streaming, an exactly-once stream processing system for global aggregations. Its deterministic programming model uses windowed conflict-free replicated data types (Windowed CRDTs), a novel abstraction for shared replicated state. Windowed CRDTs make computing global aggregations scalable. Furthermore, their guarantees such as determinism and convergence enable the design of efficient failure recovery algorithms by decentralized coordination. Our evaluation shows a 5x lower latency and 2x higher throughput than an existing stream processing system on global aggregation workloads, with an 11x latency reduction under failure scenarios. The paper demonstrates the effectiveness of decentralized coordination with determinism, and the utility of Windowed CRDTs for global aggregations.
翻译:在精确一次流处理系统中,扩展全局聚合能力面临显著挑战。现有系统通常通过单一任务实例执行聚合计算,或采用静态聚合树结构,这些方法限制了可扩展性并可能成为性能瓶颈。此外,端到端延迟取决于聚合树中最慢路径的耗时,而集中式协调机制下的故障与重新配置会导致严重的延迟尖峰。针对这些问题,本文提出Holon Streaming——一种面向全局聚合的精确一次流处理系统。其确定性编程模型采用窗口化无冲突复制数据类型(Windowed CRDTs),这是一种用于共享复制状态的全新抽象。Windowed CRDTs使全局聚合计算具备可扩展性,其确定性、收敛性等特性支持通过去中心化协调设计高效故障恢复算法。实验评估表明,在全局聚合工作负载中,本系统相较于现有流处理系统实现延迟降低5倍、吞吐量提升2倍,故障场景下延迟减少达11倍。本文论证了确定性去中心化协调机制的有效性,并展示了Windowed CRDTs在全局聚合场景中的实用价值。