Scaling global aggregations is a challenge for exactly-once stream processing systems. Current systems implement these either by computing the aggregation in a single task instance, or by static aggregation trees, which limits scalability and may become a bottleneck. Moreover, the end-to-end latency is determined by the slowest path in the tree, and failures and reconfiguration cause large latency spikes due to the centralized coordination. Towards these issues, we present Holon Streaming, an exactly-once stream processing system for global aggregations. Its deterministic programming model uses windowed conflict-free replicated data types (Windowed CRDTs), a novel abstraction for shared replicated state. Windowed CRDTs make computing global aggregations scalable. Furthermore, their guarantees such as determinism and convergence enable the design of efficient failure recovery algorithms by decentralized coordination. Our evaluation shows a 5x lower latency and 2x higher throughput than an existing stream processing system on global aggregation workloads, with an 11x latency reduction under failure scenarios. The paper demonstrates the effectiveness of decentralized coordination with determinism, and the utility of Windowed CRDTs for global aggregations.
翻译:暂无翻译