Existing serverless data analytics systems rely on external storage services like S3 for data shuffling and communication between cloud functions. While this approach provides the elasticity benefits of serverless computing, it incurs additional latency and cost overheads. We present Flock, a novel cloud-native streaming query engine that leverages the on-demand scalability of FaaS platforms for real-time data analytics. Flock utilizes function invocation payloads for efficient data exchange, eliminating the need for external storage. This not only reduces latency and cost but also simplifies the architecture by removing the requirement for a centralized coordinator. Flock employs a template-based approach to dynamically create cloud functions for each query stage and a function group mechanism for handling data aggregation and shuffling. It supports both SQL and DataFrame APIs, making it easy to use. Our evaluation shows that Flock provides significant performance gains and cost savings compared to existing serverless and serverful streaming systems. It outperforms Apache Flink by 10-20x in cost while achieving similar latency and throughput.
翻译:暂无翻译