Growing data volumes and velocities in fields such as Industry 4.0 or the Internet of Things have led to the increased popularity of data stream processing systems. Enterprises can leverage these developments by enriching their core business data and analyses with up-to-date streaming data. Comparing streaming architectures for these complex use cases is challenging, as existing benchmarks do not cover them. ESPBench is a new enterprise stream processing benchmark that fills this gap. We present its architecture, the benchmarking process, and the query workload. We employ ESPBench on three state-of-the-art stream processing systems, Apache Spark, Apache Flink, and Hazelcast Jet, using provided query implementations developed with Apache Beam. Our results highlight the need for the provided ESPBench toolkit that supports benchmark execution, as it enables query result validation and objective latency measures.