Event-driven microservices are an emerging architectural style for data-intensive software systems. In such systems, stream processing frameworks such as Apache Flink, Apache Kafka Streams, Apache Samza, Hazelcast Jet, or the Apache Beam SDK are used inside microservices to continuously process massive amounts of data in a distributed fashion. While all of these frameworks promote scalability as a core feature, there is little empirical research evaluating and comparing their scalability. In this study, we benchmark five modern stream processing frameworks regarding their scalability using a systematic method. We conduct over 460 hours of experiments on Kubernetes clusters in the Google Cloud and in a private cloud, where we deploy up to 110 simultaneously running microservice instances, which process up to one million messages per second. We find that all benchmarked frameworks exhibit approximately linear scalability as long as sufficient cloud resources are provisioned. However, the frameworks differ considerably in the rate at which resources have to be added to cope with increasing load. Moreover, we observe that no framework is clearly superior; rather, the ranking of the frameworks depends on the use case. Using Apache Beam as an abstraction layer still comes at the cost of significantly higher resource requirements, regardless of the use case.
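To make the notions of "approximately linear scalability" and "the rate at which resources have to be added" more concrete, the following is a minimal sketch. It assumes that scalability is summarised as the number of instances required at each load level, which the abstract does not state explicitly. The function names, the framework labels, and all numbers are hypothetical and serve only as an illustration; they are not measurements from the study.

```python
from statistics import mean


def fit_line(loads, instances):
    """Ordinary least-squares fit: instances ~ slope * load + intercept."""
    x_bar, y_bar = mean(loads), mean(instances)
    sxx = sum((x - x_bar) ** 2 for x in loads)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(loads, instances))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(loads, instances, slope, intercept):
    """Coefficient of determination; values near 1 suggest a linear trend."""
    y_bar = mean(instances)
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(loads, instances))
    ss_tot = sum((y - y_bar) ** 2 for y in instances)
    return 1.0 - ss_res / ss_tot


# Hypothetical inputs (NOT results from the paper): load in messages per
# second and the number of instances assumed sufficient at each load level.
loads = [100_000, 200_000, 300_000, 400_000]
required_instances = {
    "framework_a": [2, 4, 6, 8],
    "framework_b": [5, 10, 15, 20],
}

for name, instances in required_instances.items():
    slope, intercept = fit_line(loads, instances)
    r2 = r_squared(loads, instances, slope, intercept)
    # The slope is the rate at which instances must be added per unit of load.
    print(f"{name}: {slope * 100_000:.2f} instances per 100k msg/s, R^2 = {r2:.3f}")
```

In this reading, two frameworks can both scale linearly (a fit quality close to 1) while differing in slope, that is, in how many additional instances each extra unit of load costs.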