Apache Kafka addresses the general problem of delivering extremely high-volume event data to diverse consumers via a publish-subscribe messaging system. It uses partitions to scale a topic across many brokers, so that producers can write data in parallel and consumers can read it in parallel. Even though Apache Kafka provides some out-of-the-box optimizations, it does not strictly define how each topic should be efficiently distributed into partitions. The well-formulated fine-tuning needed to improve the performance of an Apache Kafka cluster is still an open research problem. In this paper, we first model the Apache Kafka topic partitioning process for a given topic. Then, given the set of brokers, constraints, and application requirements on throughput, OS load, replication latency, and unavailability, we formulate the optimization problem of finding how many partitions are needed and show that it is computationally intractable, being an integer program. Furthermore, we propose two simple yet efficient heuristics to solve the problem: the first tries to minimize and the second to maximize the number of brokers used in the cluster. Finally, we evaluate their performance via large-scale simulations, using as benchmarks some Apache Kafka cluster configuration recommendations provided by Microsoft and Confluent. We demonstrate that, unlike the recommendations, the proposed heuristics respect the hard constraints on replication latency and perform better w.r.t. unavailability time and OS load, using the system resources more prudently.
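For context on the kind of recommendation used as a benchmark, the following is a minimal sketch of a widely cited throughput-based rule of thumb from Confluent's sizing guidance: with a target throughput t, a measured per-partition producer throughput p, and a measured per-partition consumer throughput c, use at least max(t/p, t/c) partitions. It is not one of the heuristics proposed in this paper, and it may not be the exact benchmark configuration evaluated here. The function name, parameter names, and the MB/s unit are assumptions made for illustration. Note that the rule considers throughput only and ignores OS load, replication latency, and unavailability.

```python
import math


def rule_of_thumb_partitions(target_mb_s: float,
                             producer_mb_s_per_partition: float,
                             consumer_mb_s_per_partition: float) -> int:
    """Throughput-only partition count: max(ceil(t/p), ceil(t/c)).

    All names and units are illustrative assumptions; the per-partition
    rates must be measured on the actual cluster.
    """
    rates = (target_mb_s, producer_mb_s_per_partition, consumer_mb_s_per_partition)
    if min(rates) <= 0:
        raise ValueError("all throughput values must be positive")
    # Partitions needed so producers can sustain the target rate.
    for_producers = math.ceil(target_mb_s / producer_mb_s_per_partition)
    # Partitions needed so consumers can sustain the target rate.
    for_consumers = math.ceil(target_mb_s / consumer_mb_s_per_partition)
    return max(for_producers, for_consumers)


if __name__ == "__main__":
    # Hypothetical numbers: 100 MB/s target, 10 MB/s per partition on the
    # producer side, 20 MB/s per partition on the consumer side.
    print(rule_of_thumb_partitions(100, 10, 20))
```

Because such a rule returns a single count from throughput alone, it can violate constraints that an optimization formulation treats as hard, which is the gap the abstract points to.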