Modern distributed systems demand low-latency, fault-tolerant event processing that exceeds the limits of traditional messaging architectures. While frameworks including Apache Kafka, RabbitMQ, Apache Pulsar, NATS JetStream, and serverless event buses have matured significantly, no unified comparative study evaluates them holistically under standardized conditions. This paper presents the first comprehensive benchmarking framework evaluating 12 messaging systems across three representative workloads: e-commerce transactions, IoT telemetry ingestion, and AI inference pipelines. We introduce AIEO (AI-Enhanced Event Orchestration), which combines machine learning-driven predictive scaling, reinforcement learning for dynamic resource allocation, and multi-objective optimization. Our evaluation reveals fundamental trade-offs: Apache Kafka achieves peak throughput (1.2M messages/sec, 18 ms p95 latency) but requires substantial operational expertise; Apache Pulsar provides balanced performance (950K messages/sec, 22 ms p95 latency) with superior multi-tenancy; serverless solutions offer elastic scaling for variable workloads despite higher baseline latency (80-120 ms p95). AIEO demonstrates a 34% average latency reduction, a 28% improvement in resource utilization, and 42% cost optimization across all platforms. We contribute standardized benchmarking methodologies, an open-source intelligent orchestration framework, and evidence-based decision guidelines. The evaluation encompasses 2,400+ experimental configurations with rigorous statistical analysis, providing comprehensive performance characterization and establishing foundations for next-generation distributed system design.