This paper presents a benchmark of stream processing throughput, comparing Apache Spark Streaming (under file-, socket- and Kafka-based stream integration) with a prototype P2P stream processing framework, HarmonicIO. Maximum throughput is measured for a broad range of stream processing loads, in particular those with large message sizes (up to 10MB) and heavy CPU load: loads more typical of scientific computing use cases (such as microscopy) than of enterprise contexts. A detailed exploration of the performance characteristics of these integrations under varying loads reveals a complex interplay of performance trade-offs and uncovers the boundaries of good performance for each framework and integration. Based on these results, we suggest which frameworks and integrations are likely to offer good performance for a given load. Broadly, the advantages of Spark's rich feature set come at the cost of sensitivity to message size in particular, whereas the simplicity of HarmonicIO offers more robust performance, especially for raw CPU utilization.