Efficient data streaming is essential for real-time data analytics, visualization, and machine learning model training, particularly when dealing with high-volume datasets. Various streaming technologies and serialization protocols have been developed to cater to different streaming requirements, each performing differently depending on specific tasks and datasets involved. This variety poses challenges in selecting the most appropriate combination, as encountered during the implementation of streaming system for the MAST fusion device data or SKA's radio astronomy data. To address this challenge, we conducted an empirical study on widely used data streaming technologies and serialization protocols. We also developed an extensible, open-source software framework to benchmark their efficiency across various performance metrics. Our study uncovers significant performance differences and trade-offs between these technologies, providing valuable insights that can guide the selection of optimal streaming and serialization solutions for modern data-intensive applications. Our goal is to equip the scientific community and industry professionals with the knowledge needed to enhance data streaming efficiency for improved data utilization and real-time analysis.
翻译:暂无翻译