Data pipelines are essential in stream processing: they enable the efficient collection, processing, and delivery of real-time data, supporting rapid data analysis. In this paper, we present AutoStreamPipe, a novel framework that employs Large Language Models (LLMs) to automate the design, generation, and deployment of stream processing pipelines. AutoStreamPipe bridges the semantic gap between high-level user intent and platform-specific implementations across distributed stream processing systems. To support structured multi-agent reasoning, it integrates a Hypergraph of Thoughts (HGoT), an extension of Graph of Thoughts (GoT). AutoStreamPipe combines resilient execution strategies, advanced query analysis, and HGoT to deliver pipelines with good accuracy. Experimental evaluations on diverse pipelines demonstrate that, compared to LLM code-generation methods, AutoStreamPipe significantly reduces development time (by a factor of 6.3) and error rates (by a factor of 5.19), as measured by a novel Error-Free Score (EFS).