Stream processing acceleration is driven by the continuously increasing volume and velocity of data generated on the Web and by limits on storage, computation, and power consumption. Hardware solutions offer better performance and lower power consumption, but they are hindered by high research and development costs and a long time to market. In this work, we propose Diba, a reconfigurable stream processor that completely rethinks a previously proposed customized and flexible query processor targeting real-time stream processing. Diba uses a unidirectional dataflow that is not dedicated to any specific type of query (operator) on streams, which allows processing components to be placed straightforwardly on a general data path and facilitates query mapping. In Diba, the distribution network and the processing components are implemented as two separate entities connected through generic interfaces. This approach allows a versatile architecture to be adopted for a family of queries rather than forcing a rigid chain of processing components to implement them. Our experimental evaluation of representative TPC-H queries yielded processing times of 300, 1220, and 3520 milliseconds for data streams with scale-factor sizes of one, four, and ten gigabytes, respectively.
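The separation between a distribution network and processing components can be illustrated with a minimal software sketch. This is not Diba's hardware design: the names `ProcessingComponent`, `Select`, `Project`, and `DataPath` are hypothetical, and the sketch only assumes that every component exposes the same generic interface so that a query can be mapped by choosing which components to place on a one-way data path.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Protocol


class ProcessingComponent(Protocol):
    """Hypothetical generic interface shared by all components."""

    def process(self, record: dict) -> Optional[dict]:
        """Return the (possibly transformed) record, or None to drop it."""
        ...


class Select:
    """Keeps only the records that satisfy a predicate."""

    def __init__(self, predicate: Callable[[dict], bool]) -> None:
        self.predicate = predicate

    def process(self, record: dict) -> Optional[dict]:
        return record if self.predicate(record) else None


class Project:
    """Keeps only the named fields of each record."""

    def __init__(self, fields: List[str]) -> None:
        self.fields = fields

    def process(self, record: dict) -> Optional[dict]:
        return {name: record[name] for name in self.fields}


class DataPath:
    """Stand-in for the distribution network (an assumption of this sketch).

    Records move in one direction only, from the first component to the
    last; the path knows nothing about what each component computes.
    """

    def __init__(self, components: Iterable[ProcessingComponent]) -> None:
        self.components = list(components)

    def run(self, stream: Iterable[dict]) -> Iterator[dict]:
        for record in stream:
            current: Optional[dict] = record
            for component in self.components:
                current = component.process(current)
                if current is None:  # record was filtered out
                    break
            if current is not None:
                yield current


# Mapping a simple query: choose components and place them on the path.
stream = [{"id": 1, "price": 5}, {"id": 2, "price": 50}]
path = DataPath([Select(lambda r: r["price"] > 10), Project(["id"])])
result = list(path.run(stream))
```

Because the path depends only on the shared interface, a different query in the same family can be expressed by swapping or reordering components without changing `DataPath`.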