STRETCH: Virtual Shared-Nothing Parallelism for Scalable and Elastic Stream Processing

Stream processing applications extract value from raw data through Directed Acyclic Graphs of data analysis tasks. Shared-nothing (SN) parallelism is the de facto standard for scaling stream processing applications. Given an application, SN parallelism instantiates several copies of each analysis task, makes each instance responsible for a dedicated portion of the overall analysis, and relies on dedicated queues to exchange data among connected instances. On the one hand, SN parallelism can scale the execution of applications both up and out, since threads can run task instances within and across processes/nodes. On the other hand, its lack of sharing can cause unnecessary overheads and hinder scaling up when threads operate on data that could be jointly accessed in shared memory. This trade-off motivated us to study how stream processing applications can leverage shared memory to boost scale-up (before scaling out) while adhering to the widely adopted, SN-based APIs for stream processing. We introduce STRETCH, a framework that maximizes scale-up and offers instantaneous elastic reconfigurations (without state transfer) for stream processing applications. We propose the concept of Virtual Shared-Nothing (VSN) parallelism and elasticity, and we provide formal definitions and correctness proofs for the semantics of the analysis tasks supported by STRETCH, showing that they extend those found in common Stream Processing Engines. We also provide a fully implemented prototype and show that STRETCH's performance exceeds that of state-of-the-art frameworks such as Apache Flink, and that it offers, to the best of our knowledge, unprecedented ultra-fast reconfigurations, taking less than 40 ms even when provisioning tens of new task instances.
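The abstract contrasts SN parallelism, where each task instance keeps private state and receives events through a dedicated queue, with a shared-memory approach that allows reconfiguration without state transfer. The following is a minimal, single-threaded Python illustration of that contrast under stated assumptions; it is not STRETCH's implementation. The class names `SNCounter` and `SharedStateCounter`, the per-key counting task, and the modulo partitioning function are all hypothetical. In the SN version, changing the number of instances means moving per-key state to its new owners; in the shared-state version, every key is still processed only by its owning instance, but rescaling merely swaps the key-to-instance mapping.

```python
from collections import defaultdict
from queue import SimpleQueue


class _Router:
    """SN-style routing: n task instances, one dedicated input queue each."""

    def __init__(self, n):
        self.n = n
        self.queues = [SimpleQueue() for _ in range(n)]

    def owner(self, key):
        # Hypothetical partitioning function: integer key modulo parallelism.
        return key % self.n

    def send(self, key):
        # Each event goes only to the queue of the instance that owns its key.
        self.queues[self.owner(key)].put(key)


class SNCounter(_Router):
    """Per-key event counter under shared-nothing parallelism (private state per instance)."""

    def __init__(self, n):
        super().__init__(n)
        self.states = [defaultdict(int) for _ in range(n)]

    def process(self):
        # Instance i drains its own queue and updates only its own private state.
        for i, q in enumerate(self.queues):
            while not q.empty():
                self.states[i][q.get()] += 1

    def count(self, key):
        return self.states[self.owner(key)].get(key, 0)

    def rescale(self, new_n):
        """Change the parallelism degree; returns how many per-key states were moved."""
        self.process()  # drain in-flight events before reconfiguring
        new_states = [defaultdict(int) for _ in range(new_n)]
        moved = 0
        for i, state in enumerate(self.states):
            for key, value in state.items():
                j = key % new_n
                if j != i:
                    moved += 1  # this key's state must be transferred to another instance
                new_states[j][key] = value
        self.n = new_n
        self.queues = [SimpleQueue() for _ in range(new_n)]
        self.states = new_states
        return moved


class SharedStateCounter(_Router):
    """Same routing and per-key semantics, but all instances use one in-memory table."""

    def __init__(self, n):
        super().__init__(n)
        self.shared = defaultdict(int)

    def process(self):
        # Each instance drains its own queue (so it only sees keys it owns),
        # but writes into the single shared table instead of private state.
        for q in self.queues:
            while not q.empty():
                self.shared[q.get()] += 1

    def count(self, key):
        return self.shared.get(key, 0)

    def rescale(self, new_n):
        """Change the parallelism degree by swapping the routing function only."""
        self.process()  # drain in-flight events before reconfiguring
        self.n = new_n
        self.queues = [SimpleQueue() for _ in range(new_n)]
        return 0  # no per-key state is moved


sn, shared = SNCounter(2), SharedStateCounter(2)
for key in [1, 2, 3, 4, 5, 6, 1, 2]:
    sn.send(key)
    shared.send(key)
sn.process()
shared.process()
print(sn.rescale(3), shared.rescale(3))  # prints: 4 0
print(sn.count(1), shared.count(1))      # prints: 2 2
```

Both variants report the same per-key results; only the cost of going from 2 to 3 instances differs. A real engine runs instances on concurrent threads, so a shared table would need appropriate synchronization, which this sketch omits.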