One of the most important issues in data stream processing systems is how to use operator migration to handle highly variable workloads cost-efficiently and to adapt on demand to the needs at any given time. Operator migration is a complex process that changes the state and stream management of a running query, typically without any loss of data and with as little disruption to execution as possible. This survey gives an overview of operator migration solutions from two perspectives: a historical one and one based on the goal of migration. It introduces a conceptual model of operator migration to establish a unified terminology and to classify existing solutions. Existing work in the area is analyzed to separate the mechanism of migration from the decision to migrate. For the latter, the survey emphasizes cost-benefit analysis, which is important for operator migration but is often addressed only implicitly or neglected altogether. A description of the available solutions gives the reader a good understanding of the design alternatives from an algorithmic viewpoint. We complement this with an empirical study that provides quantitative insights into how different design alternatives affect the migration mechanisms.