Complex scientific experiments from various domains are typically modeled as workflows and executed on large-scale machines using a parallel Workflow Management System (WMS). Since such executions usually last for hours or days, some WMSs provide user steering support, i.e., they allow users to run data analyses and, depending on the results, adapt the workflows at runtime. A challenge in designing parallel execution control is managing workflow data for efficient execution while also enabling user steering. Data access for high scalability is typically transaction-oriented, whereas data access for analysis is online analytical-oriented; managing such hybrid workloads makes the challenge even harder. In this work, we present SchalaDB, an architecture with a set of design principles and techniques, based on distributed in-memory data management, for efficient workflow execution control and user steering. We propose a distributed data design for scalable workflow task scheduling and high availability, driven by a parallel and distributed in-memory DBMS. To evaluate our proposal, we develop d-Chiron, a WMS designed according to SchalaDB's principles. We carry out an extensive experimental evaluation on an HPC cluster with up to 960 computing cores. Among other analyses, we show that even when running data analyses for user steering, SchalaDB's overhead is negligible for workloads composed of hundreds of concurrent tasks on shared data. Our results encourage workflow engine developers to follow a parallel and distributed data-oriented approach not only for scheduling and monitoring but also for user steering.
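To make the hybrid-workload point concrete, the following is a minimal sketch of the two access patterns the abstract contrasts: transaction-oriented access by the scheduler versus online analytical access by a steering user over the same shared task data. It uses an in-memory SQLite database purely as a stand-in for a distributed in-memory DBMS; the table, columns, and statuses are illustrative assumptions, not SchalaDB's actual schema or API.

```python
import sqlite3

# In-memory database standing in for the distributed in-memory DBMS
# (illustrative only; not SchalaDB's implementation).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE task (
    id INTEGER PRIMARY KEY,
    status TEXT,          -- READY / RUNNING / DONE (assumed statuses)
    duration_s REAL       -- filled in when the task finishes
)""")
conn.executemany("INSERT INTO task (id, status, duration_s) VALUES (?,?,?)",
                 [(i, "READY", None) for i in range(100)])

# Transaction-oriented access: the scheduler atomically claims a ready task.
with conn:  # implicit BEGIN ... COMMIT
    row = conn.execute(
        "SELECT id FROM task WHERE status = 'READY' LIMIT 1").fetchone()
    conn.execute("UPDATE task SET status = 'RUNNING' WHERE id = ?", (row[0],))

# Simulate some finished tasks so the analytical query has data to aggregate.
with conn:
    conn.execute("UPDATE task SET status = 'DONE', duration_s = id * 0.1 "
                 "WHERE id < 50")

# Online-analytical access: a user steering query aggregates over the same
# shared data while execution continues.
done, avg_s = conn.execute(
    "SELECT COUNT(*), AVG(duration_s) FROM task WHERE status = 'DONE'"
).fetchone()
print(done, round(avg_s, 2))  # → 50 2.45
```

In a real parallel setting both workloads hit the database concurrently from many workers, which is why the abstract argues that a single data design must serve both patterns efficiently.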