Data-intensive scientific workflows increasingly rely on high-performance computing (HPC) systems, complementing traditional Grid and Cloud platforms. However, workflow scheduling on HPC infrastructures remains challenging due to the prevalence of non-uniform memory access (NUMA) architectures. These systems require schedulers to account for data locality not only across distributed environments but also within each node. Modern HPC nodes integrate multiple NUMA domains and heterogeneous memory regions, such as high-bandwidth memory (HBM) and DRAM, and frequently attach accelerators (GPUs or FPGAs) and network interface cards (NICs) to specific NUMA nodes. This design increases the variability of data-access latency and complicates the placement of both tasks and data. Despite these constraints, most workflow scheduling strategies were originally developed for Grid or Cloud environments and rarely incorporate NUMA-aware considerations. To address this gap, this work introduces nFlows, a NUMA-aware Workflow Execution Runtime System that enables the modeling, bare-metal execution, simulation, and validation of scheduling algorithms for data-intensive workflows on NUMA-based HPC systems. The system's design, implementation, and validation methodology are presented. nFlows supports the construction of simulation models and their direct execution on physical systems, enabling studies of NUMA effects on scheduling, the design of NUMA-aware algorithms, the analysis of data-movement behavior, the identification of performance bottlenecks, and the exploration of in-memory workflow execution.
翻译:暂无翻译