Future exascale systems will feature massive parallelism, many-core processors, and heterogeneous architectures. In this scenario, it is increasingly difficult for HPC applications to fully and efficiently utilize the resources of each system node. Moreover, the increased parallelism exacerbates the effects of existing inefficiencies in current applications. Research has shown that co-scheduling applications so that they share system nodes, instead of giving each application exclusive access, can increase resource utilization and efficiency. However, current oversubscription and co-location techniques for sharing nodes have several drawbacks that limit their applicability and make them highly application-dependent. This paper presents co-execution through system-wide scheduling. Co-execution is a novel fine-grained technique for executing multiple HPC applications simultaneously on the same node, and it outperforms current state-of-the-art approaches. We implement this technique in nOS-V, a lightweight tasking library that supports co-execution through system-wide task scheduling. Moreover, nOS-V can be easily integrated with existing programming models, requiring no changes to user applications. We show that co-execution with nOS-V significantly reduces schedule makespan for several applications in single-node and distributed environments, outperforming prior node-sharing techniques.