Resource selection and task placement for distributed execution pose conceptual and implementation difficulties. Although resource selection and task placement are at the core of many tools and workflow systems, the methods are ad hoc rather than based on models. Consequently, partial and non-interoperable implementations proliferate. We address both the conceptual and implementation difficulties by experimentally characterizing diverse modalities of resource selection and task placement. We compare the architectures and capabilities of two systems: the AIMES middleware and the Swift workflow scripting language and runtime. We integrate these systems to enable the distributed execution of Swift workflows on Pilot-Jobs managed by the AIMES middleware. Our experiments characterize and compare alternative execution strategies by measuring the time to completion of heterogeneous uncoupled workloads executed at diverse scales and on multiple resources. We measure the adverse effects of pilot fragmentation and early binding of tasks to resources, and the benefits of backfill scheduling across pilots on multiple resources. We then use this insight to execute a multi-stage workflow across five production-grade resources. We discuss the importance of these findings and their implications for other tools and workflow systems.