Heterogeneous scientific workflows consist of numerous types of tasks and the dependencies between them. Middleware capable of scheduling and submitting different task types across heterogeneous platforms must permit asynchronous execution of tasks to improve resource utilization and task throughput and to reduce makespan. In this paper, we analyze an important class of heterogeneous workflows, namely AI-driven HPC workflows, to investigate the requirements and properties of asynchronous task execution. We model the degree of asynchronicity permitted for arbitrary workflows and propose key metrics that can be used to determine the qualitative benefits of employing asynchronous execution. Our experiments represent important scientific drivers and are performed at scale on Summit; the performance enhancements they show from asynchronous execution are consistent with our model.