Runtime scheduling and workflow systems are an increasingly popular algorithmic component in HPC because they allow full system utilization with relaxed synchronization requirements. So many special-purpose tools for task scheduling already exist that one might wonder why more are needed. Use cases seen on the Summit supercomputer needed better integration with MPI and greater flexibility in job launch configurations. Preparation, execution, and analysis of computational chemistry simulations at the scale of tens of thousands of processors revealed three distinct workflow patterns. A separate job scheduler was implemented for each pattern using an extremely simple and robust design: file-based, task-list-based, or bulk-synchronous. Comparison with existing methods shows the unique benefits of this work, including simplicity of design, suitability for HPC centers, short startup time, and well-understood per-task overhead. All three new tools have been shown to scale to full utilization of Summit and have been made publicly available with tests and documentation. This work presents a complete characterization of the minimum effective task granularity needed to use each scheduler efficiently. These schedulers have the same bottlenecks as existing tools that follow comparable paradigms, and hence similar task granularities to those reported for such tools.
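One way to read "minimum effective task granularity" is through a simple cost model. The sketch below is not taken from this work. It assumes two generic bottlenecks: a fixed per-task overhead, and a single dispatcher that can hand out only so many tasks per second. All function names and the example numbers are hypothetical and chosen only for illustration.

```python
def min_task_seconds_for_overhead(per_task_overhead_s, target_efficiency):
    """Shortest task duration t such that t / (t + overhead) >= target_efficiency.

    Assumed model: every task pays the same fixed overhead (hypothetical).
    """
    if not 0.0 < target_efficiency < 1.0:
        raise ValueError("target_efficiency must be strictly between 0 and 1")
    if per_task_overhead_s < 0.0:
        raise ValueError("per_task_overhead_s must be non-negative")
    return per_task_overhead_s * target_efficiency / (1.0 - target_efficiency)


def min_task_seconds_for_dispatch(num_workers, dispatch_rate_per_s):
    """Shortest task duration that keeps all workers busy.

    Assumed model: one dispatcher issues at most dispatch_rate_per_s tasks
    per second, so num_workers tasks must not finish faster than that.
    """
    if num_workers <= 0 or dispatch_rate_per_s <= 0.0:
        raise ValueError("num_workers and dispatch_rate_per_s must be positive")
    return num_workers / dispatch_rate_per_s


def min_task_granularity(per_task_overhead_s, target_efficiency,
                         num_workers, dispatch_rate_per_s):
    """Combine both assumed bottlenecks; the stricter one sets the limit."""
    return max(
        min_task_seconds_for_overhead(per_task_overhead_s, target_efficiency),
        min_task_seconds_for_dispatch(num_workers, dispatch_rate_per_s),
    )


if __name__ == "__main__":
    # Hypothetical inputs: 10 ms overhead, 90% efficiency target,
    # 1000 workers, 500 task dispatches per second.
    t_min = min_task_granularity(0.01, 0.9, 1000, 500.0)
    print(f"minimum task duration: {t_min:.3f} s")
```

Under these assumptions, the dispatch-rate term grows with the number of workers while the overhead term does not, which is one reason schedulers sharing the same bottleneck would be expected to show similar granularity limits at a given scale.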