Three Practical Workflow Schedulers for Easy Maximum Parallelism

Runtime scheduling and workflow systems are an increasingly popular algorithmic component in HPC because they allow full system utilization with relaxed synchronization requirements. With so many special-purpose task-scheduling tools already available, one might wonder why more are needed. Use cases seen on the Summit supercomputer needed better integration with MPI and greater flexibility in job launch configurations. Preparation, execution, and analysis of computational chemistry simulations at the scale of tens of thousands of processors revealed three distinct workflow patterns. A separate job scheduler was implemented for each pattern, using extremely simple and robust designs: file-based, task-list based, and bulk-synchronous. Comparison with existing methods shows the distinct benefits of this work, including simplicity of design, suitability for HPC centers, short startup time, and well-understood per-task overhead. All three new tools have been shown to scale to full utilization of Summit, and have been made publicly available with tests and documentation. This work presents a complete characterization of the minimum effective task granularity for efficient scheduler usage scenarios. These schedulers have the same bottlenecks, and hence similar task granularities, as those reported for existing tools following comparable paradigms.
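The link between per-task overhead and minimum effective task granularity can be illustrated with a small sketch. It assumes a simple additive model in which every task pays a fixed scheduling overhead on top of its useful run time; this model, the function names, and the numbers in the usage note are illustrative assumptions and are not taken from the paper's measurements.

```python
def efficiency(task_seconds: float, overhead_seconds: float) -> float:
    """Fraction of wall time spent on useful work.

    Assumed model (not from the paper): each task costs
    task_seconds of work plus a fixed overhead_seconds.
    """
    if task_seconds <= 0.0 or overhead_seconds < 0.0:
        raise ValueError("task_seconds must be > 0 and overhead_seconds >= 0")
    return task_seconds / (task_seconds + overhead_seconds)


def min_task_seconds(overhead_seconds: float, target_efficiency: float) -> float:
    """Smallest task duration that still reaches target_efficiency.

    Solves e = t / (t + o) for t, giving t = o * e / (1 - e).
    """
    if not 0.0 < target_efficiency < 1.0:
        raise ValueError("target_efficiency must be strictly between 0 and 1")
    if overhead_seconds < 0.0:
        raise ValueError("overhead_seconds must be >= 0")
    return overhead_seconds * target_efficiency / (1.0 - target_efficiency)
```

Under this assumed model, a hypothetical overhead of 1 second per task would require tasks of about 9 seconds to keep 90% of the wall time on useful work, which is why a well-understood per-task overhead translates directly into a minimum task size.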
