Task-based programming models have risen in popularity as an alternative to traditional fork-join parallelism. They are better suited to applications with irregular parallelism, which can exhibit load imbalance. However, these programming models suffer from overheads related to task creation, scheduling, and dependency management, which limit performance and scalability when tasks become too small. At the same time, many HPC applications implement iterative methods or multi-step simulations that create the same directed acyclic graph (DAG) of tasks on each iteration. By giving application programmers a way to express that a specific loop creates the same task pattern on each iteration, we can create a single task DAG once and transform it into a cyclic graph. This cyclic graph is then reused for successive iterations, minimizing task creation and dependency management overhead. This paper presents taskiter, a new construct we propose for the OmpSs-2 and OpenMP programming models that allows the use of directed cyclic task graphs (DCTGs) to minimize runtime overheads. Moreover, we present a simple, locality-aware immediate successor heuristic that minimizes task scheduling overhead by bypassing the runtime task scheduler. We evaluate the implementation of taskiter and the immediate successor heuristic on 8 iterative benchmarks. Using small task granularities, we obtain an average speedup of 3.7x over the reference OmpSs-2 implementation, and average speedups of 5x and 7.46x over the LLVM and GCC OpenMP runtimes, respectively.
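The two ideas in the abstract (building the task graph once and reusing it as a cyclic graph, and handing a newly ready successor directly to the finishing worker) can be illustrated with a small single-threaded simulation. This is only a sketch under stated assumptions, not the OmpSs-2 runtime or the paper's implementation: the names `build_cyclic_graph` and `run_taskiter` are hypothetical, tasks are plain Python callables with declared read and write sets, and each task is additionally made to wait for its own previous instance so that every later instance has a predecessor to release it.

```python
from collections import deque

def build_cyclic_graph(tasks):
    """Derive dependency edges once from each task's declared reads/writes.

    tasks: list of (name, reads, writes, fn) in program order.
    Returns {src: [(dst, dist)]}; dist 0 stays in the same iteration,
    dist 1 points at the next one, which is what makes the graph cyclic.
    """
    n = len(tasks)
    edges = {i: [] for i in range(n)}
    last_writer, readers = {}, {}
    # Walk the loop body twice; the second pass shows the steady-state pattern.
    for pos in range(2 * n):
        it, i = divmod(pos, n)
        _, reads, writes, _ = tasks[i]
        preds = set()
        for d in reads:
            if d in last_writer:
                preds.add(last_writer[d])        # read-after-write
        for d in writes:
            if d in last_writer:
                preds.add(last_writer[d])        # write-after-write
            preds.update(readers.get(d, ()))     # write-after-read
        for d in reads:
            readers.setdefault(d, []).append((i, it))
        for d in writes:
            last_writer[d], readers[d] = (i, it), []
        if it == 1:
            preds.add((i, 0))  # sketch-only assumption: wait for own previous instance
            for j, jt in sorted(preds):
                edges[j].append((i, it - jt))
    return edges

def run_taskiter(tasks, edges, iterations):
    """Replay the same graph for every iteration (simulated on one worker)."""
    n = len(tasks)
    n_pred = [[0, 0] for _ in range(n)]          # predecessor counts per distance
    for outs in edges.values():
        for dst, dist in outs:
            n_pred[dst][dist] += 1

    def initial(i, k):
        # Iteration 0 has no previous iteration to wait for.
        return n_pred[i][0] + (n_pred[i][1] if k > 0 else 0)

    pending, order, bypassed = {}, [], 0
    ready = deque((i, 0) for i in range(n) if initial(i, 0) == 0)
    while ready:
        cur = ready.popleft()                    # normal path: ask the scheduler
        while cur is not None:
            i, k = cur
            tasks[i][3]()                        # execute the task body
            order.append((tasks[i][0], k))
            nxt = None
            for dst, dist in edges[i]:
                inst = (dst, k + dist)
                if inst[1] >= iterations:
                    continue
                pending[inst] = pending.get(inst, initial(*inst)) - 1
                if pending[inst] == 0:
                    if nxt is None:
                        nxt, bypassed = inst, bypassed + 1   # immediate successor
                    else:
                        ready.append(inst)
            cur = nxt
    return order, bypassed

# Hypothetical two-task loop body: b = 2 * a, then a = b + 1.
state = {"a": 1, "b": 0}
tasks = [
    ("compute", {"a"}, {"b"}, lambda: state.update(b=2 * state["a"])),
    ("update", {"b"}, {"a"}, lambda: state.update(a=state["b"] + 1)),
]
edges = build_cyclic_graph(tasks)                # built once, reused below
order, bypassed = run_taskiter(tasks, edges, iterations=3)
```

In this sketch the dependency analysis runs once in `build_cyclic_graph`, and `run_taskiter` only decrements counters per task instance. The `bypassed` counter records how often a successor was run directly by the finishing worker instead of passing through the ready queue.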