Task-based programming models such as OmpSs-2 and OpenMP provide a flexible data-flow execution model for exploiting dynamic, irregular, and nested parallelism. However, an efficient implementation that scales well with fine-grained tasks remains a challenge, and bottlenecks can arise in several runtime components. In this paper, we analyze the factors that limit the scalability of a task-based runtime system and propose a solution for each of them, including a wait-free dependency system and a novel scalable scheduler design based on delegation. We evaluate how these optimizations affect the overall performance of the runtime, both individually and in combination. We also compare the resulting runtime against state-of-the-art OpenMP implementations and show equivalent or better performance, especially for fine-grained tasks.
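To make the data-flow execution model concrete, the following is a minimal sketch using standard OpenMP task dependencies; it is an illustration only and not code from the runtime discussed here. The function name `run_chain` and the constant `MAX_N` are hypothetical. Each element is produced by one very small task and consumed by another, so the runtime has to track one dependency per element, which is the fine-grained regime where dependency tracking and scheduling overheads tend to dominate.

```c
#include <assert.h>

#define MAX_N 64  /* hypothetical upper bound, chosen for this sketch */

/* Hypothetical helper: builds n independent two-task chains and
   returns the sum of the results. */
int run_chain(int n)
{
    int a[MAX_N], b[MAX_N];
    int sum = 0;

    assert(n >= 0 && n <= MAX_N);

    #pragma omp parallel
    #pragma omp single
    {
        for (int i = 0; i < n; i++) {
            /* Producer task: writes a[i]. */
            #pragma omp task depend(out: a[i]) firstprivate(i) shared(a)
            a[i] = i + 1;

            /* Consumer task: may start only after the producer of a[i]
               has finished, because of the in-dependency on a[i]. */
            #pragma omp task depend(in: a[i]) depend(out: b[i]) firstprivate(i) shared(a, b)
            b[i] = 2 * a[i];
        }

        /* Wait for all child tasks before reading b. */
        #pragma omp taskwait

        for (int i = 0; i < n; i++)
            sum += b[i];
    }
    return sum;
}
```

With GCC or Clang, the pragmas are enabled by the `-fopenmp` flag; without it they are ignored and the function computes the same result sequentially. Because each task body is a single assignment, almost all of the execution time in the parallel version is runtime overhead rather than useful work.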