Despite various research initiatives and proposed programming models, efficient parallel programming in HPC clusters still relies on a complex combination of different programming models (e.g., OpenMP and MPI), languages (e.g., C++ and CUDA), and specialized runtimes (e.g., Charm++ and Legion). Task parallelism, on the other hand, has proven to be an efficient and seamless programming model for clusters. This paper introduces OpenMP Cluster (OMPC), a task-parallel model that extends OpenMP for cluster programming. OMPC leverages OpenMP's offloading standard to distribute annotated regions of code across the nodes of a distributed system. To achieve this, it hides MPI-based data-distribution and load-balancing mechanisms behind OpenMP task dependencies. Given its compliance with OpenMP, OMPC allows applications to use the same programming model to exploit both intra- and inter-node parallelism, simplifying development and maintenance. We evaluated OMPC using Task Bench, a synthetic benchmark focused on task parallelism, comparing its performance against other distributed runtimes. Experimental results show that OMPC delivers up to 1.53x and 2.43x better performance than Charm++ on CCR and scalability experiments, respectively. Experiments also show that OMPC performance weakly scales for both Task Bench and a real-world seismic imaging application.