Hybrid MPI+threads programming is gaining prominence, but in practice applications run slower with it than with the MPI-everywhere model. The most critical challenge to the parallel efficiency of MPI+threads applications is slow MPI_THREAD_MULTIPLE performance. MPI libraries have recently made significant strides on this front, but to exploit their capabilities, users must expose the communication parallelism in their MPI+threads applications. Recent studies show that MPI 4.0 provides users with new performance-oriented options to do so, but our evaluation of these new mechanisms shows that they pose several challenges. An alternative design is MPI Endpoints. In this paper, we compare the different designs from the perspective of MPI's end users: domain scientists and application developers. We evaluate the mechanisms on metrics beyond performance, such as usability, scope, and portability. Based on the lessons learned, we make a case for a future direction.