Collective algorithms are an essential part of MPI, allowing application programmers to take advantage of underlying optimizations of common distributed operations. The MPI_Allgather operation gathers data that is initially distributed across all processes, so that every process ends up with all of the data. For small data sizes, the Bruck algorithm is commonly used to minimize the maximum number of messages communicated by any process. However, the cost of each communication step depends on the relative locations of the source and destination processes: non-local messages, such as inter-node messages, are significantly more costly than local messages, such as intra-node messages. This paper optimizes the Bruck algorithm with locality-awareness, minimizing the number and size of non-local messages to improve the performance and scalability of the allgather operation.
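To make the baseline concrete, the following is a minimal single-process simulation of the standard Bruck allgather, not the locality-aware variant this paper proposes. It is a sketch under stated assumptions: the function name `bruck_allgather`, the parameter `ppn` (processes per node), and the block mapping of ranks to nodes (`node = rank // ppn`) are illustrative choices that do not come from the paper. The simulation counts communication steps and the messages that cross a node boundary.

```python
def bruck_allgather(local_values, ppn=1):
    """Simulate the standard Bruck allgather over p = len(local_values) ranks.

    ppn is an assumed processes-per-node count; ranks are assumed to be
    block-mapped to nodes (node = rank // ppn). Returns a tuple
    (results, steps, nonlocal_msgs).
    """
    p = len(local_values)
    # Each rank starts with only its own block.
    buffers = [[v] for v in local_values]
    steps = 0
    nonlocal_msgs = 0
    dist = 1
    while dist < p:
        # In the last step fewer than dist blocks may still be missing.
        count = min(dist, p - dist)
        # All ranks exchange simultaneously, so snapshot outgoing data first.
        outgoing = [buf[:count] for buf in buffers]
        for rank in range(p):
            src = (rank + dist) % p  # rank receives from rank + dist
            buffers[rank].extend(outgoing[src])
            if rank // ppn != src // ppn:
                nonlocal_msgs += 1  # message crossed a node boundary
        dist *= 2
        steps += 1
    # Rank r now holds blocks r, r+1, ..., r+p-1 (mod p); rotate so that
    # block 0 comes first on every rank.
    results = [buf[p - r:] + buf[:p - r] for r, buf in enumerate(buffers)]
    return results, steps, nonlocal_msgs


if __name__ == "__main__":
    results, steps, nonlocal_msgs = bruck_allgather(list(range(8)), ppn=4)
    print(results[3], steps, nonlocal_msgs)
```

In this model each rank sends and receives one message per step, so the number of steps is the ceiling of log2(p). The `nonlocal_msgs` counter shows how many of those messages are inter-node under the assumed mapping, which is the cost a locality-aware variant aims to reduce.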