We give a fast(er), communication-free, parallel construction of optimal communication schedules for broadcasting $n$ distinct blocks of data from a root processor to all other processors in $1$-ported, $p$-processor networks with fully bidirectional communication. For any $p$ and $n$, broadcasting in this model requires $n-1+\lceil\log_2 p\rceil$ communication rounds. In contrast to other constructions, all processors follow the same circulant graph communication pattern, which makes it possible to use the schedules for the allgather (all-to-all-broadcast) operation as well. The new construction takes $O(\log^3 p)$ time steps and $O(\log p)$ space per processor, and each processor computes its part of the schedule independently of the other processors. This is a significant improvement, of considerable practical import, over the sequential $O(p \log^2 p)$ time and $O(p\log p)$ space construction of Tr\"aff and Ripke (2009). The round-optimal schedule construction is then used to implement communication-optimal algorithms for the broadcast and (irregular) allgather collective operations as found in MPI (the \emph{Message-Passing Interface}). For certain problem ranges, these implementations significantly improve in practice over those in standard MPI libraries (\texttt{mpich}, OpenMPI, Intel MPI). The application to the irregular allgather operation is entirely new.
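
As a small illustration of the round bound stated above, the following Python sketch evaluates $n-1+\lceil\log_2 p\rceil$ for given $p$ and $n$. The function names `ceil_log2` and `optimal_broadcast_rounds` are hypothetical helpers introduced only for this example; the sketch computes the number of rounds and does not construct the schedules themselves.

```python
def ceil_log2(p: int) -> int:
    """Return ceil(log2(p)) for p >= 1 using exact integer arithmetic."""
    if p < 1:
        raise ValueError("number of processors p must be at least 1")
    # (p - 1).bit_length() equals ceil(log2(p)) for every integer p >= 1.
    return (p - 1).bit_length()


def optimal_broadcast_rounds(p: int, n: int) -> int:
    """Rounds needed to broadcast n blocks to p processors in the
    1-ported, fully bidirectional model: n - 1 + ceil(log2(p))."""
    if n < 1:
        raise ValueError("number of blocks n must be at least 1")
    return n - 1 + ceil_log2(p)


if __name__ == "__main__":
    # Illustrative values only; p and n are chosen arbitrarily.
    for p, n in [(8, 1), (17, 10), (1000, 64)]:
        print(f"p={p}, n={n}: {optimal_broadcast_rounds(p, n)} rounds")
```

With a single block ($n=1$) the bound reduces to the $\lceil\log_2 p\rceil$ rounds needed for the data to reach all $p$ processors. Each additional block adds exactly one round.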