We give optimally fast $O(\log p)$ time (per processor) algorithms for computing round-optimal broadcast schedules for message-passing parallel computing systems. This affirmatively answers the questions posed in Tr\"aff (2022). The problem is to broadcast $n$ indivisible blocks of data from a given root processor to all other processors in a (subgraph of a) fully connected network of $p$ processors with fully bidirectional, one-ported communication capabilities. In this model, $n-1+\lceil\log_2 p\rceil$ communication rounds are required. Our new algorithms compute for each processor in the network receive and send schedules each of size $\lceil\log_2 p\rceil$ that determine uniquely in $O(1)$ time for each communication round the new block that the processor will receive, and the already received block it has to send. Schedule computations are done independently per processor without communication. The broadcast communication subgraph is the same, easily computable, directed, $\lceil\log_2 p\rceil$-regular circulant graph used in Tr\"aff (2022) and elsewhere. We show how the schedule computations can be done in optimal time and space of $O(\log p)$, improving significantly over previous results of $O(p\log^2 p)$ and $O(\log^3 p)$. The schedule computation and broadcast algorithms are simple to implement, but correctness and complexity are not obvious. All algorithms have been implemented, compared to previous algorithms, and briefly evaluated on a small $36\times 32$ processor-core cluster.
翻译:暂无翻译