All-gather collective communication is one of the most important communication primitives in parallel and distributed computing and plays an essential role in many HPC applications, such as distributed Deep Learning (DL) with model and hybrid parallelism. To relieve the communication bottleneck of All-gather, optical interconnection networks can provide unprecedented bandwidth and reliability for data transfer among distributed nodes. However, most traditional All-gather algorithms are designed for electrical interconnects and do not fit optical interconnect systems well, which results in poor performance. This paper proposes an efficient scheme, called OpTree, for the All-gather operation on optical interconnect systems. OpTree derives an optimal $m$-ary tree corresponding to the optimal number of communication stages, thereby achieving minimum communication time. We further analyze and compare the number of communication steps required by OpTree and by existing All-gather algorithms. Theoretical results show that OpTree requires far fewer communication steps than existing All-gather algorithms on optical interconnect systems. Simulation results show that OpTree reduces communication time by 72.21%, 94.30%, and 88.58% compared with three existing All-gather schemes, WRHT, Ring, and NE, respectively.
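The trade-off behind choosing $m$ can be illustrated with a toy calculation. The sketch below is not OpTree's actual cost model; it assumes only two standard facts and one invented cost function. Ring All-gather on $N$ nodes takes $N-1$ steps, and a radix-$m$ dissemination reaches all nodes in $\lceil \log_m N \rceil$ stages. The per-stage cost `alpha + (m - 1) * gamma` is a hypothetical stand-in for the extra concurrent transfers (for example, wavelength or port contention) that a larger $m$ places on each stage. The helper names `stages`, `ring_steps`, `toy_time`, and `best_radix` are illustrative only.

```python
def stages(n: int, m: int) -> int:
    """Smallest k with m**k >= n, i.e. ceil(log_m n), computed exactly with integers."""
    if m < 2:
        raise ValueError("radix m must be at least 2")
    k, reach = 0, 1
    while reach < n:
        reach *= m  # each stage multiplies the number of nodes holding each block by m
        k += 1
    return k


def ring_steps(n: int) -> int:
    """Ring All-gather needs n - 1 steps on n nodes."""
    return n - 1


def toy_time(n: int, m: int, alpha: float, gamma: float) -> float:
    """Hypothetical cost model (an assumption, not from the paper):
    each stage pays a fixed setup cost alpha plus gamma for each of the
    m - 1 concurrent peer transfers a node handles in that stage."""
    return stages(n, m) * (alpha + (m - 1) * gamma)


def best_radix(n: int, alpha: float, gamma: float) -> int:
    """Radix m in [2, n] minimizing toy_time; ties go to the smaller m."""
    if n < 2:
        raise ValueError("need at least two nodes")
    return min(range(2, n + 1), key=lambda m: (toy_time(n, m, alpha, gamma), m))


if __name__ == "__main__":
    n = 64
    print("ring steps:", ring_steps(n))
    print("binary-tree stages:", stages(n, 2))
    print("best radix under the toy model:", best_radix(n, alpha=10.0, gamma=1.0))
```

With `n = 64`, `alpha = 10.0`, and `gamma = 1.0`, the minimum falls at `m = 8` (two stages), between the binary tree (many cheap stages) and a single 64-way stage (one very expensive stage). Changing the assumed cost parameters moves the optimum, which is the kind of balance the abstract attributes to OpTree's choice of $m$.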