Communication efficiency plays an important role in accelerating the distributed training of Deep Neural Networks (DNNs). All-reduce is the key communication primitive used to reduce model parameters in distributed DNN training. Most existing all-reduce algorithms are designed for traditional electrical interconnect systems, which cannot meet the communication requirements of distributed training for large DNNs. A promising alternative to electrical interconnects is optical interconnects, which can provide high bandwidth, low transmission delay, and low power consumption. We propose an efficient scheme called WRHT (Wavelength Reused Hierarchical Tree) for implementing the all-reduce operation in optical interconnect systems; it exploits WDM (Wavelength Division Multiplexing) to reduce the communication time of distributed data-parallel DNN training. We further derive the minimum number of communication steps and the corresponding communication time required to realize all-reduce using WRHT. Simulation results show that the communication time of WRHT is reduced by 75.59%, 49.25%, and 70.1%, respectively, compared with three traditional all-reduce algorithms simulated in an optical interconnect system. Simulation results also show that WRHT reduces the communication time of the all-reduce operation by 86.69% and 84.71% compared with two existing all-reduce algorithms in an electrical interconnect system.
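For readers unfamiliar with the primitive, the sketch below illustrates only the semantics of all-reduce (every worker ends up holding the element-wise sum of all workers' gradient vectors); it is not the WRHT scheme, and the worker count and vector size are arbitrary illustrative values.

```python
# Minimal sketch of all-reduce semantics (sum reduction), not WRHT itself.
import numpy as np

def allreduce_sum(worker_grads):
    """Return the per-worker results of an all-reduce: each worker
    receives the element-wise sum of every worker's local gradient."""
    reduced = np.sum(worker_grads, axis=0)          # reduce phase
    return [reduced.copy() for _ in worker_grads]   # broadcast phase

# Four simulated workers, each holding a local gradient of 8 parameters.
grads = [np.random.rand(8) for _ in range(4)]
results = allreduce_sum(grads)
assert all(np.allclose(r, np.sum(grads, axis=0)) for r in results)
```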