Communication efficiency plays an important role in accelerating the distributed training of Deep Neural Networks (DNNs). All-reduce is the key communication primitive for aggregating model parameters in distributed DNN training. Most existing all-reduce algorithms are designed for traditional electrical interconnect systems, which cannot meet the communication requirements of distributed training of large DNNs because of their limited bandwidth. A promising alternative to electrical interconnect is optical interconnect, which offers high bandwidth, low transmission delay, and low power consumption. We propose an efficient scheme called WRHT (Wavelength Reused Hierarchical Tree) for implementing the all-reduce operation in optical interconnect systems. WRHT takes advantage of WDM (Wavelength Division Multiplexing) to reduce the communication time of distributed data-parallel DNN training. We further derive the required number of wavelengths, the minimum number of communication steps, and the communication time of the all-reduce operation on optical interconnect. The constraint of insertion loss is also considered in our analysis. Simulation results for an optical interconnect system show that WRHT reduces the communication time of all-reduce by 80.81%, 64.36%, and 82.12% compared with three traditional all-reduce algorithms, respectively. Our results also show that WRHT reduces the communication time of the all-reduce operation by 92.42% and 91.31% compared with two existing all-reduce algorithms running on an electrical interconnect system.
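The abstract does not describe how WRHT assigns or reuses wavelengths, so the following is only a minimal sketch of the generic operation it implements: a k-ary tree all-reduce (sum) over simulated nodes, with a reduce phase up the tree followed by a broadcast phase down the same tree, counting one communication step per tree level per phase. The function name `tree_allreduce` and the fan-out parameter `k` are hypothetical illustrations; the sketch models neither WDM wavelength reuse nor insertion loss, and it is not the authors' algorithm.

```python
from typing import List, Tuple


def tree_allreduce(values: List[float], k: int = 2) -> Tuple[List[float], int]:
    """Simulate a generic k-ary tree all-reduce (sum); NOT the WRHT scheme.

    Returns the value held by every node afterwards and the number of
    communication steps (one per tree level in each of the two phases).
    """
    partial = list(values)          # partial sums held by each node
    active = list(range(len(values)))  # nodes still holding partial sums
    levels: List[List[List[int]]] = []  # groups formed at each tree level

    # Reduce phase: each group of up to k active nodes sends its partial
    # sums to the group leader (the first node of the group).
    while len(active) > 1:
        groups = [active[i:i + k] for i in range(0, len(active), k)]
        for group in groups:
            leader = group[0]
            for node in group[1:]:
                partial[leader] += partial[node]
        levels.append(groups)
        active = [group[0] for group in groups]

    # Broadcast phase: walk the same tree top-down, copying each leader's
    # (now final) total to the other members of its group.
    result = list(partial)
    for groups in reversed(levels):
        for group in groups:
            for node in group[1:]:
                result[node] = result[group[0]]

    steps = 2 * len(levels)  # reduce levels + matching broadcast levels
    return result, steps


if __name__ == "__main__":
    print(tree_allreduce([1, 2, 3, 4, 5, 6, 7, 8], k=2))
```

With 8 nodes and a binary tree, every node ends with the sum 36 after 6 steps (3 levels up, 3 down); raising the fan-out to `k=4` cuts this to 4 steps. This step count of twice the tree depth is the kind of quantity the abstract says is derived for WRHT, where the wavelength budget and insertion loss further constrain how wide each tree level can be.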