Pipeline parallelism has been demonstrated to be an effective approach to improving throughput when training deep neural networks with billions of parameters on heterogeneous clusters. The 1F1B schedule is a widely adopted strategy for memory and performance optimization, which interleaves the forward and backward stage computations of different micro-batches. However, a common issue with 1F1B scheduling is that stage computation is delayed by data transfer when network resources are preempted by other tasks, even with minimal communication between stages; exclusive access to these network resources cannot be guaranteed in cloud offerings. We present a general scheduling technique that adapts pipeline parallelism to preempted network environments at the expense of a certain amount of memory pressure. The core idea is to extend the 1F1B scheme to kFkB, which groups k micro-batches and alternately executes k forward and k backward computations. We further propose Ada-Grouper, an adaptive kFkB scheduler that periodically adjusts the group size k to maintain an optimal balance between communication and computation efficiency as the network environment changes, subject to the memory limit. Experimental results demonstrate that our design maintains stable performance for pipeline parallelism, yielding a performance improvement of 4% to 30% over 1F1B in preempted network scenarios.
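To illustrate the kFkB idea described above, the following is a minimal sketch of the per-stage operation order it implies. This is an illustrative simplification, not the paper's implementation: it covers a single stage, ignores the warm-up and cool-down phases that depend on pipeline depth, and the function name `kfkb_schedule` is hypothetical. With k = 1 the steady state degenerates to the familiar 1F1B alternation.

```python
def kfkb_schedule(num_microbatches, k):
    """Generate a kFkB-style op order for one pipeline stage.

    Micro-batches are processed in groups of k: up to k forward
    passes are issued back-to-back, then up to k backward passes
    for the micro-batches whose forwards have completed. Larger k
    batches communication at the cost of holding more activations
    in memory; k == 1 recovers the 1F1B alternation pattern.
    """
    ops = []
    fwd = bwd = 0
    while bwd < num_microbatches:
        # issue up to k forward computations, if any remain
        for _ in range(k):
            if fwd < num_microbatches:
                ops.append(("F", fwd))
                fwd += 1
        # then up to k backward computations for completed forwards
        for _ in range(k):
            if bwd < fwd:
                ops.append(("B", bwd))
                bwd += 1
    return ops
```

For example, `kfkb_schedule(4, 2)` yields F0 F1 B0 B1 F2 F3 B2 B3: each group of two forwards amortizes inter-stage transfers over two micro-batches, which is the lever an adaptive scheduler such as Ada-Grouper can tune as available bandwidth fluctuates.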