With increasing data and model complexities, the time required to train neural networks has become prohibitively large. To address the exponential rise in training time, users are turning to data parallel neural networks (DPNN) to utilize large-scale distributed resources on computer clusters. Current DPNN approaches implement the network parameter updates by synchronizing and averaging gradients across all processes with blocking communication operations. This synchronization is the central algorithmic bottleneck. To combat this, we introduce the Distributed Asynchronous and Selective Optimization (DASO) method, which leverages multi-GPU compute node architectures to accelerate network training. DASO uses a hierarchical and asynchronous communication scheme comprising node-local and global networks while adjusting the global synchronization rate during the learning process. We show that DASO yields a reduction in training time of up to 34% on classical and state-of-the-art networks, as compared to other existing data parallel training methods.
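To make the hierarchical scheme concrete, the following is a minimal, illustrative sketch of node-local plus global parameter averaging built on PyTorch's `torch.distributed` process groups. It is not the DASO implementation: the four-GPU-per-node layout, the fixed `global_sync_every` interval, and the helper names `build_groups`, `hierarchical_sync`, and `finish_global_sync` are assumptions made for illustration, and DASO additionally adapts the global synchronization rate during training.

```python
import torch
import torch.distributed as dist

# Assumes dist.init_process_group(...) has already been called (e.g. via
# torchrun with the NCCL backend) and that every node hosts `gpus_per_node`
# ranks, one per GPU.

def build_groups(gpus_per_node: int = 4):
    """Create a node-local group for this rank and a global group containing
    one 'leader' rank (local rank 0) per node."""
    world = dist.get_world_size()
    rank = dist.get_rank()
    my_node = rank // gpus_per_node

    local_group = None
    for node in range(world // gpus_per_node):
        ranks = list(range(node * gpus_per_node, (node + 1) * gpus_per_node))
        group = dist.new_group(ranks)  # every rank must create every group
        if node == my_node:
            local_group = group
    global_group = dist.new_group(list(range(0, world, gpus_per_node)))
    is_leader = rank % gpus_per_node == 0
    return local_group, global_group, is_leader


def hierarchical_sync(model, step, local_group, global_group, is_leader,
                      gpus_per_node=4, global_sync_every=4):
    """Average parameters node-locally every step; every `global_sync_every`
    steps the node leaders also start a non-blocking global average."""
    handles = []
    num_nodes = dist.get_world_size() // gpus_per_node
    for p in model.parameters():
        p.data /= gpus_per_node
        dist.all_reduce(p.data, group=local_group)  # blocking, intra-node
        if is_leader and step % global_sync_every == 0:
            p.data /= num_nodes
            handles.append(dist.all_reduce(p.data, group=global_group,
                                           async_op=True))  # non-blocking, inter-node
    return handles


def finish_global_sync(model, handles, local_group, gpus_per_node=4):
    """Complete the pending global average (on leaders) and broadcast the
    result from each node leader to the other GPUs on the same node."""
    for h in handles:
        h.wait()
    leader = (dist.get_rank() // gpus_per_node) * gpus_per_node
    for p in model.parameters():
        dist.broadcast(p.data, src=leader, group=local_group)
```

Because the inter-node all-reduce is launched with `async_op=True`, the next forward and backward passes can overlap with global communication; `finish_global_sync` is called only when the globally averaged parameters are needed again, which illustrates how hiding global synchronization latency behind computation removes the blocking bottleneck described above.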