With increasing data and model complexities, the time required to train neural networks has become prohibitively large. To address the exponential rise in training time, users are turning to data parallel neural networks (DPNN) to utilize large-scale distributed resources on computer clusters. Current DPNN approaches implement the network parameter updates by synchronizing and averaging gradients across all processes with blocking communication operations. This synchronization is the central algorithmic bottleneck. To combat this, we introduce the Distributed Asynchronous and Selective Optimization (DASO) method, which leverages multi-GPU compute node architectures to accelerate network training. DASO uses a hierarchical and asynchronous communication scheme comprising node-local and global networks while adjusting the global synchronization rate during the learning process. We show that DASO yields a reduction in training time of up to 34% on classical and state-of-the-art networks, as compared to other existing data parallel training methods.
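To make the hierarchical scheme concrete, the following is a minimal, illustrative sketch of node-local plus global parameter averaging built on PyTorch's `torch.distributed` process groups. It is not the DASO implementation: the four-GPU-per-node layout, the fixed `global_sync_every` interval, and the helper names `build_groups`, `hierarchical_sync`, and `finish_global_sync` are assumptions made for illustration, and DASO additionally adapts the global synchronization rate during training.

```python
import torch
import torch.distributed as dist

# Assumes dist.init_process_group(...) has already been called (e.g. via
# torchrun with the NCCL backend) and that every node hosts `gpus_per_node`
# ranks, one per GPU.

def build_groups(gpus_per_node: int = 4):
    """Create a node-local group for this rank and a global group containing
    one 'leader' rank (local rank 0) per node."""
    world = dist.get_world_size()
    rank = dist.get_rank()
    my_node = rank // gpus_per_node

    local_group = None
    for node in range(world // gpus_per_node):
        ranks = list(range(node * gpus_per_node, (node + 1) * gpus_per_node))
        group = dist.new_group(ranks)  # every rank must create every group
        if node == my_node:
            local_group = group
    global_group = dist.new_group(list(range(0, world, gpus_per_node)))
    is_leader = rank % gpus_per_node == 0
    return local_group, global_group, is_leader


def hierarchical_sync(model, step, local_group, global_group, is_leader,
                      gpus_per_node=4, global_sync_every=4):
    """Average parameters node-locally every step; every `global_sync_every`
    steps the node leaders also start a non-blocking global average."""
    handles = []
    num_nodes = dist.get_world_size() // gpus_per_node
    for p in model.parameters():
        p.data /= gpus_per_node
        dist.all_reduce(p.data, group=local_group)  # blocking, intra-node
        if is_leader and step % global_sync_every == 0:
            p.data /= num_nodes
            handles.append(dist.all_reduce(p.data, group=global_group,
                                           async_op=True))  # non-blocking, inter-node
    return handles


def finish_global_sync(model, handles, local_group, gpus_per_node=4):
    """Complete the pending global average (on leaders) and broadcast the
    result from each node leader to the other GPUs on the same node."""
    for h in handles:
        h.wait()
    leader = (dist.get_rank() // gpus_per_node) * gpus_per_node
    for p in model.parameters():
        dist.broadcast(p.data, src=leader, group=local_group)
```

Because the inter-node all-reduce is launched with `async_op=True`, the next forward and backward passes can overlap with global communication; `finish_global_sync` is called only when the globally averaged parameters are needed again, which illustrates how hiding global synchronization latency behind computation removes the blocking bottleneck described above.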