The recent many-fold increase in the size of deep neural networks makes efficient distributed training challenging. Many proposals exploit the compressibility of the gradients and propose lossy compression techniques to speed up the communication stage of distributed training. Nevertheless, compression comes at the cost of reduced model quality and extra computation overhead. In this work, we design an efficient compressor with minimal overhead. Noting the sparsity of the gradients, we propose to model the gradients as random variables distributed according to some sparsity-inducing distributions (SIDs). We empirically validate our assumption by studying the statistical characteristics of the evolution of gradient vectors over the training process. We then propose Sparsity-Inducing Distribution-based Compression (SIDCo), a threshold-based sparsification scheme that enjoys similar threshold-estimation quality to deep gradient compression (DGC) while being faster, owing to its lower compression overhead. Our extensive evaluation on popular machine learning benchmarks involving both recurrent neural network (RNN) and convolutional neural network (CNN) models shows that SIDCo speeds up training by up to 41.7%, 7.6%, and 1.9% compared to the no-compression baseline, Top-k, and DGC compressors, respectively.
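To make the threshold-based idea concrete, the sketch below shows one way a sparsification threshold can be derived analytically from a fitted sparsity-inducing distribution instead of by sorting the full gradient as Top-k-style selection does. It is a minimal illustration assuming a single-stage exponential fit to the gradient magnitudes; the function name `threshold_sparsify` and the `ratio` parameter are ours for illustration and do not reproduce the authors' exact multi-stage estimator.

```python
import math
import torch

def threshold_sparsify(grad: torch.Tensor, ratio: float = 0.01):
    """Keep only the entries of `grad` whose magnitude exceeds a threshold
    estimated from an exponential model of the magnitudes (illustrative sketch,
    not the authors' exact procedure)."""
    g = grad.flatten()
    abs_g = g.abs()
    # Maximum-likelihood fit of an exponential distribution to |g|:
    # the scale parameter is simply mean(|g|), computed in O(n).
    scale = abs_g.mean()
    # Under this model, P(|g| > t) = exp(-t / scale); solve for t so that the
    # expected fraction of surviving entries equals the target ratio.
    threshold = -scale * math.log(ratio)
    mask = abs_g > threshold
    indices = torch.nonzero(mask).flatten()
    values = g[indices]
    return values, indices

# Example: a synthetic Laplace-distributed gradient, compressed to ~1% of its
# entries (the exponential model is exact for the magnitudes of Laplace noise).
grad = torch.distributions.Laplace(0.0, 1.0).sample((100_000,))
values, indices = threshold_sparsify(grad, ratio=0.01)
```

The key point is that the threshold comes from the fitted model's tail (its quantile function) in linear time, without the sorting or iterative threshold search that Top-k and DGC rely on; the paper's actual scheme refines this idea with multiple fitting stages and a choice among several SIDs.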