Large-scale distributed training of deep acoustic models plays an important role in today's high-performance automatic speech recognition (ASR). In this paper we investigate a variety of asynchronous decentralized distributed training strategies based on data-parallel stochastic gradient descent (SGD) and show their superior performance over the commonly used synchronous distributed training via allreduce, especially when dealing with large batch sizes. Specifically, we study three variants of asynchronous decentralized parallel SGD (ADPSGD), namely, fixed and randomized communication patterns on a ring as well as a delay-by-one scheme. We introduce a mathematical model of ADPSGD, give its theoretical convergence rate, and compare the empirical convergence behavior and straggler resilience of the three variants. Experiments are carried out on an IBM supercomputer for training deep long short-term memory (LSTM) acoustic models on the 2000-hour Switchboard dataset. Recognition and speedup performance of the proposed strategies are evaluated under various training configurations. We show that ADPSGD with fixed and randomized communication patterns copes well with slow learners. When learners are equally fast, ADPSGD with the delay-by-one strategy has the fastest convergence with large batches. In particular, using the delay-by-one strategy, we can train the acoustic model in less than 2 hours using 128 V100 GPUs with competitive word error rates.
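To make the ADPSGD update concrete, the following is a minimal sketch, written synchronously for clarity rather than as the asynchronous implementation used in the paper. Each of n learners holds its own model copy, takes a local SGD step on its own mini-batch, and then averages its weights with a ring partner chosen by the communication pattern (fixed ring alternating neighbors, or a randomized partner). The helper `local_gradient` is a hypothetical stand-in for a backward pass on learner i's mini-batch.

```python
import numpy as np

def adpsgd_step(weights, local_gradient, lr, step, pattern="fixed", rng=None):
    """One ADPSGD-style update over all learners (illustrative sketch only).

    weights        : list of per-learner parameter vectors (numpy arrays)
    local_gradient : callable(i, w) -> gradient on learner i's mini-batch (assumed)
    lr             : learning rate
    step           : global step index, used to alternate fixed-ring partners
    pattern        : "fixed" for the fixed ring, anything else for randomized
    """
    rng = rng or np.random.default_rng()
    n = len(weights)
    updated = [w.copy() for w in weights]
    for i in range(n):
        # 1) local SGD step on the learner's own mini-batch
        updated[i] -= lr * local_gradient(i, weights[i])
        # 2) partner selection: the fixed ring alternates left/right neighbors,
        #    the randomized pattern draws a partner uniformly from the other learners
        if pattern == "fixed":
            j = (i + (1 if step % 2 == 0 else -1)) % n
        else:
            j = (i + rng.integers(1, n)) % n
        # 3) model averaging with the partner's (in practice possibly stale) weights
        updated[i] = 0.5 * (updated[i] + weights[j])
    return updated
```

In the asynchronous setting each learner performs its local step and partner averaging without waiting for the others, which is what gives ADPSGD its resilience to stragglers; the delay-by-one variant instead overlaps communication with computation by applying each averaging result one step late.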