Towards More Effective and Economic Sparsely-Activated Model

Hao Jiang, Ke Zhan, Jianwei Qu, Yongkang Wu, Zhaoye Fei, Xinyu Zhang, Lei Chen, Zhicheng Dou, Xipeng Qiu, Zikai Guo, Ruofei Lai, Jiawen Wu, Enrui Hu, Yinxia Zhang, Yantao Jia, Fan Yu, Zhao Cao

Sparsely-activated models have achieved great success in natural language processing by combining a very large number of parameters with relatively low computational cost, and they have gradually become a feasible technique for training and deploying extremely large models. Because of communication cost, activating multiple experts is hardly affordable during training and inference. Previous work therefore usually activates just one expert at a time to avoid additional communication cost. Such a routing mechanism limits the upper bound of model performance. In this paper, we first investigate the phenomenon that increasing the number of activated experts can boost model performance with a higher sparse ratio. To increase the number of activated experts without increasing computational cost, we propose SAM (Switch and Mixture) routing, an efficient hierarchical routing mechanism that activates multiple experts on the same device (GPU). Our methods shed light on the training of extremely large sparse models, and experiments show that our models achieve significant performance gains with a large improvement in efficiency.
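The abstract does not spell out how the two routing levels are computed, so the following is only a minimal sketch of one way a hierarchical router of this kind could look, not the authors' implementation. It assumes a top-1 choice over device groups followed by a top-k choice among the experts of the chosen group, with linear routers and softmax gates. All names (`hierarchical_route`, `group_router`, `expert_routers`, `top_k`) and the sizes used are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def hierarchical_route(x, group_router, expert_routers, top_k):
    """Pick one device group (top-1), then top_k experts inside that group.

    Returns (group_index, [(expert_index, weight), ...]); the weights are
    renormalized over the chosen experts so that they sum to 1.
    """
    # Level 1 (assumed Switch-style): top-1 over device groups.
    group_probs = softmax([dot(w, x) for w in group_router])
    g = max(range(len(group_probs)), key=group_probs.__getitem__)

    # Level 2 (assumed Mixture-style): top-k over experts of group g only,
    # so every activated expert lives on the same device.
    expert_probs = softmax([dot(w, x) for w in expert_routers[g]])
    ranked = sorted(range(len(expert_probs)),
                    key=expert_probs.__getitem__, reverse=True)
    chosen = ranked[:top_k]
    norm = sum(expert_probs[e] for e in chosen)
    return g, [(e, expert_probs[e] / norm) for e in chosen]


def make_weights(rows, dim, rng):
    """Random router weights; a stand-in for learned parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(rows)]


# Hypothetical sizes, chosen only for illustration.
rng = random.Random(0)
DIM, NUM_GROUPS, EXPERTS_PER_GROUP, TOP_K = 8, 4, 4, 2
group_router = make_weights(NUM_GROUPS, DIM, rng)
expert_routers = [make_weights(EXPERTS_PER_GROUP, DIM, rng)
                  for _ in range(NUM_GROUPS)]
token = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

group, experts = hierarchical_route(token, group_router, expert_routers, TOP_K)
```

In this sketch the token is dispatched to a single device regardless of `top_k`, which is the property the abstract attributes to SAM routing: more experts are activated without sending the token to additional devices.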

