Motivated by extreme multi-label classification applications, we consider training deep learning models over sparse data on multi-GPU servers. The variance in the number of non-zero features across training batches and the intrinsic GPU heterogeneity combine to limit accuracy and increase the time to convergence. We address these challenges with Adaptive SGD, an adaptive elastic model averaging stochastic gradient descent algorithm for heterogeneous multi-GPUs that is characterized by dynamic scheduling, adaptive batch size scaling, and normalized model merging. Instead of statically partitioning batches to GPUs, batches are routed based on the relative processing speed of each GPU. Batch size scaling assigns larger batches to the faster GPUs and smaller batches to the slower ones, with the goal of arriving at a steady state in which all the GPUs perform the same number of model updates. Normalized model merging computes optimal weights for every GPU based on the assigned batches such that the combined model achieves better accuracy. We show experimentally that Adaptive SGD outperforms four state-of-the-art solutions in time-to-accuracy and is scalable with the number of GPUs.
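To make the normalized model merging idea concrete, the following is a minimal illustrative sketch in NumPy. It assumes, purely for illustration, that each GPU's merge weight is proportional to the number of examples it processed; the actual optimal weights derived in the paper may differ, and the function name `merge_models` is hypothetical.

```python
import numpy as np

def merge_models(local_models, examples_processed):
    """Combine per-GPU models into one by a normalized weighted average.

    local_models: list of 1-D parameter vectors, one per GPU.
    examples_processed: number of training examples each GPU consumed
    (a stand-in for the per-GPU weighting the paper actually computes).
    """
    counts = np.asarray(examples_processed, dtype=float)
    weights = counts / counts.sum()    # normalize so the weights sum to 1
    stacked = np.stack(local_models)   # shape: (num_gpus, num_params)
    return weights @ stacked           # weighted average of the parameters

# Example: a faster GPU that processed 3x the data contributes 3x the weight.
merged = merge_models(
    [np.array([1.0, 2.0]), np.array([3.0, 6.0])],
    examples_processed=[300, 100],
)
print(merged)  # -> [1.5 3. ]
```

Under this weighting, GPUs that were assigned larger batches (the faster ones, per the batch size scaling above) pull the merged model more strongly toward their local parameters.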