In many deep learning (DL) applications, the desire for ever-higher accuracy and the new ubiquity of transfer learning have led to a marked increase in the size and depth of model architectures. Thus, GPU memory capacity is often a bottleneck for DL practitioners. Existing techniques that partition the model architecture across a network of GPUs suffer from substantial underutilization and busy waiting due to the sequential dependencies in most large-scale model architectures (Transformers, CNNs). We observe that almost all such prior large-model systems focus on training only one model at a time, but in practice DL practitioners often train many models in bulk for model selection, e.g., hyper-parameter tuning, architecture fine-tuning, etc. This gap leads to significant system inefficiency. We approach this problem from first principles and propose a new information system architecture for scalable multi-model training that adapts and blends ideas from classical RDBMS design with task parallelism from the ML world. We propose a suite of techniques to optimize system efficiency holistically, including a highly general parameter-spilling design that enables large models to be trained even with a single GPU, a novel multi-query optimization scheme that blends model execution schedules efficiently and maximizes GPU utilization, and a double-buffering idea to hide latency. We prototype our ideas on top of PyTorch to build a system we call Hydra. Experiments with real benchmark large-scale multi-model DL workloads show that Hydra is over 7x faster than regular model parallelism and 1.8x-4.5x faster than state-of-the-art industrial tools for large-scale model training.