Scaling up model depth and size is now a common approach to raising accuracy in many deep learning (DL) applications, as evidenced by the widespread success of multi-billion- or even trillion-parameter models in natural language processing (NLP) research. Despite success in DL research and at major technology companies, broader practical adoption of such large models among domain scientists and businesses is still bottlenecked by GPU memory limits, high training costs, and low GPU availability, even on public clouds. Model selection needs further compound these resource challenges: users often need to compare dozens of models with different hyper-parameters or neural architectures to suit their specific task and dataset. In this paper, we present Hydra, a system designed to tackle such challenges by enabling out-of-the-box scaling for multi-large-model DL workloads, even on commodity GPUs, in a resource-efficient manner. Hydra is the first approach to holistically optimize the execution of multi-model workloads for large DL models. We do this by adapting prior "model-parallel" execution schemes to work with scalable parameter offloading across the memory hierarchy and further hybridizing this approach with task-parallel job scheduling techniques. Hydra decouples scalability of model parameters from parallelism of execution, thus enabling DL users to train even a 6-billion-parameter model on a single commodity GPU. It also fully exploits the speedup potential of task parallelism in multi-GPU setups, yielding near-linear strong scaling and making rigorous model selection more practical for such models. We evaluate end-to-end performance by fine-tuning GPT-2 for language modeling. We find that Hydra offers between 50% and 100% higher training throughput than even the best settings of state-of-the-art industrial frameworks such as DeepSpeed and GPipe for multi-large-model training.
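The core idea of decoupling model size from device memory can be illustrated with a minimal, hypothetical sketch (this is an illustration of parameter offloading in general, not Hydra's actual implementation): layer parameters live in host memory, and only one layer at a time is staged into a bounded "device" buffer during the forward pass, so peak device memory is one layer rather than the whole model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Host-resident parameters: many layers, but only one is "on device" at a time.
HIDDEN = 64
NUM_LAYERS = 8
host_weights = [rng.standard_normal((HIDDEN, HIDDEN)) / np.sqrt(HIDDEN)
                for _ in range(NUM_LAYERS)]

def forward_offloaded(x, host_weights):
    """Forward pass that stages one layer's weights at a time into a
    device buffer, mimicking parameter offloading across the memory
    hierarchy. Peak 'device' memory holds a single layer."""
    for w in host_weights:
        device_w = w.copy()                # stand-in for host-to-GPU transfer
        x = np.maximum(x @ device_w, 0.0)  # layer compute (ReLU MLP)
        del device_w                       # free the device buffer before the next layer
    return x

batch = rng.standard_normal((4, HIDDEN))
out = forward_offloaded(batch, host_weights)
print(out.shape)  # (4, 64)
```

In a real system the `w.copy()` step would be an asynchronous host-to-device transfer overlapped with compute; the sketch only shows why model size is bounded by host memory rather than device memory.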