Alpa automates model-parallel training of large deep learning (DL) models by generating execution plans that unify data, operator, and pipeline parallelism. Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model-parallelism configurations, which does not suffice to scale out complex DL models on distributed compute devices. Alpa distributes the training of large DL models by viewing parallelism at two hierarchical levels: inter-operator and intra-operator parallelism. Based on this view, Alpa constructs a new hierarchical space of massive model-parallel execution plans. Alpa designs a number of compilation passes to automatically derive the optimal parallel execution plan within each parallelism level independently, and implements an efficient runtime to orchestrate the two-level parallel execution on distributed compute devices. Our evaluation shows that Alpa generates parallelization plans that match or outperform hand-tuned model-parallel training systems, even on the models those systems are designed for. Unlike specialized systems, Alpa also generalizes to models with heterogeneous architectures and to models without manually-designed plans.
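Since the abstract hinges on the distinction between intra-operator parallelism (sharding a single operator across devices) and inter-operator parallelism (placing different operators, e.g. pipeline stages, on different devices), the following toy sketch in plain JAX illustrates the two levels conceptually. It is not Alpa's API or implementation; the operator shapes, stage split, and device assignments are arbitrary assumptions chosen only for illustration.

```python
# Toy illustration (not Alpa's API): contrasting the two parallelism levels.
import jax
import jax.numpy as jnp

# --- Intra-operator parallelism ---
# A single matmul is sharded across all local devices: each device holds one
# slice of the batch, while the weight matrix is replicated (in_axes=None).
n_dev = jax.local_device_count()
sharded_matmul = jax.pmap(lambda x_shard, w: x_shard @ w, in_axes=(0, None))

x = jnp.ones((n_dev, 128, 512))      # leading axis = one shard per device
w = jnp.ones((512, 256))
y = sharded_matmul(x, w)             # result shape (n_dev, 128, 256)

# --- Inter-operator parallelism ---
# Different operators (pipeline stages) are placed on different devices, and
# activations are transferred between stages.
devs = jax.devices()
d0, d1 = devs[0], devs[min(1, len(devs) - 1)]

w1 = jax.device_put(jnp.ones((512, 1024)), d0)   # stage 1 weights live on d0
w2 = jax.device_put(jnp.ones((1024, 256)), d1)   # stage 2 weights live on d1

h = jnp.tanh(jnp.ones((32, 512)) @ w1)           # stage 1 computed on d0
out = jax.device_put(h, d1) @ w2                 # activation moved, stage 2 on d1
```

Alpa's contribution, as the abstract states, is to search over combinations of these two levels automatically rather than requiring the user to write such placements by hand.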