Alpa automates model-parallel training of large deep learning (DL) models by generating execution plans that unify data, operator, and pipeline parallelism. Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model-parallelism configurations, which does not suffice to scale out complex DL models on distributed compute devices. Alpa distributes the training of large DL models by viewing parallelism at two hierarchical levels: inter-operator and intra-operator parallelism. Based on this view, Alpa constructs a new hierarchical space of massive model-parallel execution plans. Alpa designs a number of compilation passes to automatically derive the optimal parallel execution plan within each parallelism level independently, and implements an efficient runtime to orchestrate the two-level parallel execution on distributed compute devices. Our evaluation shows that Alpa generates parallelization plans that match or outperform hand-tuned model-parallel training systems, even on the models those systems are designed for. Unlike specialized systems, Alpa also generalizes to models with heterogeneous architectures and to models without manually-designed plans.
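Since the abstract hinges on the distinction between intra-operator parallelism (sharding a single operator across devices) and inter-operator parallelism (placing different operators, e.g. pipeline stages, on different devices), the following toy sketch in plain JAX illustrates the two levels conceptually. It is not Alpa's API or implementation; the operator shapes, stage split, and device assignments are arbitrary assumptions chosen only for illustration.

```python
# Toy illustration (not Alpa's API): contrasting the two parallelism levels.
import jax
import jax.numpy as jnp

# --- Intra-operator parallelism ---
# A single matmul is sharded across all local devices: each device holds one
# slice of the batch, while the weight matrix is replicated (in_axes=None).
n_dev = jax.local_device_count()
sharded_matmul = jax.pmap(lambda x_shard, w: x_shard @ w, in_axes=(0, None))

x = jnp.ones((n_dev, 128, 512))      # leading axis = one shard per device
w = jnp.ones((512, 256))
y = sharded_matmul(x, w)             # result shape (n_dev, 128, 256)

# --- Inter-operator parallelism ---
# Different operators (pipeline stages) are placed on different devices, and
# activations are transferred between stages.
devs = jax.devices()
d0, d1 = devs[0], devs[min(1, len(devs) - 1)]

w1 = jax.device_put(jnp.ones((512, 1024)), d0)   # stage 1 weights live on d0
w2 = jax.device_put(jnp.ones((1024, 256)), d1)   # stage 2 weights live on d1

h = jnp.tanh(jnp.ones((32, 512)) @ w1)           # stage 1 computed on d0
out = jax.device_put(h, d1) @ w2                 # activation moved, stage 2 on d1
```

Alpa's contribution, as the abstract states, is to search over combinations of these two levels automatically rather than requiring the user to write such placements by hand.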