Alpa automates model-parallel training of large deep learning (DL) models by generating execution plans that unify data, operator, and pipeline parallelism. Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model-parallelism configurations. They do not suffice to scale out complex DL models on distributed compute devices. Alpa distributes the training of large DL models by viewing parallelisms at two hierarchical levels: inter-operator and intra-operator parallelism. Based on this view, Alpa constructs a new hierarchical space of massive model-parallel execution plans. Alpa designs a number of compilation passes to automatically derive efficient parallel execution plans at each parallelism level, and implements an efficient runtime to orchestrate the two-level parallel execution on distributed compute devices. Our evaluation shows that Alpa generates parallelization plans that match or outperform hand-tuned model-parallel training systems even on the models they are designed for. Unlike specialized systems, Alpa also generalizes to models with heterogeneous architectures and to models without manually-designed plans. Alpa's source code is publicly available at https://github.com/alpa-projects/alpa.
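To make the workflow concrete, the sketch below shows how a user might hand an ordinary JAX training step to Alpa and let it derive the parallel execution plan. It is a minimal illustration based on the usage style documented in the Alpa repository; the exact names (`alpa.init`, `alpa.parallelize`) and their defaults are assumptions here, not a verified API reference.

```python
# Minimal sketch: parallelizing a JAX training step with Alpa (assumed API).
import alpa
import jax
import jax.numpy as jnp

alpa.init(cluster="ray")  # attach to a distributed device cluster via Ray (assumed entry point)

@alpa.parallelize  # Alpa compiles a unified data/operator/pipeline plan for this step
def train_step(params, batch):
    def loss_fn(p):
        preds = jnp.dot(batch["x"], p["w"])
        return jnp.mean((preds - batch["y"]) ** 2)
    grads = jax.grad(loss_fn)(params)
    # Plain SGD update; how tensors are sharded and pipelined across devices
    # is decided by Alpa's compilation passes, not by the user.
    return jax.tree_util.tree_map(lambda p, g: p - 1e-3 * g, params, grads)
```

In this sketch the user writes single-device JAX code, and the decorator is the only point where parallelization enters; the inter-operator and intra-operator plans described in the abstract are chosen automatically by the compiler.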