The rapid rise in demand for training large neural network architectures has brought into focus the need for partitioning strategies, for example via data, model, or pipeline parallelism. Implementing these methods is increasingly supported through program primitives, but identifying efficient partitioning strategies still requires expensive experimentation and expertise. We present the prototype of an automated partitioner that integrates seamlessly into existing compilers and user workflows. Our partitioner enables SPMD-style parallelism that encompasses data parallelism and parameter/activation sharding. Through a combination of inductive tactics and search in a platform-independent partitioning IR, automap can recover expert partitioning strategies such as Megatron sharding for transformer layers.
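To make the Megatron-sharding claim concrete, the following is an illustrative sketch (not automap's API or code) of Megatron-style tensor parallelism for a transformer MLP block, simulated with numpy on two hypothetical "devices": the first projection is split by columns, the second by rows, so each shard computes independently and the partial results are summed, standing in for the cross-device all-reduce of a real SPMD program.

```python
import numpy as np

# Hypothetical sketch of Megatron sharding for an MLP y = relu(x @ W1) @ W2,
# simulated on 2 "devices". All names here are illustrative, not automap's.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))      # batch of activations
W1 = rng.normal(size=(8, 16))    # first projection
W2 = rng.normal(size=(16, 8))    # second projection

# Unsharded reference computation.
reference = np.maximum(x @ W1, 0.0) @ W2

# Megatron sharding: column-split W1 and row-split W2 across 2 shards.
W1_shards = np.split(W1, 2, axis=1)
W2_shards = np.split(W2, 2, axis=0)

# Each shard applies relu locally: the column split keeps whole hidden
# units on one shard, so no communication is needed until the end.
partials = [np.maximum(x @ w1, 0.0) @ w2
            for w1, w2 in zip(W1_shards, W2_shards)]
y = sum(partials)  # stands in for the cross-device all-reduce

assert np.allclose(y, reference)
```

The key property a partitioner must discover is exactly this pairing: a column split of the first weight with a row split of the second, which confines communication to a single reduction per block.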