Model parallelism has become necessary for training large neural networks. However, finding a suitable model-parallel schedule for an arbitrary neural network is a non-trivial task due to the exploding search space. In this work, we present TAP, a model parallelism framework that automatically searches for the best data- and tensor-parallel schedules. Leveraging the key insight that a neural network can be represented as a directed acyclic graph containing only a limited set of frequent subgraphs, we design a graph pruning algorithm that folds the search space efficiently. As a result, TAP runs with sub-linear complexity with respect to the size of the neural network. Experiments show that TAP is $20\times$-$160\times$ faster than the state-of-the-art automatic parallelism framework, and the performance of its discovered schedules is competitive with expert-engineered ones.
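To make the folding idea concrete, below is a minimal, hypothetical sketch (not the TAP implementation): structurally identical blocks of a layer DAG are grouped by a simple signature, so a parallel schedule is searched once per unique group and then replicated, which is what makes the search sub-linear in network depth. The `Block` type and the toy `search_schedule` function are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    kind: str     # e.g. "embedding", "attention", "mlp" (hypothetical labels)
    shape: tuple  # parameter shape, used to tell blocks apart structurally

def fold_search_space(blocks):
    """Group structurally identical blocks so each group is searched only once."""
    groups = defaultdict(list)
    for idx, b in enumerate(blocks):
        groups[(b.kind, b.shape)].append(idx)
    return groups

def search_schedule(block):
    """Placeholder for a per-block data/tensor-parallel schedule search."""
    return "tensor_parallel" if block.kind == "attention" else "data_parallel"

if __name__ == "__main__":
    # A toy 24-layer transformer-like network: many repeated subgraphs.
    net = [Block("embedding", (50000, 1024))] + \
          [Block("attention", (1024, 1024)), Block("mlp", (1024, 4096))] * 24
    groups = fold_search_space(net)
    # Only len(groups) searches instead of len(net): 3 instead of 49 here.
    schedules = {sig: search_schedule(net[idxs[0]]) for sig, idxs in groups.items()}
    print(len(net), "blocks folded into", len(groups), "unique groups:", schedules)
```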