Rapidly growing neural network models are becoming increasingly difficult to run on a single device, so model parallelism across multiple devices is critical for training large models efficiently. Recent proposals fall short due to either long processing times or poor resulting performance. We therefore propose Celeritas, a fast framework for optimizing device placement for large models. Celeritas employs a simple but efficient model parallelization strategy in the Standard Evaluation, and generates placement policies through a series of scheduling algorithms. We deploy and evaluate Celeritas on numerous large models. The results show that, compared with the most advanced methods, Celeritas reduces placement policy generation time by 26.4\% and improves model running time by 34.2\%.