Recently, large models have achieved state-of-the-art performance in various fields. Training such models requires distributed training techniques. However, finding an efficient distributed execution plan not only requires fine-grained model statistics, such as the memory and computing overhead of each operator, but is also a labor-intensive task even for an expert in distributed training. In this paper, we introduce MAP, a compiler built upon PyTorch that implements Memory-aware Automated Parallelization. To profile operator costs, existing training systems and machine learning pipelines either physically execute each operator or estimate memory usage with a scaled input tensor, which is often time-consuming and misleading. In contrast, MAP provides an easy-to-use symbolic profiler that generates memory and computing statistics for an arbitrary PyTorch model at trivial time cost, boosting productivity for ML developers. In addition, MAP seamlessly speeds up various static planning tasks on PyTorch computation graphs and requires only a few lines of modification to user code to generate a new module instance with a top-performing distributed execution plan. The source code is publicly available at https://github.com/hpcaitech/ColossalAI
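For illustration, the minimal sketch below shows the general idea behind symbolic profiling using only stock PyTorch meta tensors: shapes and dtypes are propagated through the model without allocating data or running real kernels, so memory statistics can be derived analytically. This is an assumed illustration of the concept, not MAP's actual profiler API.

```python
import torch
from torch import nn

# A minimal sketch of the symbolic-profiling idea (not MAP's actual API):
# build the model and a sample input on PyTorch's "meta" device, which tracks
# shapes and dtypes without allocating real memory or launching kernels,
# so statistics can be gathered at trivial time cost.
with torch.device("meta"):
    model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024))
    x = torch.empty(32, 1024)
    y = model(x)  # shapes propagate symbolically; no real compute is spent

# Analytical memory estimate from the recorded shapes and dtypes.
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"parameters: {param_bytes / 2**20:.1f} MiB, output shape: {tuple(y.shape)}")
```

In the same spirit, MAP's symbolic profiler walks the computation graph to collect per-operator memory and compute statistics, which then feed the automated search for a distributed execution plan.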