Recent advances in deep learning models come at the price of formidable training cost. The increasing model size is one root cause, but another, less-emphasized factor is that data scale is growing at a pace similar to model scale, and training cost is proportional to both. Compared to the rapidly evolving model architectures, how to efficiently use the training data (especially for expensive foundation model pretraining) is both less explored and harder to realize, due to the lack of a convenient framework that focuses on data efficiency capabilities. To this end, we present the DeepSpeed Data Efficiency library, a framework that makes better use of data, increases training efficiency, and improves model quality. Specifically, it provides efficient data sampling via curriculum learning and efficient data routing via random layerwise token dropping. DeepSpeed Data Efficiency takes extensibility, flexibility, and composability into consideration, so that users can easily compose multiple techniques and apply customized strategies. By applying our solution to GPT-3 1.3B and BERT-Large language model pretraining, we achieve similar model quality with up to 2x less data and 2x less time, or better model quality under a similar amount of data and time.
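To make the two techniques named above concrete, the following is a minimal illustrative sketch in PyTorch of (1) curriculum learning via a pacing schedule over a difficulty metric (here, sequence length) and (2) random layerwise token dropping, where a random subset of tokens bypasses a middle layer and is restored afterwards. The function names, the linear pacing rule, and the 50% keep ratio are assumptions made for illustration only; they are not the DeepSpeed Data Efficiency API.

```python
import torch


def curriculum_max_seqlen(step: int, total_steps: int,
                          start_len: int = 64, full_len: int = 2048) -> int:
    """Illustrative linear pacing: grow the allowed sequence length over training."""
    frac = min(1.0, step / max(1, total_steps))
    return int(start_len + frac * (full_len - start_len))


def random_layer_token_drop(hidden: torch.Tensor, keep_ratio: float = 0.5):
    """Keep a random subset of tokens before a middle layer.

    hidden: [batch, seq, dim]. Returns the reduced hidden states plus the
    indices needed to scatter the processed tokens back afterwards.
    """
    bsz, seq, dim = hidden.shape
    num_keep = max(1, int(seq * keep_ratio))
    keep_idx = torch.rand(bsz, seq).argsort(dim=-1)[:, :num_keep]        # random tokens per sample
    kept = hidden.gather(1, keep_idx.unsqueeze(-1).expand(-1, -1, dim))  # [bsz, num_keep, dim]
    return kept, keep_idx


if __name__ == "__main__":
    # Data sampling: truncate batches to the current curriculum length.
    step, total_steps = 1000, 10000
    max_len = curriculum_max_seqlen(step, total_steps)
    batch = torch.randn(4, 2048, 512)[:, :max_len, :]

    # Data routing: only the kept tokens pass through a (stand-in) middle layer,
    # then are scattered back into the full hidden states.
    kept, keep_idx = random_layer_token_drop(batch, keep_ratio=0.5)
    processed = torch.tanh(kept)  # stand-in for a transformer layer
    restored = batch.clone()
    restored.scatter_(1, keep_idx.unsqueeze(-1).expand_as(processed), processed)
```

Because both pieces operate on the data pipeline and the hidden states independently, they compose naturally; the library exposes such strategies behind a configurable interface so users can combine or customize them.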