On Efficient Training of Large-Scale Deep Learning Models: A Literature Review

The field of deep learning has witnessed significant progress, particularly in computer vision (CV), natural language processing (NLP), and speech. Large-scale models trained on vast amounts of data hold immense promise for practical applications, enhancing industrial productivity and facilitating social development. As the demands on computational capacity keep increasing, numerous studies have explored efficient training, yet a comprehensive summary of techniques for accelerating the training of deep learning models is still much needed. In this survey, we present a detailed review of training acceleration. We consider the fundamental update formulation and split its basic components into five main perspectives: (1) data-centric, including dataset regularization, data sampling, and data-centric curriculum learning techniques, which can significantly reduce the computational complexity of the data samples; (2) model-centric, including acceleration of basic modules, compression training, model initialization, and model-centric curriculum learning techniques, which focus on accelerating training by reducing the computation on parameters; (3) optimization-centric, including the selection of the learning rate, the use of large batch sizes, the design of efficient objectives, and model averaging techniques, which concern the training policy and improve the generalization of large-scale models; (4) budgeted training, including some distinctive acceleration methods for resource-constrained situations; and (5) system-centric, including some efficient open-source distributed libraries and systems that provide adequate hardware support for implementing acceleration algorithms. Through this taxonomy, our survey helps readers understand the general mechanisms within each component and their joint interaction.
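
To make the phrase "fundamental update formulation" concrete, here is a minimal sketch of a generic mini-batch gradient update. This is an assumption for illustration, not necessarily the exact formula used in the survey; the symbols (weights w_t, learning rate \eta_t, mini-batch B_t, model f, loss \ell, step budget T) are ours and do not come from the abstract.

```latex
% Generic mini-batch SGD step (illustrative assumption, not the survey's own notation)
\[
  w_{t+1} \;=\; w_t \;-\; \eta_t \cdot \frac{1}{|B_t|}
  \sum_{(x,\,y)\,\in\, B_t} \nabla_{w}\, \ell\bigl(f(x;\, w_t),\, y\bigr),
  \qquad t = 0, 1, \dots, T-1 .
\]
% One possible reading of the five perspectives in terms of this update:
%   data-centric:          which samples (x, y) enter B_t, and how they are processed
%   model-centric:         the model f and its parameters w_t
%   optimization-centric:  the learning rate \eta_t, the batch size |B_t|, the objective \ell
%   budgeted training:     the limit on the total number of steps T (or total compute)
%   system-centric:        how the sum and the gradients are computed across hardware
```

Read this way, each perspective targets a different term of the same update, which may explain why the abstract stresses their joint interaction.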
