Deep learning has achieved promising results on a wide spectrum of AI applications. Larger datasets and models consistently yield better performance, but they also demand more computation and communication, which in turn lengthens training time. In this survey, we aim to provide a clear sketch of the optimizations for large-scale deep learning with regard to model accuracy and model efficiency. We investigate the algorithms most commonly used for optimization, elaborate on the debated generalization gap that arises in large-batch training, and review the state-of-the-art strategies for addressing communication overhead and reducing memory footprints.