Modern Deep Neural Networks (DNNs) require significant memory to store weights, activations, and other intermediate tensors during training. Hence, many models either do not fit into a single GPU device or can only be trained with a small per-GPU batch size. This survey provides a systematic overview of the approaches that enable more efficient DNN training. We analyze techniques that save memory and make efficient use of computation and communication resources on architectures with a single GPU or multiple GPUs. We summarize the main categories of strategies and compare strategies both within and across categories. Along with the approaches proposed in the literature, we discuss available implementations.