Large neural networks can improve accuracy and generalization on tasks across many domains. However, this trend cannot continue indefinitely due to limited hardware memory. As a result, researchers have devised a number of memory optimization methods (MOMs) to alleviate the memory bottleneck, such as gradient checkpointing, quantization, and swapping. In this work, we study memory optimization methods and show that, although these strategies indeed lower peak memory usage, they can decrease training throughput by up to 9.3x. To provide practical guidelines for practitioners, we propose PAPAYA, a simple but effective performance model that quantitatively explains the trade-off between memory and training time. PAPAYA can be used to determine when to apply the various memory optimization methods when training different models. Based on implications derived from PAPAYA, we outline the circumstances under which memory optimization techniques are more advantageous. We assess the accuracy of PAPAYA and the derived implications on a variety of machine learning models, showing that it achieves an R score above 0.97 for predicting peak memory and throughput, and accurately predicts the effectiveness of MOMs across five evaluated models on vision and NLP tasks.
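To make the memory/throughput trade-off concrete, the following is a minimal sketch (not taken from the paper) of one of the memory optimization methods mentioned above, gradient checkpointing, using PyTorch's `torch.utils.checkpoint` API. The model name, sizes, and layer structure are illustrative assumptions; the point is that activations inside each checkpointed block are discarded in the forward pass and recomputed in the backward pass, lowering peak memory at the cost of extra computation.

```python
# Illustrative sketch of gradient checkpointing (one of the MOMs discussed).
# CheckpointedMLP, dim, and depth are hypothetical names chosen for this example.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class CheckpointedMLP(nn.Module):
    def __init__(self, dim: int = 1024, depth: int = 8):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(depth)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            # Instead of storing this block's activations for backward,
            # recompute them during the backward pass: lower peak memory,
            # slower training step (the trade-off PAPAYA models).
            x = checkpoint(block, x, use_reentrant=False)
        return x


model = CheckpointedMLP()
out = model(torch.randn(32, 1024, requires_grad=True))
out.sum().backward()
```

Whether such a method pays off depends on the model and hardware; a performance model like PAPAYA is meant to predict, before training, whether the memory savings justify the throughput loss.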