Although deep learning has made great progress in recent years, the exploding economic and environmental costs of training neural networks are becoming unsustainable. To address this problem, there has been a great deal of research on *algorithmically-efficient deep learning*, which seeks to reduce training costs not at the hardware or implementation level, but through changes to the semantics of the training program. In this paper, we present a structured and comprehensive overview of research in this field. First, we formalize the *algorithmic speedup* problem; we then use the fundamental building blocks of algorithmically-efficient training to develop a taxonomy. Our taxonomy highlights commonalities among seemingly disparate methods and reveals current research gaps. Next, we present evaluation best practices to enable comprehensive, fair, and reliable comparisons of speedup techniques. To further aid research and applications, we discuss common bottlenecks in the training pipeline (illustrated via experiments) and offer mitigation strategies for them, organized according to our taxonomy. Finally, we highlight unsolved research challenges and present promising future directions.