The growing energy and performance costs of deep learning have driven the community to reduce the size of neural networks by selectively pruning components. Similarly to their biological counterparts, sparse networks generalize just as well as, if not better than, the original dense networks. Sparsity can reduce the memory footprint of regular networks to fit mobile devices, as well as shorten training time for ever-growing networks. In this paper, we survey prior work on sparsity in deep learning and provide an extensive tutorial on sparsification for both inference and training. We describe approaches to remove and add elements of neural networks, different training strategies to achieve model sparsity, and mechanisms to exploit sparsity in practice. Our work distills ideas from more than 300 research papers and provides guidance to practitioners who wish to utilize sparsity today, as well as to researchers whose goal is to push the frontier forward. We include the necessary background on mathematical methods in sparsification, describe phenomena such as early structure adaptation, the intricate relations between sparsity and the training process, and show techniques for achieving acceleration on real hardware. We also define a metric of pruned parameter efficiency that could serve as a baseline for comparison of different sparse networks. We close by speculating on how sparsity can improve future workloads and outline major open problems in the field.
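To make the notion of selectively pruning components concrete, the following is a minimal sketch of unstructured magnitude pruning in NumPy; the helper name, the threshold rule, and the 90% sparsity target are illustrative assumptions rather than a method prescribed by this survey, and practical pipelines typically interleave pruning with retraining to recover accuracy.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries until roughly `sparsity`
    fraction of the weights is removed (unstructured magnitude pruning sketch)."""
    k = int(sparsity * weights.size)  # number of weights to remove
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value serves as the pruning threshold
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold  # keep only weights above the threshold
    return weights * mask

# Example: prune a random dense layer to ~90% sparsity.
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 128))
W_sparse = magnitude_prune(W, sparsity=0.9)
print(f"achieved sparsity: {(W_sparse == 0).mean():.2%}")
```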