Overparameterized Neural Networks (NN) display state-of-the-art performance. However, there is a growing need for smaller, energy-efficient neural networks so that machine learning applications can run on devices with limited computational resources. A popular approach is to use pruning techniques. While these techniques have traditionally focused on pruning pre-trained NN (LeCun et al., 1990; Hassibi et al., 1993), recent work by Lee et al. (2018) has shown promising results when pruning at initialization. However, for Deep NNs, such procedures remain unsatisfactory as the resulting pruned networks can be difficult to train and, for instance, they do not prevent one layer from being fully pruned. In this paper, we provide a comprehensive theoretical analysis of Magnitude and Gradient based pruning at initialization and of the training of sparse architectures. This allows us to propose novel principled approaches which we validate experimentally on a variety of NN architectures.
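For concreteness, the following is a minimal sketch of the two pruning-at-initialization criteria referenced above: magnitude-based scoring and SNIP-style gradient-based scoring (Lee et al., 2018). It assumes a PyTorch model; the function prune_at_init, its arguments, and the global-threshold choice are illustrative assumptions, not the paper's own implementation.

```python
import torch
import torch.nn as nn

def prune_at_init(model, loss_fn, data, targets, sparsity=0.9, criterion="magnitude"):
    """Score each weight at initialization and keep the top (1 - sparsity) fraction.

    criterion="magnitude": score = |w|
    criterion="gradient":  score = |w * dL/dw|  (SNIP-style saliency)
    """
    # Prune weight tensors only (skip biases and 1-D parameters).
    params = [p for p in model.parameters() if p.dim() > 1]

    if criterion == "gradient":
        # One forward/backward pass on a mini-batch to get saliencies at initialization.
        loss = loss_fn(model(data), targets)
        grads = torch.autograd.grad(loss, params)
        scores = [(p * g).abs() for p, g in zip(params, grads)]
    else:
        scores = [p.abs() for p in params]

    # Global threshold: keep the (1 - sparsity) fraction of largest scores across all layers.
    flat = torch.cat([s.flatten() for s in scores])
    k = max(1, int((1.0 - sparsity) * flat.numel()))
    threshold = torch.topk(flat, k, largest=True).values.min()

    masks = [(s >= threshold).float() for s in scores]
    with torch.no_grad():
        for p, m in zip(params, masks):
            p.mul_(m)  # zero out pruned weights; the masks must also be re-applied during training
    return masks
```

Note that the global threshold above illustrates the failure mode mentioned in the abstract: nothing prevents every weight of a single layer from falling below the threshold, leaving that layer fully pruned.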