Regularization has long been used to learn sparsity in deep neural network pruning. However, its role has mainly been explored in the small penalty strength regime. In this work, we extend its application to a new scenario where the regularization strength grows gradually, in order to tackle two central problems of pruning: the pruning schedule and weight importance scoring. (1) The former topic is newly raised in this work; we find it critical to pruning performance, yet it has received little research attention. Specifically, we propose an L2 regularization variant with rising penalty factors and show that it can bring significant accuracy gains compared with its one-shot counterpart, even when the same weights are removed. (2) The growing penalty scheme also gives us a way to exploit Hessian information for more accurate pruning without knowing its specific values, thereby avoiding the common Hessian approximation problems. Empirically, the proposed algorithms are easy to implement and scale to large datasets and networks in both structured and unstructured pruning. Their effectiveness is demonstrated with modern deep neural networks on the CIFAR and ImageNet datasets, achieving competitive results compared to many state-of-the-art algorithms. Our code and trained models are publicly available at https://github.com/mingsuntse/regularization-pruning.
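To make the idea in (1) concrete, the following is a minimal sketch of an L2 penalty with a rising factor, written in NumPy on a toy linear regression problem. It is not the authors' implementation: the function name `growing_l2_prune`, the hyperparameters (`delta`, `ceiling`, `update_every`), the magnitude-based choice of which weights to prune, and the toy data are all assumptions made for illustration. The sketch only shows the schedule: the penalty on the selected weights is raised in small increments, so they are driven toward zero gradually before being removed.

```python
import numpy as np

def growing_l2_prune(X, y, prune_ratio=0.5, delta=1e-3, ceiling=1.0,
                     update_every=5, lr=0.05, pretrain_steps=500):
    """Toy growing-penalty pruning on least squares (illustrative only)."""
    n, d = X.shape
    w = np.zeros(d)

    def data_grad(w):
        # Gradient of the mean squared error (1 / 2n) * ||Xw - y||^2.
        return X.T @ (X @ w - y) / n

    # Stage 0: fit a dense model without any penalty.
    for _ in range(pretrain_steps):
        w -= lr * data_grad(w)
    w_dense = w.copy()

    # Assumption: pick the smallest-magnitude weights as pruning targets.
    k = int(prune_ratio * d)
    selected = np.zeros(d, dtype=bool)
    selected[np.argsort(np.abs(w))[:k]] = True

    # Stage 1: raise the L2 penalty factor on the selected weights by
    # `delta` every `update_every` steps until it reaches `ceiling`.
    lam, step = 0.0, 0
    while lam < ceiling:
        if step % update_every == 0:
            lam += delta
        w -= lr * (data_grad(w) + lam * selected * w)
        step += 1
    w_regularized = w.copy()

    # Stage 2: remove the selected weights outright.
    w[selected] = 0.0
    return w_dense, w_regularized, w, selected

# Hypothetical toy data: five large and five small true coefficients.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.5, 1.0, -2.5, 3.0, 0.1, -0.05, 0.08, -0.1, 0.05])
X = rng.normal(size=(200, 10))
y = X @ true_w + 0.1 * rng.normal(size=200)

w_dense, w_reg, w_pruned, selected = growing_l2_prune(X, y)
print("norm of selected weights before penalty:", np.linalg.norm(w_dense[selected]))
print("norm of selected weights after penalty: ", np.linalg.norm(w_reg[selected]))
```

In this sketch the unselected weights keep training while the penalty grows, so they can adapt to the shrinking of the selected ones before the final removal. A one-shot variant would instead set `w[selected] = 0.0` immediately after the dense fit.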