Structured pruning is an effective approach for compressing large pre-trained neural networks without significantly affecting their performance. However, most current structured pruning methods do not provide any performance guarantees and often require fine-tuning, which makes them inapplicable in the limited-data regime. We propose a principled, data-efficient structured pruning method based on submodular optimization. In particular, for a given layer, we select the neurons/channels to prune, together with new weights for the next layer, so as to minimize the change in the next layer's input induced by pruning. We show that this selection problem is a weakly submodular maximization problem, so it can be provably approximated using an efficient greedy algorithm. Under reasonable assumptions, our method is guaranteed to have an error between the original and pruned model outputs that decreases exponentially with the pruned size. It is also one of the few methods in the literature that uses only a limited number of training samples and no labels. Our experimental results demonstrate that our method outperforms state-of-the-art methods in the limited-data regime.
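The layer-wise selection step can be illustrated with a small sketch. This is not the authors' implementation. It assumes that the layer's activations on a small batch of unlabeled inputs are stacked into a matrix A (samples × neurons) and that the next layer is linear with weight matrix W (neurons × outputs). It then runs plain greedy forward selection with a least-squares refit of the next-layer weights. The function name `greedy_prune` is hypothetical. At each step the sketch adds the neuron whose inclusion most reduces ‖AW − A_S W̃‖_F², which is the squared change in the next layer's input when only the kept set S is used with refitted weights W̃.

```python
import numpy as np


def greedy_prune(A, W, k):
    """Hypothetical sketch: greedily keep k neurons and refit the next layer.

    A: (n, N) activations of the layer on n unlabeled inputs.
    W: (N, M) weights of the following linear layer (an assumption).
    Returns (kept indices, refitted weights of shape (len(kept), M)).
    """
    if not 0 <= k <= A.shape[1]:
        raise ValueError("k must be between 0 and the number of neurons")
    target = A @ W  # next layer's input with no pruning
    kept = []
    W_kept = np.zeros((0, W.shape[1]))
    for _ in range(k):
        best_j, best_err, best_W = None, np.inf, None
        for j in range(A.shape[1]):
            if j in kept:
                continue
            A_S = A[:, kept + [j]]
            # Least-squares refit of the next-layer weights for this candidate set.
            W_S, *_ = np.linalg.lstsq(A_S, target, rcond=None)
            err = np.linalg.norm(target - A_S @ W_S) ** 2
            if err < best_err:
                best_j, best_err, best_W = j, err, W_S
        # Keep the neuron that most reduces the change in the next layer's input.
        kept.append(best_j)
        W_kept = best_W
    return kept, W_kept


# Usage: neuron 3 is an exact linear combination of neurons 0 and 1,
# so keeping 9 of the 10 neurons can reproduce the next layer's input.
rng = np.random.default_rng(0)
A = rng.standard_normal((64, 10))
A[:, 3] = A[:, 0] + A[:, 1]
W = rng.standard_normal((10, 5))
kept, W_kept = greedy_prune(A, W, 9)
residual = np.linalg.norm(A @ W - A[:, kept] @ W_kept)
print(sorted(kept), residual)
```

Refitting W̃ from scratch for every candidate keeps the sketch short and obviously correct. A practical implementation would more likely update the residual incrementally, for example with a QR or Cholesky update, rather than calling `lstsq` repeatedly.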