Many existing neural network pruning approaches either rely on retraining to compensate for pruning-induced performance degradation or induce strong biases toward a specific sparse solution throughout training. A third paradigm obtains a wide range of compression ratios from a single dense training run while also avoiding retraining. Recent work by Pokutta et al. (2020) and Miao et al. (2022) suggests that the Stochastic Frank-Wolfe (SFW) algorithm is particularly suited for training state-of-the-art models that are robust to compression. We propose leveraging $k$-support norm ball constraints and demonstrate significant improvements over the results of Miao et al. (2022) in the case of unstructured pruning. We also extend these ideas to the structured pruning domain and propose novel approaches to ensure robustness both to the pruning of convolutional filters and to low-rank tensor decompositions of convolutional layers. In the latter case, our approach performs on par with nuclear-norm regularization baselines while requiring only half of the computational resources. Our findings further indicate that the robustness of SFW-trained models largely depends on the gradient rescaling of the learning rate, and we establish a theoretical foundation for that practice.
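To make the two ingredients named above concrete, the following is a minimal sketch of one SFW update over a $k$-support norm ball with a gradient-rescaled step size. The linear minimization oracle (returning the negated, $\ell_2$-normalized top-$k$ entries of the gradient, scaled to the ball radius) follows the standard characterization of the ball's extreme points; the exact rescaling schedule `tau = min(lr * ||grad|| / ||v - w||, 1)` and all function names are illustrative assumptions, not the paper's precise formulation.

```python
import numpy as np

def ksupport_lmo(grad, radius, k):
    """Linear minimization oracle over a k-support norm ball (illustrative sketch).

    Minimizing <v, grad> over the ball is attained at an extreme point: keep the
    k largest-magnitude gradient entries, L2-normalize, negate, and scale to the radius.
    """
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of the top-k magnitudes
    v = np.zeros_like(flat)
    v[idx] = flat[idx]
    v *= -radius / (np.linalg.norm(v) + 1e-12)     # negated, normalized, scaled vertex
    return v.reshape(grad.shape)

def sfw_step(w, grad, radius, k, lr):
    """One Stochastic Frank-Wolfe step with a gradient-rescaled step size (assumed form).

    tau = min(lr * ||grad|| / ||v - w||, 1) stands in for the rescaling practice
    referenced in the abstract; the published schedule may differ in detail.
    """
    v = ksupport_lmo(grad, radius, k)
    tau = min(lr * np.linalg.norm(grad) / (np.linalg.norm(v - w) + 1e-12), 1.0)
    return w + tau * (v - w)                       # convex combination stays inside the ball
```

As a usage note, iterating `w = sfw_step(w, grad, radius, k, lr)` keeps every iterate inside the constraint set by construction, which is what makes the resulting dense weights amenable to pruning without retraining.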