We propose ResRep, a novel method for lossless channel pruning (a.k.a. filter pruning), which slims down a CNN by reducing the width (number of output channels) of convolutional layers. Inspired by neurobiology research on the independence of remembering and forgetting, we propose to re-parameterize a CNN into remembering parts and forgetting parts, where the former learn to maintain the performance and the latter learn to prune. By training the former with regular SGD but the latter with a novel update rule using penalty gradients, we realize structured sparsity. We then equivalently merge the remembering and forgetting parts back into the original architecture, yielding narrower layers. In this sense, ResRep can be viewed as a successful application of Structural Re-parameterization. This methodology distinguishes ResRep from the traditional learning-based pruning paradigm, which applies a penalty directly on the parameters to produce sparsity and may thereby suppress the parameters essential for remembering. ResRep slims down a standard ResNet-50 with 76.15% accuracy on ImageNet to a narrower model with only 45% of the FLOPs and no accuracy drop, which is, to our knowledge, the first method to achieve lossless pruning at such a high compression ratio. The code and models are at https://github.com/DingXiaoH/ResRep.
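The two mechanisms named above, a penalty-gradient update on the forgetting parts and an equivalent merge back into the original architecture, can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the forgetting part is a 1x1 "compactor" convolution appended after a conv layer, simplifies the update rule to a plain Lasso-style penalty gradient (the actual ResRep rule also involves channel masking), and the hypothetical names `compactor_grad` and `merge_compactor` are ours. Because a 1x1 conv is a linear map over channels, the merge is exact: a conv kernel of shape (O, I, kh, kw) followed by a compactor of shape (O', O) collapses into a single kernel of shape (O', I, kh, kw).

```python
import numpy as np

def compactor_grad(task_grad, w, lam=1e-4):
    # Simplified penalty-gradient rule for the forgetting part:
    # the usual task gradient plus a group-Lasso-style term that
    # pushes the compactor weights toward zero (hypothetical sketch).
    return task_grad + lam * w / (np.linalg.norm(w) + 1e-12)

def merge_compactor(conv_kernel, compactor):
    # conv_kernel: (O, I, kh, kw); compactor: (O', O), a 1x1 conv
    # keeping O' <= O output channels. Returns the equivalent single
    # kernel of shape (O', I, kh, kw): merged[p] = sum_o M[p, o] * K[o].
    return np.einsum('po,oihw->pihw', compactor, conv_kernel)

# Toy equivalence check on a single im2col patch, where a conv
# reduces to a matrix product: compactor(conv(x)) == merged_conv(x).
rng = np.random.default_rng(0)
K = rng.standard_normal((8, 3, 3, 3))   # original conv, 8 output channels
M = rng.standard_normal((5, 8))         # compactor pruned down to 5 channels
patch = rng.standard_normal((3, 3, 3))  # one receptive-field patch

y_two_step = M @ np.einsum('oihw,ihw->o', K, patch)
y_merged = np.einsum('pihw,ihw->p', merge_compactor(K, M), patch)
assert np.allclose(y_two_step, y_merged)
```

The merge works for any compactor because convolution is linear in its input channels; after training, rows of the compactor driven to zero are simply dropped before merging, which is what narrows the layer.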