The existence of redundancy in Convolutional Neural Networks (CNNs) makes it possible to remove some filters/channels with only an acceptable performance drop. However, the training objective of a CNN usually minimizes an accuracy-related loss function without paying any attention to the redundancy, so the redundancy is distributed randomly across all the filters; removing any of them may therefore cause information loss and an accuracy drop, necessitating a subsequent finetuning step for recovery. In this paper, we propose to manipulate the redundancy during training to facilitate network pruning. To this end, we propose a novel Centripetal SGD (C-SGD) that makes some filters identical, resulting in ideal redundancy patterns: such filters become purely redundant because of their duplicates, so removing them does not harm the network. As shown on CIFAR and ImageNet, C-SGD delivers better performance than existing methods because the redundancy is better organized. C-SGD is also efficient: it is as fast as regular SGD, requires no finetuning, and can be applied simultaneously to all the layers even in very deep CNNs. Moreover, C-SGD can improve the accuracy of a CNN by first training a model with the same architecture but wider layers and then squeezing it back to the original width.
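To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of a centripetal-style update for one convolutional layer in PyTorch. Filters assigned to the same cluster receive the gradient averaged over the cluster plus a term that pulls each filter toward the cluster mean, so clustered filters gradually become identical; afterwards all but one filter per cluster can be removed without changing the layer's output, provided the following layer's input channels are merged accordingly. The function and hyper-parameter names (e.g. `centripetal_strength`) are illustrative assumptions.

```python
# Hypothetical sketch of a centripetal update for one conv layer; the exact
# update rule and hyper-parameters in the paper may differ.
import torch

def centripetal_update(weight, grad, clusters, lr=0.1, centripetal_strength=3e-3):
    """Apply one centripetal step to a conv weight of shape (out_ch, in_ch, k, k).

    clusters: list of lists of filter indices; filters in the same cluster
              are driven toward a common value.
    """
    new_weight = weight.clone()
    for cluster in clusters:
        idx = torch.tensor(cluster)
        # Average the loss gradient within the cluster so all members move together.
        mean_grad = grad[idx].mean(dim=0, keepdim=True)
        # "Centripetal" term: pull each filter toward the cluster mean.
        mean_weight = weight[idx].mean(dim=0, keepdim=True)
        centripetal = centripetal_strength * (weight[idx] - mean_weight)
        new_weight[idx] = weight[idx] - lr * (mean_grad + centripetal)
    return new_weight

# Usage sketch: once the clustered filters have converged to identical values,
# keep one filter per cluster and sum the corresponding input channels of the
# next layer, so no finetuning is needed to recover accuracy.
```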