The mainstream approaches to filter pruning either impose a hard-coded importance estimation on a computation-heavy pretrained model to select "important" filters, or add a hyperparameter-sensitive sparsity constraint to the loss objective to regularize network training. In this paper, we present a novel filter pruning method, dubbed dynamic-coded filter fusion (DCFF), that derives compact CNNs in a computation-economical and regularization-free manner for efficient image classification. Each filter in our DCFF is first assigned an inter-similarity distribution with a temperature parameter, which serves as a filter proxy. On top of these proxies, we propose a new Kullback-Leibler divergence based dynamic-coded criterion to evaluate filter importance. Instead of simply keeping the high-scoring filters as other methods do, we introduce the concept of filter fusion: the preserved filters are weighted averages of the filters, computed with the assigned proxies. As the temperature parameter approaches infinity, the inter-similarity distribution becomes one-hot. Thus, the relative importance of each filter can vary as the compact CNN is trained, yielding dynamically changing fused filters without depending on a pretrained model or introducing sparsity constraints. Extensive experiments on classification benchmarks demonstrate the superiority of our DCFF over the compared methods. For example, our DCFF derives a compact VGGNet-16 with only 72.77M FLOPs and 1.06M parameters that reaches 93.47% top-1 accuracy on CIFAR-10. A compact ResNet-50 is obtained with 63.8% FLOPs and 58.6% parameter reductions while retaining 75.60% top-1 accuracy on ILSVRC-2012. Our code, narrower models and training logs are available at https://github.com/lmbxmu/DCFF.
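The mechanism described above (a temperature-controlled inter-similarity distribution per filter, a KL-divergence importance score, and fused filters as proxy-weighted averages) can be illustrated with a small pure-Python sketch. It is not the authors' implementation, and several details are assumptions: filters are flattened lists of floats; similarity is a softmax over the negative Euclidean distance multiplied by the temperature `t` (so that, as `t` grows, each distribution collapses to a one-hot vector on the filter itself); the importance of a filter is taken to be its average KL divergence to every other filter's distribution; and the names `similarity_distributions`, `importance_scores` and `fuse_filters` are hypothetical.

```python
import math
from typing import List, Tuple

Filter = List[float]  # hypothetical: one conv filter flattened to a vector


def euclidean(a: Filter, b: Filter) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_distributions(filters: List[Filter], t: float) -> List[List[float]]:
    """Assumed proxy: row i is softmax_j(-t * ||w_i - w_j||).

    Because the distance to itself is 0, row i tends to one-hot at j = i
    as t -> infinity.
    """
    rows = []
    for a in filters:
        logits = [-t * euclidean(a, b) for b in filters]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - m) for z in logits]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows


def kl(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q), with a small eps to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def importance_scores(dists: List[List[float]]) -> List[float]:
    """Assumed criterion: mean KL from filter i's proxy to every other proxy."""
    n = len(dists)
    if n < 2:
        return [0.0] * n
    return [
        sum(kl(dists[i], dists[k]) for k in range(n) if k != i) / (n - 1)
        for i in range(n)
    ]


def fuse_filters(filters: List[Filter], t: float, keep: int) -> Tuple[List[int], List[Filter]]:
    """Keep the `keep` highest-scoring filters, replacing each with a fused filter.

    The fused filter for i is sum_j p_ij * w_j, i.e. a weighted average of all
    filters using filter i's similarity distribution as the weights.
    """
    dists = similarity_distributions(filters, t)
    scores = importance_scores(dists)
    kept = sorted(range(len(filters)), key=lambda i: scores[i], reverse=True)[:keep]
    dim = len(filters[0])
    fused = [
        [sum(dists[i][j] * filters[j][d] for j in range(len(filters))) for d in range(dim)]
        for i in kept
    ]
    return kept, fused


if __name__ == "__main__":
    ws = [[1.0, 0.0], [1.1, 0.0], [0.0, 5.0]]
    for temp in (1.0, 1000.0):
        idx, fw = fuse_filters(ws, t=temp, keep=2)
        print(temp, idx, fw)
```

With a small `t`, each fused filter blends in its near-duplicates; with a very large `t`, the distributions are effectively one-hot and each fused filter equals the original filter it was selected from. Gradually raising `t` during training would therefore move from soft fusion toward plain selection, which is one plausible reading of the "dynamic-coded" behavior described above; the actual temperature schedule used by DCFF is not specified here.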