Filter pruning of a CNN is typically achieved by applying discrete masks to the CNN's filter weights or activation maps post-training. Here, we present a new filter-importance-scoring concept named pruning by active attention manipulation (PAAM), which sparsifies the CNN's set of filters through a particular attention mechanism during training. PAAM learns analog filter scores from the filter weights by optimizing a cost function regularized by an additive term in the scores. Since the filters are not independent, we use attention to dynamically learn their correlations. Moreover, by training the pruning scores of all layers simultaneously, PAAM can account for layer inter-dependencies, which is essential for finding a performant sparse sub-network. PAAM can also train and generate a pruned network from scratch in a straightforward, one-stage training process, without requiring a pre-trained network. Finally, PAAM does not need layer-specific hyperparameters or pre-defined layer budgets, since it can implicitly determine the appropriate number of filters in each layer. Our experimental results on different network architectures suggest that PAAM outperforms state-of-the-art (SOTA) structured-pruning methods. On the CIFAR-10 dataset, without requiring a pre-trained baseline network, we obtain accuracy gains of 1.02% and 1.19% with parameter reductions of 52.3% and 54% on ResNet56 and ResNet110, respectively. Similarly, on the ImageNet dataset, PAAM achieves a 1.06% accuracy gain while pruning 51.1% of the parameters on ResNet50. On CIFAR-10, these parameter reductions exceed the SOTA by margins of 9.5% and 6.6%, respectively, and on ImageNet by a margin of 11%.
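To make the scoring idea concrete, below is a minimal PyTorch sketch of the mechanism the abstract describes: analog per-filter scores are computed from a convolutional layer's weights through a small attention block (so each filter's score depends on the other filters), and an additive term on the scores regularizes the task loss toward sparsity. The names (`FilterScorer`, `pruning_loss`), the embedding dimension, and the `lam` coefficient are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FilterScorer(nn.Module):
    """Learns analog filter-importance scores from a conv layer's weights
    via a small attention block (illustrative sketch, hypothetical names)."""
    def __init__(self, embed_dim=32):
        super().__init__()
        self.embed = nn.LazyLinear(embed_dim)   # flattened filter -> embedding
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=1, batch_first=True)
        self.head = nn.Linear(embed_dim, 1)

    def forward(self, conv_weight):
        # conv_weight: (num_filters, in_channels, k, k)
        f = conv_weight.flatten(1).unsqueeze(0)       # (1, F, C*k*k)
        e = self.embed(f)                             # (1, F, D)
        # attention lets each filter's score depend on all other filters,
        # capturing the filter correlations mentioned above
        a, _ = self.attn(e, e, e)                     # (1, F, D)
        return torch.sigmoid(self.head(a)).squeeze()  # (F,) analog scores in (0, 1)

def pruning_loss(task_loss, all_scores, lam=1e-3):
    # task loss plus an additive sparsity term over the scores of all layers;
    # lam is a placeholder hyperparameter, not a value from the paper
    return task_loss + lam * sum(s.sum() for s in all_scores)

# usage sketch: soft-mask a conv layer's output channels with its scores
conv = nn.Conv2d(16, 32, kernel_size=3, padding=1)
scorer = FilterScorer()
x = torch.randn(8, 16, 28, 28)
scores = scorer(conv.weight)              # one analog score per filter
out = conv(x) * scores.view(1, -1, 1, 1)  # down-weight unimportant filters
```

In this reading, filters whose scores are driven toward zero by the additive term can be removed after training, while the attention block ties each layer's scores to the whole set of filters rather than scoring each filter in isolation.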