Recent advances in neural network pruning have made it possible to remove a large number of filters or weights without any perceptible drop in accuracy. The number of parameters and the number of FLOPs are usually the metrics reported to measure the quality of pruned models. However, the gain in speed for these pruned models is often overlooked in the literature due to the complex nature of latency measurements. In this paper, we show the limitation of filter pruning methods in terms of latency reduction and propose the LayerPrune framework. LayerPrune presents a set of layer pruning methods based on different criteria that achieve higher latency reduction than filter pruning methods at similar accuracy. The advantage of layer pruning over filter pruning in terms of latency reduction stems from the fact that the former is not constrained by the original model's depth and thus allows for a larger range of latency reduction. For each filter pruning method we examine, we use the same filter importance criterion to calculate a per-layer importance score in one shot. We then prune the least important layers and fine-tune the shallower model, which obtains comparable or better accuracy than its filter-pruned counterpart. This one-shot process allows layers to be removed from single-path networks such as VGG before fine-tuning, unlike iterative filter pruning, in which a minimum number of filters per layer is required to maintain data flow, which constrains the search space. To the best of our knowledge, we are the first to examine the effect of pruning methods on the latency metric, rather than FLOPs, across multiple networks, datasets, and hardware targets. LayerPrune also outperforms handcrafted architectures such as Shufflenet, MobileNet, MNASNet, and ResNet18 by 7.3%, 4.6%, 2.8%, and 0.5%, respectively, at a similar latency budget on the ImageNet dataset.
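The one-shot procedure described above (aggregate a filter importance criterion into per-layer scores, drop the least important layers, then fine-tune) can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch illustration, not the paper's exact implementation: the helper names `layer_importance` and `prune_layers`, the mean-L1-norm criterion, and the assumption that prunable blocks preserve their input shape (e.g., identity residual blocks) are all assumptions made here for clarity.

```python
# Minimal sketch of one-shot layer pruning (assumed API, not the official code).
import torch
import torch.nn as nn


def layer_importance(conv: nn.Conv2d) -> float:
    """Aggregate a filter-level criterion (here: mean L1 norm of each
    filter's weights) into a single per-layer importance score."""
    # Conv2d weight shape: (out_channels, in_channels, kH, kW).
    filter_scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    return filter_scores.mean().item()


def prune_layers(blocks: nn.ModuleList, num_to_prune: int) -> nn.Sequential:
    """Score each block by its first conv layer, drop the `num_to_prune`
    least important blocks in one shot, and return the shallower model.
    Assumes each block maps its input shape to itself, so removing a
    block does not break data flow; fine-tuning follows this step."""
    scores = []
    for i, block in enumerate(blocks):
        convs = [m for m in block.modules() if isinstance(m, nn.Conv2d)]
        scores.append((layer_importance(convs[0]), i))
    # Indices of the least important blocks (lowest scores first).
    to_drop = {i for _, i in sorted(scores)[:num_to_prune]}
    kept = [b for i, b in enumerate(blocks) if i not in to_drop]
    return nn.Sequential(*kept)
```

Because the whole model is scored once and the selected layers are removed before any fine-tuning, this sketch contrasts with iterative filter pruning, which alternates pruning and retraining while keeping every layer alive.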