Network pruning techniques, including weight pruning and filter pruning, reveal that most state-of-the-art neural networks can be accelerated without a significant performance drop. This work focuses on filter pruning, which enables accelerated inference with any off-the-shelf deep learning library and hardware. We propose the concept of \emph{network pruning spaces} that parametrize populations of subnetwork architectures. Based on this concept, we explore the structural aspects of subnetworks that incur minimal accuracy loss in different pruning regimes, and arrive at a series of observations by comparing subnetwork distributions. Through empirical studies, we conjecture that in a given pruning regime there exists an optimal FLOPs-to-parameter-bucket ratio related to the design of the original network. Statistically, the structure of a winning subnetwork guarantees an approximately optimal ratio in that regime. Based on these conjectures, we further refine the initial pruning space to reduce the cost of searching for a good subnetwork architecture. Our experimental results on ImageNet show that the subnetworks we discover are superior to those from state-of-the-art pruning methods under comparable FLOPs.
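To make the FLOPs-to-parameter-bucket ratio concrete, the following minimal sketch (our illustration, not code from the paper) shows why the network-level FLOPs-to-parameter ratio depends on \emph{where} filters are pruned: early, high-resolution stages contribute many FLOPs per weight, while late stages contribute many weights per FLOP. All layer shapes below are hypothetical.

```python
# Illustration only: how the placement of filter pruning shifts the
# network-level FLOPs-to-parameter ratio. Layer shapes are hypothetical.

def conv_cost(c_in, c_out, k, spatial):
    """FLOPs (multiply-accumulates) and weight count of a k x k conv
    producing a spatial x spatial output; bias terms are ignored."""
    params = c_in * c_out * k * k
    flops = params * spatial * spatial
    return flops, params

def network_ratio(stages):
    """FLOPs-to-parameter ratio of a chain of conv layers."""
    total_flops = sum(conv_cost(*s)[0] for s in stages)
    total_params = sum(conv_cost(*s)[1] for s in stages)
    return total_flops / total_params

# Two-stage toy network: (c_in, c_out, kernel, output resolution).
base        = [(64, 64, 3, 56), (512, 512, 3, 7)]
prune_early = [(32, 32, 3, 56), (512, 512, 3, 7)]  # halve early channels
prune_late  = [(64, 64, 3, 56), (256, 256, 3, 7)]  # halve late channels

for name, net in [("base", base), ("prune early", prune_early),
                  ("prune late", prune_late)]:
    print(f"{name:12s} FLOPs/params = {network_ratio(net):.1f}")
```

Running the sketch, pruning the early stage lowers the ratio (roughly 61 versus 96 for the base network) while pruning the late stage raises it (roughly 231), so subnetworks at the same FLOPs budget can occupy very different points on this axis; the conjecture above states that winning subnetworks cluster around a regime-specific optimum.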