Pruning is a popular technique for reducing the model size and computational cost of convolutional neural networks (CNNs). However, a slow retraining or fine-tuning procedure is often required to recover the accuracy lost to pruning. Recently, a new research direction on weight pruning, pruning-at-initialization (PAI), has been proposed to prune CNNs directly before training so that fine-tuning or retraining can be avoided. While PAI has shown promising results in reducing model size, existing approaches rely on fine-grained weight pruning, which requires unstructured sparse matrix computation, making it difficult to achieve real speedup in practice unless the sparsity is very high. This work is the first to show that fine-grained weight pruning is in fact not necessary for PAI. Instead, the layerwise compression ratio is the critical factor that determines the accuracy of a CNN model pruned at initialization. Based on this key observation, we propose PreCropping, a structured, hardware-efficient model compression scheme. PreCropping directly compresses the model at the channel level according to the layerwise compression ratios. Compared to weight pruning, the proposed scheme is regular and dense in both storage and computation without sacrificing accuracy. In addition, since PreCropping compresses CNNs at initialization, the computational and memory costs of CNNs are reduced for both training and inference on commodity hardware. We empirically demonstrate our approach on several modern CNN architectures, including ResNet, ShuffleNet, and MobileNet, on both CIFAR-10 and ImageNet.
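To make the channel-level idea concrete, below is a minimal sketch, not the authors' implementation, of how a small CNN could be "pre-cropped" at initialization by scaling each layer's width with a per-layer keep ratio. The `build_cropped_cnn` helper, the base widths, and the `keep_ratios` values are hypothetical placeholders; in the paper, the layerwise ratios would be supplied by a pruning-at-initialization criterion rather than hard-coded.

```python
# Minimal sketch (PyTorch): shrink channel widths at initialization
# according to per-layer keep ratios. The ratios below are placeholders;
# a PAI method would normally supply them.
import torch
import torch.nn as nn

def build_cropped_cnn(base_widths, keep_ratios, num_classes=10):
    """Build a plain CNN whose layer widths are base_widths scaled by keep_ratios."""
    assert len(base_widths) == len(keep_ratios)
    widths = [max(1, round(w * r)) for w, r in zip(base_widths, keep_ratios)]
    layers, in_ch = [], 3  # RGB input
    for out_ch in widths:
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                   nn.BatchNorm2d(out_ch),
                   nn.ReLU(inplace=True)]
        in_ch = out_ch
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, num_classes)]
    return nn.Sequential(*layers)

# Hypothetical layerwise keep ratios: every layer stays dense, only its width shrinks,
# so both training and inference run on regular dense kernels.
model = build_cropped_cnn(base_widths=[64, 128, 256, 512],
                          keep_ratios=[0.9, 0.7, 0.5, 0.4])
x = torch.randn(2, 3, 32, 32)   # CIFAR-10-sized input
print(model(x).shape)           # torch.Size([2, 10])
```

Because whole channels are removed rather than individual weights, the remaining tensors stay dense and regular, which is why the compressed model needs no sparse-matrix support to realize speedup on commodity hardware.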