Network compression is effective in accelerating the inference of deep neural networks, but it often requires finetuning with all the training data to recover the accuracy loss. This is impractical in some applications due to data privacy issues or constraints on the compression time budget. To deal with these issues, we propose a method named PRACTISE to accelerate networks using only tiny sets of training images. By considering both the pruned and the unpruned parts of a compressed model, PRACTISE alleviates layer-wise error accumulation, the main drawback of previous methods. Furthermore, existing methods are confined to a few compression schemes, achieve limited latency speedup, and are unstable. In contrast, PRACTISE is stable, fast to train, versatile enough to handle various compression schemes, and achieves low latency. We also argue that dropping entire blocks is a better compression scheme than existing ones when only tiny sets of training data are available. Extensive experiments demonstrate that PRACTISE achieves much higher accuracy and more stable models than state-of-the-art methods.
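To make the "block dropping" idea concrete, the sketch below replaces one residual block of a torchvision ResNet-34 with an identity mapping and compares inference latency before and after. This is only a minimal illustration of dropping a block, not the PRACTISE recovery algorithm itself; the helper names (`drop_block`, `latency_ms`) and the choice of which block to drop are hypothetical, while the stage names (`layer3`, etc.) follow torchvision's ResNet implementation.

```python
# Minimal sketch of block dropping (not the PRACTISE algorithm): remove an
# entire residual block from a ResNet and measure the latency change.
import time

import torch
import torch.nn as nn
from torchvision.models import resnet34


def drop_block(model: nn.Module, stage: str, block_idx: int) -> nn.Module:
    """Replace one residual block with nn.Identity().

    Only valid for blocks whose input and output shapes match,
    i.e. non-downsampling blocks inside a stage.
    """
    stage_module = getattr(model, stage)      # e.g. model.layer3 (nn.Sequential)
    stage_module[block_idx] = nn.Identity()   # drop the whole block
    return model


@torch.no_grad()
def latency_ms(model: nn.Module, runs: int = 50) -> float:
    """Average single-image CPU inference latency in milliseconds."""
    model.eval()
    x = torch.randn(1, 3, 224, 224)
    for _ in range(5):                        # warm-up runs
        model(x)
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    return (time.perf_counter() - start) / runs * 1e3


original = resnet34()
print(f"original:      {latency_ms(original):.1f} ms")

# Drop a non-downsampling block (layer3, index 3 is one of several valid choices).
pruned = drop_block(resnet34(), "layer3", 3)
print(f"block dropped: {latency_ms(pruned):.1f} ms")

# In the paper's setting, the pruned network would then be finetuned on a tiny
# set of images to recover the accuracy lost by removing the block.
```

Dropping a whole block removes all of its layers from the critical path at once, which is why it tends to translate into larger latency reductions than thinning individual layers by the same parameter count.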