We present a computationally efficient method for compressing a trained neural network without using any data. We break the problem of data-free network compression into independent layer-wise compressions. We show how to efficiently generate layer-wise training data, and how to precondition the network to maintain accuracy during layer-wise compression. Our generic technique can be used with any compression method. We outperform related works for data-free low-bit-width quantization on MobileNetV1, MobileNetV2, and ResNet18. We also demonstrate the efficacy of our layer-wise method when applied to pruning. We outperform baselines in the low-computation regime suitable for on-device edge compression while using orders of magnitude less memory and compute time than comparable generative methods. In the high-computation regime, we show how to combine our method with generative methods to improve upon state-of-the-art performance for several networks.
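To make the layer-wise formulation concrete, the sketch below illustrates the general idea under simplifying assumptions; it is not the paper's algorithm. It generates synthetic Gaussian inputs for each layer (a hypothetical stand-in for the layer-wise training-data generation described above), applies a plain round-to-nearest uniform quantizer as the compression step, and reports the per-layer output reconstruction error. All function names and the Gaussian input generator are illustrative assumptions.

```python
# Minimal sketch of data-free, layer-wise compression: each layer is
# quantized independently by minimizing (here, only measuring) the
# reconstruction error of its outputs on synthetically generated inputs.
# The Gaussian generator and round-to-nearest quantizer are assumptions,
# not the method proposed in the paper.
import numpy as np

def generate_layer_inputs(in_features, num_samples=2048, rng=None):
    """Stand-in for layer-wise training-data generation (assumed Gaussian)."""
    rng = rng or np.random.default_rng(0)
    return rng.standard_normal((num_samples, in_features)).astype(np.float32)

def quantize_uniform(w, num_bits=4):
    """Symmetric uniform round-to-nearest quantizer for one weight tensor."""
    scale = np.abs(w).max() / (2 ** (num_bits - 1) - 1) + 1e-12
    q = np.clip(np.round(w / scale), -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1)
    return q * scale

def compress_layer(weight, num_bits=4):
    """Quantize one linear layer and report its output reconstruction error."""
    x = generate_layer_inputs(weight.shape[1])
    w_q = quantize_uniform(weight, num_bits)
    err = np.mean((x @ weight.T - x @ w_q.T) ** 2)  # layer-wise output MSE
    return w_q, err

# Example: compress each layer of a toy two-layer network independently.
layers = [np.random.randn(128, 64).astype(np.float32),
          np.random.randn(10, 128).astype(np.float32)]
for i, w in enumerate(layers):
    _, err = compress_layer(w, num_bits=4)
    print(f"layer {i}: per-layer output MSE after 4-bit quantization = {err:.4f}")
```

Because each layer is handled independently, the per-layer problems are small and can be solved with far less memory and compute than whole-network generative approaches, which is the regime the abstract targets.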