We present a computationally efficient method for compressing a trained neural network without using real data. We break the problem of data-free network compression into independent layer-wise problems. We show how to efficiently generate layer-wise training data using only a pretrained network, and we use this data to compress each layer of the pretrained network independently. We also show how to precondition the network to improve the accuracy of our layer-wise compression method. We present results for layer-wise compression using quantization and pruning. When quantizing, we compress with higher accuracy than related work while using orders of magnitude less compute. When compressing MobileNetV2 and evaluating on ImageNet, our method outperforms existing methods for quantization at all bit-widths, achieving a $+0.34\%$ improvement in $8$-bit quantization, and a stronger improvement at lower bit-widths (up to a $+28.50\%$ improvement at $5$ bits). When pruning, we outperform baselines with a similar compute envelope, achieving $1.5$ times the sparsity rate at the same accuracy. We also show how to combine our efficient method with high-compute generative methods to improve upon their results.
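To make the layer-wise idea concrete, the following is a minimal sketch of compressing each layer independently against synthetic layer inputs and measuring the resulting per-layer reconstruction error. The Gaussian input model, the `uniform_quantize` helper, and `compress_layer` are illustrative assumptions for this sketch, not the data-generation or quantization procedure described in the paper.

```python
# Illustrative sketch: independent layer-wise quantization against synthetic
# layer inputs. The Gaussian inputs and the uniform quantizer below are
# assumptions for demonstration only.
import torch
import torch.nn as nn


def uniform_quantize(w: torch.Tensor, num_bits: int) -> torch.Tensor:
    """Symmetric uniform quantization of a weight tensor (assumed scheme)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale


def compress_layer(layer: nn.Linear, num_bits: int, num_samples: int = 512) -> float:
    """Quantize one layer independently and report the output reconstruction
    error measured on synthetic (Gaussian) layer inputs."""
    x = torch.randn(num_samples, layer.in_features)  # synthetic layer-wise data (assumption)
    with torch.no_grad():
        target = layer(x)                            # full-precision reference outputs
        layer.weight.copy_(uniform_quantize(layer.weight, num_bits))
        approx = layer(x)                            # outputs of the compressed layer
    return (approx - target).pow(2).mean().item()


# Usage: compress every linear layer of a toy model independently at 5 bits.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
errors = {name: compress_layer(m, num_bits=5)
          for name, m in model.named_modules() if isinstance(m, nn.Linear)}
print(errors)
```

Because each layer is compressed against its own synthetic data, the per-layer problems are independent and can be solved in parallel, which is what keeps the compute cost low relative to methods that fine-tune or distill the whole network.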