Post-training Neural Network (NN) model compression is an attractive approach for deploying large, memory-intensive models on devices with limited memory resources. In this study, we investigate the rate-distortion tradeoff for NN model compression. First, we propose a Rotation-Invariant Quantization (RIQ) technique that uses a single parameter to quantize the entire NN model, yielding a different rate at each layer, i.e., mixed-precision quantization. We then prove that our rotation-invariant approach is optimal in terms of compression. We rigorously evaluate RIQ and demonstrate its capabilities on various models and tasks. For example, RIQ achieves compression ratios of $\times 19.4$ and $\times 52.9$ on pre-trained dense and pruned VGG models, respectively, with $<0.4\%$ accuracy degradation. Code: \url{https://github.com/ehaleva/RIQ}.
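As a concrete illustration of the rotation-invariance idea (using a hypothetical norm-based step-size rule for exposition only, not necessarily the exact rule derived in the paper), consider quantizing the weights $\mathbf{w}_\ell \in \mathbb{R}^{n_\ell}$ of layer $\ell$ uniformly with step size
\[
\Delta_\ell \;=\; \frac{\gamma \,\lVert \mathbf{w}_\ell \rVert_2}{\sqrt{n_\ell}}, \qquad \hat{\mathbf{w}}_\ell \;=\; \Delta_\ell \left\lfloor \frac{\mathbf{w}_\ell}{\Delta_\ell} \right\rceil .
\]
Since $\lVert R\,\mathbf{w}_\ell \rVert_2 = \lVert \mathbf{w}_\ell \rVert_2$ for any rotation (orthonormal) matrix $R$, such a step size is unaffected by rotations of the weights, while the single parameter $\gamma$ still induces a different step size, and hence a different rate, at each layer.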