Mixed-precision quantization has received increasing attention for its ability to reduce the computational burden and speed up inference. Existing methods usually focus on the sensitivity of different network layers, which requires a time-consuming search or training process. To this end, a novel mixed-precision quantization method, termed CSMPQ, is proposed. Specifically, the TF-IDF metric widely used in natural language processing (NLP) is introduced to measure the class separability of layer-wise feature maps. A linear programming problem is then formulated to derive the optimal bit configuration for each layer. Without any iterative process, the proposed CSMPQ achieves better compression trade-offs than state-of-the-art quantization methods. In particular, CSMPQ achieves 73.03$\%$ Top-1 accuracy on ResNet-18 with only 59G BOPs under quantization-aware training (QAT), and 71.30$\%$ Top-1 accuracy with only 1.5Mb on MobileNetV2 under post-training quantization (PTQ).
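The two ingredients named above can be made concrete with a minimal, hypothetical sketch (not the authors' implementation): a TF-IDF-style score treats each class as a "document" and each channel of a layer's pooled feature maps as a "term", and a relaxed linear program then assigns bit-widths under a compute budget. The function names (`tfidf_separability`, `allocate_bits`), the candidate bit-widths, and the direction of the objective (rewarding more separable layers with higher precision) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import linprog


def tfidf_separability(feats, labels):
    """TF-IDF-style class-separability score for one layer (illustrative).

    feats: (N, C) pooled activations of the layer; labels: (N,) class ids.
    Each class plays the role of a 'document', each channel of a 'term'.
    """
    classes = np.unique(labels)
    # term frequency: mean activation of each channel within a class
    tf = np.stack([feats[labels == c].mean(0) for c in classes])  # (K, C)
    tf = tf / (tf.sum(1, keepdims=True) + 1e-12)
    # document frequency: in how many classes a channel is strongly active
    df = (tf > tf.mean()).sum(0) + 1e-12
    idf = np.log(len(classes) / df)
    # higher score -> the layer's feature maps separate classes more distinctly
    return float((tf * idf).sum())


def allocate_bits(scores, costs, budget, bit_choices=(2, 4, 8)):
    """Relaxed LP: assign one bit-width per layer under a total compute budget.

    scores: per-layer separability scores; costs: per-layer cost that scales
    linearly with bit-width (a BOPs-like proxy). The budget must cover at least
    the cost of quantizing every layer to the lowest bit-width.
    """
    L, B = len(scores), len(bit_choices)
    # maximize sum_{l,b} x[l,b] * scores[l] * bits[b]  <=>  minimize its negative
    c = -np.array([[s * b for b in bit_choices] for s in scores]).ravel()
    # each layer selects exactly one bit-width (relaxed to x in [0, 1])
    A_eq = np.zeros((L, L * B))
    for l in range(L):
        A_eq[l, l * B:(l + 1) * B] = 1.0
    b_eq = np.ones(L)
    # total cost of the chosen bit-widths stays within the budget
    A_ub = np.array([costs[l] * bit_choices[b]
                     for l in range(L) for b in range(B)]).reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=[budget], A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, 1)] * (L * B))
    x = res.x.reshape(L, B)
    return [bit_choices[i] for i in x.argmax(1)]  # round the relaxed solution


# Toy usage with made-up scores/costs: more separable layers get more bits here.
print(allocate_bits(scores=[0.9, 0.3, 0.6], costs=[1.0, 2.0, 1.5], budget=20))
```

The per-layer argmax rounding of the relaxed solution is only a convenience for this sketch; an integer linear program (or the paper's own formulation) would give an exact one-bit-width-per-layer assignment.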