Uniform-precision neural network quantization has gained popularity because it simplifies the densely packed arithmetic units needed for high computing capability. However, it ignores the heterogeneous sensitivity of individual layers to quantization errors, resulting in sub-optimal inference accuracy. This work proposes a novel neural architecture search, called neural channel expansion, that adjusts the network structure to alleviate the accuracy degradation caused by ultra-low uniform-precision quantization. The proposed method selectively expands channels for the quantization-sensitive layers while satisfying hardware constraints (e.g., FLOPs, PARAMs). Based on in-depth analysis and experiments, we demonstrate that the proposed method can adapt the channels of several popular networks to achieve superior 2-bit quantization accuracy on CIFAR10 and ImageNet. In particular, we achieve the best-to-date Top-1/Top-5 accuracy for 2-bit ResNet50 with smaller FLOPs and parameter size.