Model quantization reduces model size and computational latency, and has therefore become an essential technique for deploying deep neural networks on resource-constrained hardware (e.g., mobile phones and embedded devices). Existing quantization methods mainly consider the numerical values of individual weight and activation elements while ignoring the relationships between elements; the resulting decline in representation ability and loss of information usually lead to performance degradation. Inspired by the characteristics of images in the frequency domain, we propose a novel multiscale wavelet quantization (MWQ) method. MWQ decomposes the original data into multiscale frequency components via the wavelet transform and then quantizes the components at each scale separately. It exploits multiscale frequency and spatial information to alleviate the information loss caused by quantization in the spatial domain. Owing to the flexibility of MWQ, we demonstrate three applications (model compression, quantized network optimization, and information enhancement) on the ImageNet and COCO datasets. Experimental results show that our method has stronger representation ability and plays an effective role in quantized neural networks.
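The core idea above, decomposing a tensor into wavelet subbands and quantizing each frequency component with its own step size, can be illustrated with a minimal sketch. This is not the authors' MWQ implementation: it uses a single-level 2-D Haar transform and plain uniform per-subband quantization, and all function names (`haar_dwt2`, `mwq_quantize`, etc.) are hypothetical.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2-D Haar transform: returns LL, LH, HL, HH subbands."""
    # average / difference along rows
    a = (x[0::2, :] + x[1::2, :]) / 2.0
    d = (x[0::2, :] - x[1::2, :]) / 2.0
    # then along columns, yielding the four quarter-size subbands
    LL = (a[:, 0::2] + a[:, 1::2]) / 2.0
    LH = (a[:, 0::2] - a[:, 1::2]) / 2.0
    HL = (d[:, 0::2] + d[:, 1::2]) / 2.0
    HH = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return LL, LH, HL, HH

def haar_idwt2(LL, LH, HL, HH):
    """Inverse of haar_dwt2 (exact reconstruction)."""
    h, w = LL.shape
    a = np.empty((h, 2 * w))
    d = np.empty((h, 2 * w))
    a[:, 0::2], a[:, 1::2] = LL + LH, LL - LH
    d[:, 0::2], d[:, 1::2] = HL + HH, HL - HH
    x = np.empty((2 * h, 2 * w))
    x[0::2, :], x[1::2, :] = a + d, a - d
    return x

def quantize_band(band, n_bits=4):
    """Uniform symmetric quantization with a per-subband step size."""
    max_abs = np.abs(band).max()
    if max_abs == 0:
        return band
    scale = max_abs / (2 ** (n_bits - 1) - 1)
    return np.round(band / scale) * scale

def mwq_quantize(x, n_bits=4):
    """Sketch of wavelet-domain quantization: decompose, quantize each
    frequency subband separately, then reconstruct."""
    bands = haar_dwt2(x)
    return haar_idwt2(*(quantize_band(b, n_bits) for b in bands))
```

Because each subband gets its own quantization step, low-frequency (LL) and high-frequency (LH/HL/HH) components are quantized at scales matched to their own dynamic ranges, which is the intuition behind applying quantization in the wavelet domain rather than directly on the spatial-domain values.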