To obtain lower inference latency and a smaller memory footprint for deep neural networks, model quantization has been widely employed in deep model deployment, converting floating-point values to low-precision integers. However, previous methods (such as quantization-aware training and post-training quantization) require the original data to fine-tune or calibrate the quantized model, which makes them inapplicable when the original data cannot be accessed due to privacy or security concerns. This has given rise to data-free quantization methods based on synthetic data generation. However, current data-free quantization methods still suffer from severe performance degradation when quantizing a model to lower bit-widths, caused by the low inter-class separability of semantic features. To this end, we propose a new and effective data-free quantization method termed ClusterQ, which utilizes feature distribution alignment for synthetic data generation. To obtain high inter-class separability of semantic features, we cluster and align the feature distribution statistics to imitate the distribution of real data, so that the performance degradation is alleviated. Moreover, we incorporate diversity enhancement to solve class-wise mode collapse. We also employ an exponential moving average to update the centroid of each cluster for further feature distribution improvement. Extensive experiments on different deep models (e.g., ResNet-18 and MobileNet-V2) over the ImageNet dataset demonstrate that our proposed ClusterQ obtains state-of-the-art performance.
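To illustrate the idea of per-class feature statistic centroids updated by an exponential moving average, the following PyTorch sketch is a minimal, hypothetical example (the class name `ClassCentroidEMA`, the momentum value, and the mean-squared alignment loss are assumptions for illustration, not the paper's actual implementation).

```python
import torch


class ClassCentroidEMA:
    """Maintain per-class centroids of feature statistics (e.g., per-channel
    means of an intermediate feature map) and update them with an exponential
    moving average. Hypothetical sketch, not the authors' code."""

    def __init__(self, num_classes: int, num_channels: int, momentum: float = 0.9):
        self.momentum = momentum
        self.centroids = torch.zeros(num_classes, num_channels)

    def update(self, class_id: int, feat_mean: torch.Tensor) -> None:
        # EMA update of one class centroid: c <- m * c + (1 - m) * new statistic
        c = self.centroids[class_id]
        self.centroids[class_id] = self.momentum * c + (1.0 - self.momentum) * feat_mean

    def alignment_loss(self, class_ids: torch.Tensor, feat_means: torch.Tensor) -> torch.Tensor:
        # Penalize the distance between synthetic-sample feature statistics
        # and the centroid of their assigned class.
        target = self.centroids[class_ids]  # (B, C)
        return torch.mean((feat_means - target) ** 2)


# Usage sketch: align per-channel means of synthetic feature maps to class centroids.
ema = ClassCentroidEMA(num_classes=1000, num_channels=256)
feats = torch.randn(8, 256, 14, 14)          # placeholder feature maps for 8 synthetic images
labels = torch.randint(0, 1000, (8,))        # their assigned class labels
means = feats.mean(dim=(2, 3))               # (8, 256) per-sample channel means
loss = ema.alignment_loss(labels, means)     # would be backpropagated to the generator
for i in range(8):
    ema.update(int(labels[i]), means[i].detach())  # centroids track statistics without gradients
```

In this sketch, the alignment loss pulls the statistics of synthetic samples toward the centroid of their class, while the EMA keeps each centroid a slowly moving estimate of that class's feature distribution.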