Model quantization is a promising method for compressing deep neural networks, especially for inference on lightweight mobile or edge devices. However, model quantization usually requires access to the original training data to maintain the accuracy of the full-precision model, which is often infeasible in real-world scenarios due to security and privacy concerns. A popular approach to performing quantization without access to the original data is to use synthetically generated samples, based on batch-normalization statistics or adversarial learning. However, such approaches primarily rely on random noise input to the generator to attain diversity of the synthetic samples. We find that this is often insufficient to capture the distribution of the original data, especially around the decision boundaries. To this end, we propose Qimera, a method that uses superposed latent embeddings to generate synthetic boundary supporting samples. For the superposed embeddings to better reflect the original distribution, we also propose using an additional disentanglement mapping layer and extracting information from the full-precision model. The experimental results show that Qimera achieves state-of-the-art performance in various data-free quantization settings. Code is available at https://github.com/iamkanghyunchoi/qimera.
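The core idea behind the boundary supporting samples can be sketched as a convex combination of two class embeddings fed to the generator in place of (or alongside) pure random noise. The following is a minimal NumPy illustration under assumed names and dimensions; it is not the authors' implementation, and the embedding table here is random for demonstration only:

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, embed_dim = 10, 64

# Hypothetical learned per-class embedding table (random values for illustration;
# in practice these would be trained jointly with the generator).
class_embed = rng.standard_normal((num_classes, embed_dim))

def superposed_embedding(class_a, class_b, lam):
    """Superpose two class embeddings with mixing coefficient lam in [0, 1].

    lam = 1.0 recovers class_a's embedding, lam = 0.0 recovers class_b's;
    intermediate values yield inputs whose generated samples should lie
    near the decision boundary between the two classes.
    """
    return lam * class_embed[class_a] + (1.0 - lam) * class_embed[class_b]

# Example: an embedding halfway between classes 3 and 7.
z = superposed_embedding(3, 7, 0.5)
```

A generator conditioned on `z` would then synthesize a sample between the two class modes, which is the kind of boundary-supporting input that pure noise rarely produces.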