Post-training quantization (PTQ) is a popular method for compressing deep neural networks (DNNs) without modifying their original architecture or training procedure. Despite its effectiveness and convenience, the reliability of PTQ methods under extreme cases such as distribution shift and data noise remains largely unexplored. This paper presents the first systematic investigation of this problem across a variety of commonly used PTQ methods. We aim to answer several research questions concerning how variations in the calibration-set distribution, the choice of calibration paradigm, and data augmentation or sampling strategies affect PTQ reliability. A systematic evaluation is conducted across a wide range of tasks and commonly used PTQ paradigms. The results show that most existing PTQ methods are not sufficiently reliable in terms of worst-case group performance, highlighting the need for more robust approaches. Our findings provide insights for developing PTQ methods that can effectively handle distribution shift and enable the deployment of quantized DNNs in real-world applications.
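For readers unfamiliar with the calibration step the abstract refers to, the following is a minimal sketch of static PTQ using PyTorch's eager-mode quantization API. `TinyNet` and `calibration_data` are illustrative stand-ins, not the paper's setup: the observers inserted by `prepare` record activation statistics from whatever data is passed through, so a shifted or noisy calibration set directly changes the resulting quantization parameters.

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq

# A small float model; QuantStub/DeQuantStub mark the region to quantize.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.fc1 = nn.Linear(32, 64)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(64, 10)
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.fc2(self.relu(self.fc1(x)))
        return self.dequant(x)

model = TinyNet().eval()
model.qconfig = tq.get_default_qconfig("fbgemm")  # per-tensor observers, x86 backend
prepared = tq.prepare(model)  # inserts observers after each quantizable module

# Calibration pass: observers record activation ranges from this data.
# calibration_data is a placeholder; in the paper's setting, whether this
# set is clean or drawn from a shifted/noisy distribution is the variable
# under study.
calibration_data = torch.randn(256, 32)
with torch.no_grad():
    prepared(calibration_data)

# Convert to int8 using the collected statistics.
quantized = tq.convert(prepared)
```

Under this workflow, the reliability question the paper raises amounts to asking how the quantized model behaves when `calibration_data` is not representative of the deployment distribution.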