Generalizable Mixed-Precision Quantization via Attribution Rank Preservation

In this paper, we propose a generalizable mixed-precision quantization (GMPQ) method for efficient inference. Conventional methods require the same dataset for bitwidth search and model deployment to guarantee policy optimality, which leads to heavy search cost on challenging large-scale datasets in realistic applications. In contrast, GMPQ searches for a mixed-precision quantization policy that generalizes to large-scale datasets using only a small amount of data, so that the search cost is significantly reduced without performance degradation. Specifically, we observe that correctly locating network attribution is a general ability for accurate visual analysis across different data distributions. Therefore, besides optimizing for model accuracy and complexity, we preserve attribution rank consistency between the quantized models and their full-precision counterparts via efficient capacity-aware attribution imitation during the search for a generalizable mixed-precision quantization strategy. Extensive experiments show that our method obtains a competitive accuracy-complexity trade-off compared with state-of-the-art mixed-precision networks at significantly reduced search cost. The code is available at https://github.com/ZiweiWangTHU/GMPQ.git.
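
To make the idea of attribution rank consistency concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy linear model whose attribution is gradient times input (`w * x`), a simple symmetric uniform quantizer, and a Spearman-style rank correlation as the consistency score. The function names `rank_of`, `attribution_rank_consistency`, and `uniform_quantize` are hypothetical and chosen only for illustration; the capacity-aware attribution imitation described in the abstract is not reproduced here.

```python
import numpy as np


def rank_of(values):
    """Return the rank (0 = smallest) of every entry of a flattened array."""
    flat = np.asarray(values, dtype=float).ravel()
    order = np.argsort(flat, kind="stable")
    ranks = np.empty(flat.size, dtype=float)
    ranks[order] = np.arange(flat.size)
    return ranks


def attribution_rank_consistency(attr_fp, attr_q):
    """Spearman-style rank correlation between two attribution maps.

    Assumption: ties are broken by position rather than averaged.
    """
    r_fp, r_q = rank_of(attr_fp), rank_of(attr_q)
    r_fp -= r_fp.mean()
    r_q -= r_q.mean()
    return float((r_fp @ r_q) / np.sqrt((r_fp @ r_fp) * (r_q @ r_q)))


def uniform_quantize(w, bits):
    """Symmetric uniform quantization (assumes bits >= 2 and w not all zero)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / levels
    return np.round(w / scale) * scale


# Toy linear model y = w . x, with attribution taken as gradient * input.
rng = np.random.default_rng(0)
w = rng.normal(size=64)
x = rng.normal(size=64)
attr_fp = w * x  # full-precision attribution

for bits in (8, 4, 2):
    attr_q = uniform_quantize(w, bits) * x  # attribution of the quantized model
    score = attribution_rank_consistency(attr_fp, attr_q)
    print(f"{bits}-bit weights: rank consistency = {score:.3f}")
```

A score near 1 means the quantized model ranks input locations almost exactly as the full-precision model does. In a bitwidth search, a term of this kind could be combined with accuracy and complexity objectives, though how GMPQ weighs them is not specified in the abstract.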
