Because model quantization reduces model size and computation latency, it has been applied successfully in many applications on mobile phones, embedded devices, and smart chips. A mixed-precision quantization model can assign a different quantization bit-precision to each layer according to that layer's sensitivity, and thereby achieve strong performance. However, quickly determining the bit-precision of each layer in a deep neural network under given constraints (e.g., hardware resources, energy consumption, model size, and computation latency) is a difficult problem. To address this issue, we propose a novel sequential single path search (SSPS) method for mixed-precision quantization, in which the given constraints are introduced into the loss function to guide the search process. A single path search cell is used to compose a fully differentiable supernet, which can be optimized by gradient-based algorithms. Moreover, we sequentially determine the candidate precisions according to their selection certainties, which exponentially reduces the search space and speeds up the convergence of the search process. Experiments show that our method can efficiently search mixed-precision models for different architectures (e.g., ResNet-20, ResNet-18, ResNet-34, ResNet-50, and MobileNet-V2) and datasets (e.g., CIFAR-10, ImageNet, and COCO) under given constraints, and the results verify that the models found by SSPS significantly outperform their uniform-precision counterparts.
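The two ideas named in the abstract, a constraint term in the loss and the sequential fixing of precisions by selection certainty, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not define the penalty form, the certainty measure, or the threshold, so the following are all assumptions made for illustration: a hinge penalty on expected model size, certainty taken as the largest softmax probability, a threshold of 0.9, and every function and layer name.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_size_bits(layer_logits, layer_params, candidates):
    """Expected model size in bits under the current selection probabilities."""
    total = 0.0
    for name, logits in layer_logits.items():
        probs = softmax(logits)
        expected_bits = sum(p * b for p, b in zip(probs, candidates[name]))
        total += layer_params[name] * expected_bits
    return total

def constrained_loss(task_loss, layer_logits, layer_params, candidates,
                     budget_bits, lam=1.0):
    """Task loss plus a hinge penalty for exceeding the size budget (assumed form)."""
    size = expected_size_bits(layer_logits, layer_params, candidates)
    overshoot = max(0.0, size / budget_bits - 1.0)
    return task_loss + lam * overshoot

def search_space_size(candidates):
    """Number of distinct per-layer precision assignments still possible."""
    return math.prod(len(bits) for bits in candidates.values())

def fix_most_certain_layer(layer_logits, candidates, threshold=0.9):
    """Fix the undecided layer whose top choice is most certain.

    Certainty is taken to be the largest softmax probability (an assumption).
    Returns the name of the fixed layer, or None if no undecided layer
    reaches the threshold. Mutates both dictionaries in place.
    """
    best_name, best_prob, best_idx = None, 0.0, -1
    for name, logits in layer_logits.items():
        if len(candidates[name]) == 1:
            continue  # this layer has already been decided
        probs = softmax(logits)
        idx = max(range(len(probs)), key=probs.__getitem__)
        if probs[idx] > best_prob:
            best_name, best_prob, best_idx = name, probs[idx], idx
    if best_name is None or best_prob < threshold:
        return None
    candidates[best_name] = [candidates[best_name][best_idx]]
    layer_logits[best_name] = [0.0]  # a single remaining choice
    return best_name

# Hypothetical two-layer example with candidate bit-widths 2, 4 and 8.
candidates = {"conv1": [2, 4, 8], "conv2": [2, 4, 8]}
layer_logits = {"conv1": [0.0, 0.0, 5.0], "conv2": [0.1, 0.0, 0.2]}
layer_params = {"conv1": 100, "conv2": 1000}

before = search_space_size(candidates)
fixed = fix_most_certain_layer(layer_logits, candidates)
after = search_space_size(candidates)
loss = constrained_loss(1.0, layer_logits, layer_params, candidates,
                        budget_bits=4000.0)
```

In this toy setting, fixing one layer that has three candidates divides the number of remaining assignments by three, which illustrates how fixing layers one after another shrinks the search space multiplicatively. In a real search the logits would be updated by gradient descent between fixing steps rather than set by hand.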