To bridge the ever-increasing gap between the complexity of deep neural networks and hardware capability, network quantization has attracted growing research attention. The latest trend, mixed-precision quantization, exploits hardware support for multiple bit-width arithmetic operations to unleash the full potential of network quantization. However, this also results in a difficult integer programming formulation and forces most existing approaches to use an extremely time-consuming search process, even with various relaxations. Instead of solving the original integer programming problem, we propose to optimize a proxy metric, network orthogonality, which is highly correlated with the loss of the integer program yet easy to optimize with linear programming. This approach reduces the search time and the amount of data required by orders of magnitude, with little compromise in quantization accuracy. Specifically, for post-training quantization, we achieve 71.27% Top-1 accuracy on MobileNetV2, requiring only 9 seconds of search time and 1.4 GPU hours of finetuning on ImageNet. Our code is available at https://github.com/MAC-AutoML/OMPQ.
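To make the core idea concrete, the following is a minimal sketch (not the authors' released code) of how a proxy-driven bit-width search can be cast as a linear program: each layer picks one candidate bit-width, the relaxed assignment variables are constrained to a model-size budget, and a precomputed per-(layer, bit-width) proxy score stands in for the orthogonality metric. The scores, layer sizes, and budget below are hypothetical placeholders.

```python
# A hedged sketch of LP-based mixed-precision bit-width selection,
# assuming a precomputed proxy score per (layer, bit-width) pair.
import numpy as np
from scipy.optimize import linprog

bits = [2, 4, 8]                                 # candidate bit-widths
n_layers, n_bits = 4, len(bits)
rng = np.random.default_rng(0)
score = rng.random((n_layers, n_bits))           # placeholder proxy scores
params = np.array([1e5, 2e5, 4e5, 8e5])          # placeholder params per layer
size = params[:, None] * np.array(bits)[None, :] / 8  # bytes per (layer, bit)
budget = 0.6 * size[:, -1].sum()                 # e.g. 60% of the 8-bit model

# Relaxed assignment variables x[i, b] in [0, 1], flattened row-major.
c = -score.ravel()                               # linprog minimizes, so negate
A_ub = size.ravel()[None, :]                     # single model-size constraint
b_ub = [budget]
A_eq = np.zeros((n_layers, n_layers * n_bits))   # each layer picks one width
for i in range(n_layers):
    A_eq[i, i * n_bits:(i + 1) * n_bits] = 1.0
b_eq = np.ones(n_layers)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
x = res.x.reshape(n_layers, n_bits)
assignment = [bits[j] for j in x.argmax(axis=1)] # round the LP relaxation
print("per-layer bit-widths:", assignment)
```

Because the objective and constraints are linear in the assignment variables, the search reduces to a single LP solve followed by rounding, which is what makes a seconds-scale search plausible compared with iterating over the exponential space of bit-width combinations.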