Once Quantization-Aware Training: High Performance Extremely Low-bit Architecture Search

Quantization Neural Networks (QNN) have attracted a lot of attention due to their high efficiency. To improve quantization accuracy, prior works mainly focus on designing advanced quantization algorithms, but they still fail to achieve satisfactory results in the extremely low-bit case. In this work, we take an architecture perspective to investigate the potential of high-performance QNN. We therefore propose to combine Network Architecture Search methods with quantization to enjoy the merits of both. However, a naive combination inevitably suffers from unacceptable time consumption or unstable training. To alleviate these problems, we first propose the joint training of architecture and quantization with a shared step size to acquire a large number of quantized models. A bit-inheritance scheme is then introduced to transfer the quantized models to a lower bit-width, which further reduces the time cost while improving the quantization accuracy. Equipped with this overall framework, dubbed Once Quantization-Aware Training (OQAT), our searched model family, OQATNets, achieves a new state of the art compared with various architectures under different bit-widths. In particular, OQAT-2bit-M achieves 61.6% ImageNet Top-1 accuracy, outperforming its 2-bit MobileNetV3 counterpart by a large margin of 9% with 10% less computation cost. A series of quantization-friendly architectures can be identified easily, and extensive analysis can be made to summarize the interaction between quantization and neural architectures. Code and models are released at https://github.com/LaVieEnRoseSMZ/OQA
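
To make the roles of the step size and of bit inheritance more concrete, the following is a minimal sketch rather than the released implementation. It assumes a plain uniform signed quantizer, and it assumes that a lower-bit model can start from the higher-bit step size scaled by a power of two; the abstract does not state that rule. The names `quantize` and `inherit_step` and the sample weights are hypothetical.

```python
def quantize(values, step, bits):
    """Uniform signed fake-quantization: snap to the step grid, clamp to the bit range."""
    qmin = -(2 ** (bits - 1))
    qmax = 2 ** (bits - 1) - 1
    out = []
    for v in values:
        q = round(v / step)            # nearest integer level
        q = max(qmin, min(qmax, q))    # clamp to the representable levels
        out.append(q * step)           # map back to the real-valued grid
    return out


def inherit_step(step, from_bits, to_bits):
    """Assumed initialisation: widen the step so the coarser grid keeps the same lower bound."""
    return step * (2 ** (from_bits - to_bits))


# Hypothetical weights and a hypothetical learned 4-bit step size.
weights = [-0.9, -0.35, 0.0, 0.2, 0.75]
step4 = 0.125
w4 = quantize(weights, step4, bits=4)

# Start the 3-bit model from the 4-bit one instead of training from scratch.
step3 = inherit_step(step4, from_bits=4, to_bits=3)
w3 = quantize(weights, step3, bits=3)

print(step3, w4, w3)
```

In this sketch the 3-bit grid is a subset of the 4-bit grid, so the inherited model starts close to the higher-bit one. In a real training loop the step size would be a learnable parameter that is fine-tuned after inheritance.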