As the applications of deep learning models on edge devices increase at an accelerating pace, fast adaptation to various scenarios with varying resource constraints has become a crucial aspect of model deployment. As a result, model optimization strategies with adaptive configuration are becoming increasingly popular. While single-shot quantized neural architecture search enjoys flexibility in both model architecture and quantization policy, the combined search space comes with many challenges, including instability when training the weight-sharing supernet and difficulty in navigating the exponentially growing search space. Existing methods tend to either limit the architecture search space to a small set of options or limit the quantization policy search space to fixed-precision policies. To this end, we propose BatchQuant, a robust quantizer formulation that allows fast and stable training of a compact, single-shot, mixed-precision, weight-sharing supernet. We employ BatchQuant to train a compact supernet (offering over $10^{76}$ quantized subnets) in substantially fewer GPU hours than previous methods. Our approach, Quantized-for-all (QFA), is the first to seamlessly extend a one-shot weight-sharing NAS supernet to support subnets with arbitrary ultra-low bitwidth mixed-precision quantization policies without retraining. QFA opens up new possibilities in joint hardware-aware neural architecture search and quantization. We demonstrate the effectiveness of our method on ImageNet and achieve SOTA Top-1 accuracy under a low complexity constraint ($<20$ MFLOPs). The code and models will be made publicly available at https://github.com/bhpfelix/QFA.
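To make the batch-statistics idea concrete, below is a minimal PyTorch sketch of a fake quantizer whose scale and zero point are derived from the activation statistics of the current batch (with BatchNorm-style running estimates at inference time) rather than from fixed per-subnet calibration. The module name, the `momentum` parameter, and the min/max-based range estimate are illustrative assumptions for this sketch, not the paper's exact BatchQuant formulation.

```python
import torch
import torch.nn as nn


class BatchQuantSketch(nn.Module):
    """Illustrative fake quantizer driven by batch statistics.

    NOTE: a hypothetical sketch, not the paper's exact formulation. The
    quantization grid is re-estimated from each batch's activation range,
    with BatchNorm-style running estimates used at inference time.
    """

    def __init__(self, n_bits: int = 4, momentum: float = 0.1):
        super().__init__()
        self.n_levels = 2 ** n_bits - 1
        self.momentum = momentum
        self.register_buffer("running_min", torch.zeros(1))
        self.register_buffer("running_max", torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            x_min, x_max = x.detach().min(), x.detach().max()
            # Smooth the per-batch range so the quantization grid stays
            # stable across randomly sampled subnets of the supernet.
            self.running_min.mul_(1 - self.momentum).add_(self.momentum * x_min)
            self.running_max.mul_(1 - self.momentum).add_(self.momentum * x_max)
        else:
            x_min, x_max = self.running_min, self.running_max
        scale = (x_max - x_min).clamp(min=1e-8) / self.n_levels
        zero_point = torch.round(-x_min / scale)
        # Fake quantization; the straight-through estimator keeps gradients.
        q = torch.clamp(torch.round(x / scale) + zero_point, 0, self.n_levels)
        x_dq = (q - zero_point) * scale
        return x + (x_dq - x).detach()
```

Under this reading, one such quantizer would follow each activation, and a single supernet could serve subnets at arbitrary mixed-precision policies by switching `n_bits` per layer at sampling time, since the quantization range adapts to whichever subnet is active instead of being calibrated per configuration.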