We propose a Quantization Guided Training (QGT) method that steers DNN training toward optimized low-bit-precision targets and reaches extreme compression levels below 8-bit precision. Unlike standard quantization-aware training (QAT) approaches, QGT uses customized regularization to encourage weight values toward a distribution that maximizes accuracy while reducing quantization error. One of the main benefits of this approach is the ability to identify compression bottlenecks. We validate QGT using state-of-the-art model architectures on vision datasets. We also demonstrate the effectiveness of QGT with a tiny 81 KB person-detection model quantized down to 2-bit precision (a 17.7x size reduction), with an accuracy drop of only 3% relative to a floating-point baseline.
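As an illustration of the regularization idea described above, the sketch below adds a penalty on the distance between full-precision weights and their nearest points on a uniform symmetric k-bit grid; the function name, the per-tensor scaling, and the squared-error penalty are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch (assumptions, not the paper's exact QGT loss):
# penalize the squared distance between each weight and its nearest value
# on a uniform symmetric k-bit grid, nudging weights toward
# quantization-friendly distributions during training.
import torch


def quantization_penalty(weight: torch.Tensor, bits: int = 2) -> torch.Tensor:
    """Mean squared distance from each weight to its nearest k-bit grid point."""
    levels = 2 ** bits - 1                  # number of quantization steps
    max_abs = weight.abs().max().detach()   # per-tensor scale (an assumption here)
    scale = 2 * max_abs / levels
    quantized = torch.round(weight / scale) * scale
    return torch.mean((weight - quantized) ** 2)


# Usage sketch: add the penalty for every weight tensor to the task loss.
# `model`, `task_loss`, and `lambda_q` are placeholders for the user's setup.
# loss = task_loss + lambda_q * sum(quantization_penalty(p, bits=2)
#                                   for p in model.parameters() if p.dim() > 1)
```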