Neural network models are resource hungry. It is difficult to deploy such deep networks on devices with limited resources, such as smart wearables, cellphones, drones, and autonomous vehicles. Low-bit quantization, such as binary and ternary quantization, is a common approach to alleviating these resource requirements. Ternary quantization provides a more flexible model and outperforms binary quantization in terms of accuracy, but it doubles the memory footprint and increases the computational cost. In contrast to these approaches, mixed quantized models allow a trade-off between accuracy and memory footprint. In such models, the quantization depth is often chosen manually or tuned with a separate optimization routine; the latter requires training a quantized network multiple times. Here, we propose an adaptive combination of binary and ternary quantization, namely Smart Quantization (SQ), in which the quantization depth is modified directly via a regularization function, so that the model is trained only once. Our experimental results show that the proposed method successfully adapts the quantization depth while keeping model accuracy high on the MNIST and CIFAR10 benchmarks.
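To make the distinction between the two quantization depths concrete, the following is a minimal sketch of standard binary and ternary weight quantization. The scaling by the mean absolute weight and the threshold heuristic `delta_factor=0.7` are common choices from the quantization literature, not necessarily the exact scheme used by SQ.

```python
import numpy as np

def binarize(w):
    # Binary quantization: every weight collapses to +alpha or -alpha,
    # where alpha is the mean absolute value (a common scaling choice).
    alpha = np.mean(np.abs(w))
    return alpha * np.sign(w)

def ternarize(w, delta_factor=0.7):
    # Ternary quantization: weights close to zero become 0, the rest +/-alpha.
    # delta_factor * mean|w| is a commonly used threshold heuristic.
    delta = delta_factor * np.mean(np.abs(w))
    mask = np.abs(w) > delta
    alpha = np.mean(np.abs(w[mask])) if mask.any() else 0.0
    return alpha * np.sign(w) * mask

w = np.array([0.8, -0.05, 0.3, -0.9, 0.02])
print(binarize(w))   # two levels: +/-alpha
print(ternarize(w))  # three levels: -alpha, 0, +alpha
```

The extra zero level is what gives ternary models their flexibility, but representing three states instead of two is also why they need twice the storage per weight (2 bits vs. 1 bit).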