The slow inference of deep neural networks (DNNs) on resource-limited devices is one of the most important barriers to their wider and more inclusive adoption. Binary quantization, which enables faster convolutions and substantial memory savings, is among the most promising strategies to alleviate this, but it typically incurs a serious drop in accuracy. This paper therefore proposes a novel binary quantization function based on quantized compressed sensing (QCS). Theoretical arguments suggest that our proposal preserves the practical benefits of standard methods while reducing the quantization error and the resulting drop in accuracy.
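The QCS-based quantization function itself is not specified in this abstract. As a point of reference only, the minimal sketch below illustrates the standard sign-based binarization baseline (in the style of XNOR-Net, with a per-tensor scaling factor) whose quantization error the proposal aims to reduce; the function name and the per-tensor scaling choice are illustrative assumptions, not the paper's method.

```python
import numpy as np

def binarize_sign_scaled(W):
    """Standard baseline (assumed, XNOR-Net style): replace each
    weight by its sign, scaled by the tensor's mean absolute value
    to reduce the quantization error. Not the paper's QCS method."""
    alpha = np.mean(np.abs(W))   # per-tensor scaling factor
    B = np.sign(W)
    B[B == 0] = 1.0              # map exact zeros to +1 by convention
    return alpha * B

# Quantization error of the baseline on random Gaussian weights
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)).astype(np.float32)
W_q = binarize_sign_scaled(W)
print("baseline MSE:", np.mean((W - W_q) ** 2))
```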