As neural networks have become more powerful, there has been a rising desire to deploy them in the real world; however, the power and accuracy of neural networks are largely due to their depth and complexity, making them difficult to deploy, especially on resource-constrained devices. Neural network quantization has recently arisen to meet this demand, reducing the size and complexity of neural networks by reducing the precision of a network's parameters and activations. With smaller and simpler networks, it becomes possible to run neural networks within the constraints of their target hardware. This paper surveys the many neural network quantization techniques that have been developed in the last decade. Based on this survey and comparison of neural network quantization techniques, we propose future directions of research in the area.
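As a concrete illustration of the reduced-precision idea described above (a minimal sketch for intuition, not the method of any particular paper surveyed), the following NumPy snippet applies symmetric uniform quantization to a floating-point weight tensor, storing it as int8 values plus a single scale factor:

```python
import numpy as np

def quantize_uniform(weights: np.ndarray, num_bits: int = 8):
    """Symmetric uniform quantization of a float tensor to signed integers."""
    qmax = 2 ** (num_bits - 1) - 1               # e.g. 127 for 8-bit signed
    scale = np.max(np.abs(weights)) / qmax        # map the largest magnitude to qmax
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from the quantized values."""
    return q.astype(np.float32) * scale

# Hypothetical example: an fp32 weight matrix shrinks to int8 plus one scale.
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_uniform(w)
print(np.abs(w - dequantize(q, s)).max())         # small reconstruction error
```

This 4x reduction in storage (32-bit floats to 8-bit integers) comes at the cost of a bounded rounding error, which is the central trade-off the surveyed techniques aim to manage.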