While neural networks have been remarkably successful in a wide array of applications, implementing them in resource-constrained hardware remains an area of intense research. Replacing the weights of a neural network with quantized (e.g., 4-bit or binary) counterparts yields massive savings in computation cost, memory, and power consumption. We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism, and rigorously analyze its error. We prove that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights, i.e., the level of over-parametrization. Our result holds across a range of input distributions and for both fully-connected and convolutional architectures. To empirically evaluate the method, we quantize several common architectures with few bits per weight, and test them on ImageNet, showing only minor loss of accuracy. We also demonstrate that standard modifications, such as bias correction and mixed-precision quantization, further improve accuracy.
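To make the greedy path-following mechanism concrete, below is a minimal NumPy sketch of how such a quantizer might proceed for a single neuron: weights are quantized one at a time, each step choosing the alphabet element that best cancels the running error between the analog and quantized pre-activations on the sample data. This is a sketch in the spirit of GPFQ, not the paper's implementation; the function name, the ternary alphabet, and the synthetic Gaussian inputs are illustrative assumptions.

```python
import numpy as np

def greedy_quantize_neuron(w, X, alphabet):
    """Greedily quantize one neuron's weights w (shape (N,)) using input
    samples X (shape (m, N)): at step t, pick the alphabet element whose
    contribution q[t] * X[:, t] best cancels the accumulated error
    u = X[:, :t] @ (w[:t] - q[:t])."""
    m, N = X.shape
    q = np.zeros(N)
    u = np.zeros(m)  # running error between analog and quantized pre-activations
    for t in range(N):
        x_t = X[:, t]
        norm_sq = x_t @ x_t
        if norm_sq == 0:
            # Degenerate column: any choice leaves u unchanged; pick the value nearest 0.
            q[t] = min(alphabet, key=abs)
            continue
        # Unconstrained minimizer of ||u + w[t]*x_t - p*x_t||^2 over p, in closed form;
        # the best alphabet element is the one nearest to it.
        target = (x_t @ (u + w[t] * x_t)) / norm_sq
        q[t] = min(alphabet, key=lambda a: abs(a - target))
        u = u + w[t] * x_t - q[t] * x_t
    return q

# Illustrative usage with random Gaussian data and a symmetric ternary alphabet.
rng = np.random.default_rng(0)
m, N = 256, 512
X = rng.standard_normal((m, N))
w = rng.standard_normal(N) / np.sqrt(N)
alphabet = np.array([-1.0, 0.0, 1.0]) * np.max(np.abs(w))
q = greedy_quantize_neuron(w, X, alphabet)
rel_err = np.linalg.norm(X @ (w - q))**2 / np.linalg.norm(X @ w)**2
print(f"relative square error: {rel_err:.4f}")
```

Under this kind of over-parametrized setup (N large relative to m), the measured relative square error is the quantity whose roughly linear decay in N the abstract refers to.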