Post-training Quantization for Neural Networks with Provable Guarantees

While neural networks have been remarkably successful in a wide array of applications, implementing them in resource-constrained hardware remains an area of intense research. Replacing the weights of a neural network with quantized (e.g., 4-bit or binary) counterparts yields massive savings in computation cost, memory, and power consumption. To that end, we generalize GPFQ, a post-training neural-network quantization method based on a greedy path-following mechanism. Among other things, we propose modifications to promote sparsity of the weights, and rigorously analyze the associated error. Additionally, our error analysis extends previous results on GPFQ to general quantization alphabets, showing that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights, i.e., the level of over-parametrization. Our result holds across a range of input distributions and for both fully-connected and convolutional architectures, thereby also extending previous results. To empirically evaluate the method, we quantize several common architectures with few bits per weight and test them on ImageNet, showing only a minor loss of accuracy compared to unquantized models. We also demonstrate that standard modifications, such as bias correction and mixed-precision quantization, further improve accuracy.
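
The abstract does not spell out the greedy path-following update, so the following is only a minimal sketch under stated assumptions. It assumes that each weight of a single neuron is quantized in turn by picking the alphabet element that keeps the running output error on the data as small as possible. The function names, the ternary alphabet, and the Gaussian inputs are illustrative choices, not taken from the paper.

```python
import numpy as np


def nearest_in_alphabet(value, alphabet):
    """Return the alphabet element closest to `value`."""
    return alphabet[np.argmin(np.abs(alphabet - value))]


def greedy_quantize_neuron(X, w, alphabet):
    """Sketch of a greedy path-following quantizer for one neuron.

    X        : (m, n) data matrix; column t holds feature t over m samples.
    w        : (n,) real-valued weights of the neuron.
    alphabet : 1-D array of allowed quantized values (an assumption here).
    """
    m, n = X.shape
    u = np.zeros(m)   # running error X[:, :t] @ (w[:t] - q[:t])
    q = np.zeros(n)
    for t in range(n):
        x_t = X[:, t]
        norm_sq = x_t @ x_t
        if norm_sq == 0.0:
            # A zero column does not affect the output; round directly.
            q[t] = nearest_in_alphabet(w[t], alphabet)
            continue
        # Minimizing ||u + (w_t - p) x_t|| over p in the alphabet amounts to
        # rounding the scalar below to the nearest alphabet element.
        target = w[t] + (x_t @ u) / norm_sq
        q[t] = nearest_in_alphabet(target, alphabet)
        u = u + (w[t] - q[t]) * x_t
    return q


def relative_square_error(X, w, q):
    """||Xw - Xq||^2 / ||Xw||^2, the error measure named in the abstract."""
    return np.sum((X @ w - X @ q) ** 2) / np.sum((X @ w) ** 2)


# Illustrative run: ternary alphabet, Gaussian data, uniform weights.
rng = np.random.default_rng(0)
alphabet = np.array([-1.0, 0.0, 1.0])
X = rng.standard_normal((64, 512))
w = rng.uniform(-1.0, 1.0, size=512)
q = greedy_quantize_neuron(X, w, alphabet)
err = relative_square_error(X, w, q)
```

Repeating the run with a larger number of weights `n` and comparing `err` is one way to probe the decay in the number of weights that the abstract describes, though this toy setup makes no claim to reproduce the paper's experiments.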
