We consider the post-training quantization problem, which discretizes the weights of pre-trained deep neural networks without re-training the model. We propose multipoint quantization, a quantization method that approximates a full-precision weight vector using a linear combination of multiple vectors of low-bit numbers; this is in contrast to typical quantization methods that approximate each weight using a single low-precision number. Computationally, we construct the multipoint quantization with an efficient greedy selection procedure, and adaptively decide the number of low-precision points for each quantized weight vector based on the error of its output. This allows us to achieve higher precision levels for important weights that greatly influence the outputs, yielding an "effect of mixed precision" without physical mixed-precision implementations (which require specialized hardware accelerators). Empirically, our method can be implemented with common operands, bringing almost no memory and computation overhead. We show that our method outperforms a range of state-of-the-art methods on ImageNet classification and that it generalizes to more challenging tasks such as PASCAL VOC object detection.
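To make the construction concrete, below is a minimal sketch of one way such a greedy multipoint approximation could be implemented. It is an illustration only, not the paper's exact procedure: the function names (`quantize_to_grid`, `multipoint_quantize`) and parameters (`num_bits`, `max_points`, `tol`) are assumptions, and the sketch stops based on a weight-space residual norm as a simple proxy, whereas the method described above decides the number of points from the error of the layer's output.

```python
import numpy as np

def quantize_to_grid(v, num_bits=4):
    """Round each entry of v to the nearest point on a uniform signed
    low-bit grid spanning [-max|v|, max|v|]. (Illustrative choice of grid.)"""
    levels = 2 ** (num_bits - 1) - 1
    max_abs = np.max(np.abs(v))
    scale = max_abs / levels if max_abs > 0 else 1.0
    return np.clip(np.round(v / scale), -levels, levels) * scale

def multipoint_quantize(w, num_bits=4, max_points=4, tol=1e-3):
    """Greedy sketch: approximate w by a linear combination of low-bit
    vectors, w ~ sum_i a_i * q_i, adding points until the residual is small."""
    residual = w.astype(np.float64).copy()
    coeffs, points = [], []
    for _ in range(max_points):
        q = quantize_to_grid(residual, num_bits)   # low-bit candidate vector
        denom = np.dot(q, q)
        if denom == 0.0:
            break
        a = np.dot(residual, q) / denom            # least-squares coefficient
        coeffs.append(a)
        points.append(q)
        residual -= a * q                          # update the residual
        if np.linalg.norm(residual) / (np.linalg.norm(w) + 1e-12) < tol:
            break                                  # adaptive number of points
    return coeffs, points

# Usage: approximate one weight vector with a few low-bit points.
w = np.random.randn(64).astype(np.float32)
coeffs, points = multipoint_quantize(w, num_bits=4, max_points=4)
w_hat = sum(a * q for a, q in zip(coeffs, points))
print(len(points), np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```

Because each point is an ordinary low-bit vector scaled by a full-precision coefficient, the approximation can be evaluated with standard fixed-precision kernels, which is consistent with the claim that no specialized mixed-precision hardware is needed.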