We address the problem of network quantization, that is, reducing bit-widths of weights and/or activations to lighten network architectures. Quantization methods use a rounding function to map full-precision values to the nearest quantized ones, but this operation is not differentiable. There are mainly two approaches to training quantized networks with gradient-based optimizers. First, a straight-through estimator (STE) replaces the zero derivative of the rounding with that of an identity function, which causes a gradient mismatch problem. Second, soft quantizers approximate the rounding with continuous functions at training time, and exploit the rounding for quantization at test time. This alleviates the gradient mismatch, but causes a quantizer gap problem. We alleviate both problems in a unified framework. To this end, we introduce a novel quantizer, dubbed a distance-aware quantizer (DAQ), that mainly consists of a distance-aware soft rounding (DASR) and a temperature controller. To alleviate the gradient mismatch problem, DASR approximates the discrete rounding with the kernel soft argmax, which is based on our insight that the quantization can be formulated as a distance-based assignment problem between full-precision values and quantized ones. The controller adjusts the temperature parameter in DASR adaptively according to the input, addressing the quantizer gap problem. Experimental results on standard benchmarks show that DAQ outperforms the state of the art significantly for various bit-widths without bells and whistles.
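To make the kernel-soft-argmax idea concrete, below is a minimal PyTorch sketch of a distance-based soft rounding in the spirit of DASR; it is not the authors' exact formulation. The function name `soft_round` and the parameters `beta` (temperature) and `sigma` (kernel bandwidth) are illustrative assumptions: distances to every quantization level are turned into soft assignment weights via a temperature-scaled softmax, and a Gaussian kernel centered on the nearest level suppresses far-away levels so the soft output stays close to hard rounding.

```python
import torch

def soft_round(x, q_levels, beta=10.0, sigma=0.5):
    """Differentiable stand-in for rounding onto a set of quantization levels.

    Illustrative sketch only; names and defaults are assumptions, not the
    paper's exact DASR definition.
    """
    # Distances between each input and every quantization level.
    dist = (x.unsqueeze(-1) - q_levels).abs()                  # (..., L)

    # Level that plain (hard) rounding would select.
    idx = dist.argmin(dim=-1, keepdim=True)
    nearest = torch.gather(q_levels.expand_as(dist), -1, idx)  # (..., 1)

    # Gaussian kernel centered on that level; it damps far-away levels so the
    # soft assignment stays close to the hard one.
    kernel = torch.exp(-((q_levels - nearest) ** 2) / (2 * sigma ** 2))

    # Temperature-scaled soft assignment over negative distances, reweighted
    # by the kernel and renormalized.
    w = kernel * torch.exp(-beta * dist)
    w = w / w.sum(dim=-1, keepdim=True)

    # Soft rounding: distance-weighted average of the levels; as beta grows,
    # this approaches the hard rounding result (shrinking the quantizer gap).
    return (w * q_levels).sum(dim=-1)

# Example with 2-bit uniform levels {0, 1, 2, 3}: gradients flow through the
# soft weights, while a larger beta pushes outputs toward hard rounding.
x = torch.tensor([0.2, 1.4, 2.7], requires_grad=True)
levels = torch.arange(4.0)
print(soft_round(x, levels, beta=20.0))  # approximately tensor([0., 1., 3.])
```

In this sketch `beta` plays the role of the temperature that the abstract's controller would adjust per input: a small `beta` gives smooth gradients (less gradient mismatch), while a large `beta` makes the soft output nearly identical to the hard rounding used at test time (less quantizer gap).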