Conventional model quantization methods apply one fixed quantization scheme to all data samples, ignoring the inherent differences in "recognition difficulty" between samples. We propose applying different quantization schemes to different data samples, achieving data-dependent dynamic inference at a fine-grained, layer-wise level. However, enabling adaptive inference with changeable layer-wise quantization schemes is challenging: the number of combinations of bit-widths and layers grows exponentially, which makes it extremely difficult to train a single model over such a vast search space and to use it in practice. To solve this problem, we present the Arbitrary Bit-width Network (ABN), in which the bit-widths of a single deep network can change at runtime for different data samples, at layer-wise granularity. Specifically, we first build a weight-shared, layer-wise quantizable "super-network" in which each layer can be assigned multiple bit-widths and thus quantized differently on demand. The super-network provides a very large number of combinations of bit-widths and layers, each of which can be used during inference without retraining or storing myriad models. Second, based on the well-trained super-network, each layer's runtime bit-width selection is modeled as a Markov Decision Process (MDP) and solved with an adaptive inference strategy. Experiments show that the super-network can be built without accuracy degradation, and that the bit-width allocation of each layer can be adjusted on the fly to handle various inputs. On ImageNet classification, we achieve a 1.1% top-1 accuracy improvement while saving 36.2% of BitOps.
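To make the scale of the search space and the BitOps metric concrete, the following is a minimal illustrative sketch, not the authors' implementation. The candidate bit-width set `CANDIDATE_BITS`, the uniform symmetric quantizer, and the helper names are all assumptions introduced here. The BitOps estimate uses one common convention (multiply-accumulates times weight bits times activation bits), which the abstract itself does not define.

```python
# Hypothetical candidate bit-widths; the abstract does not list the actual set.
CANDIDATE_BITS = (2, 4, 8)


def quantize(values, bits):
    """Uniform symmetric quantization of a list of floats to `bits` bits.

    This is a generic quantizer used for illustration only; the paper's
    quantizer may differ.
    """
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    levels = 2 ** (bits - 1) - 1      # e.g. 127 positive levels for 8 bits
    scale = max_abs / levels          # step size between adjacent levels
    return [round(v / scale) * scale for v in values]


def num_configurations(num_layers, candidates=CANDIDATE_BITS):
    """Number of distinct layer-wise bit-width assignments: K ** L."""
    return len(candidates) ** num_layers


def bitops(macs_per_layer, weight_bits, act_bits):
    """Assumed BitOps estimate: sum over layers of MACs * w_bits * a_bits."""
    return sum(m * w * a for m, w, a in zip(macs_per_layer, weight_bits, act_bits))


def max_error(values, bits):
    """Largest absolute quantization error at the given bit-width."""
    return max(abs(v - q) for v, q in zip(values, quantize(values, bits)))


if __name__ == "__main__":
    # Exponential growth of the search space with depth.
    print(num_configurations(3))     # 27
    print(num_configurations(50))    # 3 ** 50, far too many to train separately

    # Lower bit-widths cost less but quantize more coarsely.
    weights = [1.0, -0.5, 0.25]
    print(max_error(weights, 2), max_error(weights, 8))

    # Toy two-layer network: mixed precision vs. uniform 8-bit.
    macs = [100, 200]
    full = bitops(macs, [8, 8], [8, 8])
    mixed = bitops(macs, [8, 4], [8, 4])
    print(1 - mixed / full)          # fraction of BitOps saved
```

With three candidate bit-widths, even a 50-layer network admits 3^50 distinct configurations. This is why a single weight-shared super-network, rather than one model per configuration, is needed before any per-sample selection policy can be applied.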