We present Automatic Bit Sharing (ABS) to automatically search for optimal model compression configurations (e.g., pruning ratio and bitwidth). Unlike previous works that consider model pruning and quantization separately, we seek to optimize them jointly. To deal with the resulting large design space, we propose a novel super-bit model, a single-path method that encodes all candidate compression configurations, rather than maintaining separate paths for each configuration. Specifically, we first propose a novel decomposition of quantization that encapsulates all candidate bitwidths in the search space. Starting from a low bitwidth, we sequentially consider higher bitwidths by recursively adding re-assignment offsets. We then introduce learnable binary gates to encode the choice of bitwidth, including a filter-wise 0-bit for pruning. By jointly training the binary gates in conjunction with the network parameters, the compression configuration of each layer can be determined automatically. Our ABS brings two benefits for model compression: 1) it avoids the combinatorially large design space, with a reduced number of trainable parameters and search cost; 2) it also averts directly fitting an extremely low-bit quantizer to the data, hence greatly reducing the optimization difficulty caused by non-differentiable quantization. Experiments on CIFAR-100 and ImageNet show that our method achieves significant computational cost reduction while preserving promising performance.
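To make the decomposition concrete, the following is a minimal PyTorch-style sketch of the single-path super-bit idea: a low-bit quantization is refined toward higher bitwidths by gated re-assignment offsets, and a filter-wise 0-bit gate performs pruning. The uniform quantizer, the gate tensors, and the candidate bitwidth list here are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def quantize(w, bits):
    # Simplified uniform quantizer on [-1, 1]; a stand-in for the
    # paper's quantization function (assumption).
    scale = 2 ** (bits - 1) - 1
    return torch.round(w.clamp(-1, 1) * scale) / scale

def super_bit_forward(w, gates, base_bits=2, candidate_bits=(4, 8)):
    """Single-path super-bit sketch (hypothetical API).

    gates['prune'] -- filter-wise binary gate; 0 prunes the filter (0-bit)
    gates[b]       -- binary gate enabling the offset toward bitwidth b
    """
    q = quantize(w, base_bits)          # start from the lowest bitwidth
    prev_bits, active = base_bits, torch.ones(())
    for b in candidate_bits:
        # Nested gating: a higher bitwidth is reachable only if all
        # lower-bit gates are open, matching the recursive decomposition.
        active = active * gates[b]
        offset = quantize(w, b) - quantize(w, prev_bits)  # re-assignment offset
        q = q + active * offset
        prev_bits = b
    # Filter-wise 0-bit gate broadcasts over output channels of a conv weight.
    return gates['prune'].view(-1, 1, 1, 1) * q

# Example: gates fixed so the result is effectively a 4-bit weight tensor.
w = torch.randn(16, 3, 3, 3) * 0.5
gates = {'prune': torch.ones(16), 4: torch.tensor(1.0), 8: torch.tensor(0.0)}
q = super_bit_forward(w, gates)
```

The nested gating reflects the abstract's description: higher bitwidths are obtained only by accumulating offsets on top of lower ones, so a single weight path encodes every candidate configuration, including pruning.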