Network pruning and quantization have proven to be effective techniques for deep model compression. To obtain a highly compact model, most methods first perform network pruning and then conduct network quantization on the pruned model. However, this strategy ignores the fact that pruning and quantization affect each other, so performing them separately may lead to sub-optimal performance. To address this, it is essential to perform pruning and quantization jointly. Nevertheless, how to make a trade-off between pruning and quantization is non-trivial. Moreover, existing compression methods often rely on pre-defined compression configurations. Some attempts have been made to search for optimal configurations, but they may incur prohibitive optimization cost. To address the above issues, we devise a simple yet effective method named Single-path Bit Sharing (SBS). Specifically, we first consider network pruning as a special case of quantization, which provides a unified view of pruning and quantization. We then introduce a single-path model to encode all candidate compression configurations. In this way, the configuration search problem is transformed into a subset selection problem, which significantly reduces the number of parameters, the computational cost, and the optimization difficulty. Relying on the single-path model, we further introduce learnable binary gates to encode the choice of bitwidth. By training the binary gates jointly with the network parameters, the compression configuration of each layer can be determined automatically. Extensive experiments on both CIFAR-100 and ImageNet show that SBS significantly reduces computational cost while achieving promising performance. For example, our SBS-compressed MobileNetV2 achieves a 22.6x Bit-Operation (BOP) reduction with only a 0.1% drop in Top-1 accuracy.
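To make the mechanism concrete, below is a minimal PyTorch-style sketch of the kind of layer described above: weights are quantized as a shared lowest-bit term plus residuals for higher bitwidths, and a learnable gate on each term decides whether it is kept, so that closing all gates corresponds to pruning (0-bit quantization). The class and parameter names (BitSharingConv2d, candidate_bits, gate_logits) and the soft sigmoid relaxation of the binary gates are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: single-path bit sharing with gated bit residuals (assumed names).
import torch
import torch.nn as nn
import torch.nn.functional as F


def uniform_quantize(w, bits):
    """Symmetric uniform quantization of w to the given bitwidth, with a
    straight-through estimator so gradients still flow to the full-precision w."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    wq = torch.round(w / scale).clamp(-qmax, qmax) * scale
    return w + (wq - w).detach()


class BitSharingConv2d(nn.Module):
    """Convolution whose effective weight bitwidth is selected by learnable gates."""

    def __init__(self, in_ch, out_ch, k, candidate_bits=(2, 4, 8)):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False)
        self.candidate_bits = sorted(candidate_bits)
        # One gate logit per candidate bitwidth; here a sigmoid relaxation
        # stands in for the binary gates described in the abstract.
        self.gate_logits = nn.Parameter(torch.zeros(len(self.candidate_bits)))

    def quantized_weight(self):
        w = self.conv.weight
        # Shared lowest-bit approximation; gating it toward zero amounts to pruning.
        wq = torch.sigmoid(self.gate_logits[0]) * uniform_quantize(w, self.candidate_bits[0])
        # Each higher bitwidth only adds a gated residual on top of the shared term,
        # so all candidate configurations live in a single path.
        for i in range(1, len(self.candidate_bits)):
            residual = uniform_quantize(w, self.candidate_bits[i]) - uniform_quantize(
                w, self.candidate_bits[i - 1]
            )
            wq = wq + torch.sigmoid(self.gate_logits[i]) * residual
        return wq

    def forward(self, x):
        return F.conv2d(x, self.quantized_weight(), padding=self.conv.padding)


# Gates and weights are trained jointly; a resource penalty on the gate values
# would push individual layers toward lower bitwidths or pruning.
layer = BitSharingConv2d(16, 32, 3)
out = layer(torch.randn(1, 16, 8, 8))
print(out.shape)  # torch.Size([1, 32, 8, 8])
```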