Deep neural networks (DNNs) often have to be compressed, via pruning and/or quantization, before they can be deployed in practical settings. In this work we propose a new compression-aware minimizer, dubbed CrAM, which modifies the optimization step in a principled way in order to produce models whose local loss behavior is stable under compression operations such as pruning. Thus, dense models trained via CrAM should be compressible post-training, in a single step, without significant accuracy loss. Experimental results on standard benchmarks, such as residual networks for ImageNet classification and BERT models for language modelling, show that CrAM produces dense models that can be more accurate than the standard SGD/Adam-based baselines, yet remain stable under weight pruning: specifically, such models can be pruned in one shot to 70-80% sparsity with reasonable ($\leq 1\%$) accuracy loss, which is competitive with gradual compression methods. Additionally, we show that CrAM produces sparse models which perform well for transfer learning, and that it also works for semi-structured pruning patterns supported by GPU hardware.
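To make the idea of a compression-aware optimization step concrete, below is a minimal PyTorch-style sketch, not the authors' implementation: it perturbs the weights, applies a compression operator (here a simple global magnitude-pruning stand-in), evaluates the gradient at the compressed point, and applies that gradient to the dense weights. The helper names `magnitude_prune` and `cram_like_step`, and the hyperparameters `rho` and `sparsity`, are illustrative assumptions; the exact CrAM formulation is given in the paper itself.

```python
# Sketch of a compression-aware update (not the paper's exact algorithm).
import torch


def magnitude_prune(params, sparsity):
    """Zero out the smallest-magnitude entries (a simple stand-in for the
    compression operator C, here global unstructured magnitude pruning)."""
    flat = torch.cat([p.detach().abs().flatten() for p in params])
    k = int(sparsity * flat.numel())
    threshold = flat.kthvalue(k).values if k > 0 else torch.tensor(0.0)
    return [torch.where(p.abs() > threshold, p, torch.zeros_like(p)) for p in params]


def cram_like_step(model, loss_fn, batch, lr=0.1, rho=0.05, sparsity=0.7):
    inputs, targets = batch
    params = [p for p in model.parameters() if p.requires_grad]

    # 1) Gradient at the current (dense) point.
    loss = loss_fn(model(inputs), targets)
    grads = torch.autograd.grad(loss, params)
    originals = [p.detach().clone() for p in params]

    with torch.no_grad():
        # 2) Perturbation step followed by compression: C(theta + rho * g).
        for p, g in zip(params, grads):
            p.add_(g, alpha=rho)
        for p, pc in zip(params, magnitude_prune(params, sparsity)):
            p.copy_(pc)

    # 3) Gradient evaluated at the compressed point ...
    loss_c = loss_fn(model(inputs), targets)
    grads_c = torch.autograd.grad(loss_c, params)

    with torch.no_grad():
        # 4) ... applied as a descent step on the original dense weights.
        for p, orig, gc in zip(params, originals, grads_c):
            p.copy_(orig - lr * gc)
```

The key point this sketch illustrates is that the descent direction is computed at a compressed copy of the (perturbed) weights, so the dense model is steered toward regions where one-shot pruning changes the loss little.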