We examine the question of whether SGD-based optimization of deep neural networks (DNNs) can be adapted to produce models that are both highly accurate and easily compressible. We propose a new compression-aware minimizer dubbed CrAM, which modifies the SGD training iteration in a principled way to produce models whose local loss behavior is stable under compression operations such as weight pruning or quantization. Experimental results on standard image classification tasks show that CrAM produces dense models that can be more accurate than standard SGD-type baselines, but which are surprisingly stable under weight pruning: for instance, for ResNet50 on ImageNet, CrAM-trained models can lose up to 70% of their weights in one shot with only minor accuracy loss.
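To make the idea of a compression-aware training iteration concrete, below is a minimal, illustrative PyTorch sketch of one such step: take a short ascent step on the loss, compress the perturbed weights, evaluate the gradient at the compressed point, and apply that gradient to the original dense weights. This is only a sketch in the spirit of the description above, not the exact CrAM update; the compression operator (magnitude pruning) and all function names (`cram_like_step`, `prune_topk`) and hyperparameters (`rho`, `sparsity`) are assumptions for illustration.

```python
import torch


def prune_topk(tensors, sparsity=0.7):
    """Hypothetical compression operator: zero out the smallest-magnitude
    fraction of each tensor's entries (unstructured magnitude pruning)."""
    pruned = []
    for t in tensors:
        k = int(t.numel() * (1.0 - sparsity))  # number of weights to keep
        if k == 0:
            pruned.append(torch.zeros_like(t))
            continue
        # threshold = k-th largest magnitude = (numel - k + 1)-th smallest
        thresh = t.abs().flatten().kthvalue(t.numel() - k + 1).values
        pruned.append(torch.where(t.abs() >= thresh, t, torch.zeros_like(t)))
    return pruned


def cram_like_step(model, loss_fn, x, y, lr=0.1, rho=0.05, sparsity=0.7):
    """One illustrative compression-aware SGD step (a sketch, not the
    published CrAM update)."""
    params = [p for p in model.parameters() if p.requires_grad]
    originals = [p.detach().clone() for p in params]

    # 1) gradient at the current (dense) point
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, params)

    # 2) perturb in the ascent direction, then compress the perturbed weights
    with torch.no_grad():
        perturbed = [w + rho * g for w, g in zip(originals, grads)]
        for p, c in zip(params, prune_topk(perturbed, sparsity)):
            p.copy_(c)

    # 3) gradient evaluated at the compressed, perturbed point
    loss_c = loss_fn(model(x), y)
    grads_c = torch.autograd.grad(loss_c, params)

    # 4) apply that gradient to the *original* dense weights
    with torch.no_grad():
        for p, w, g in zip(params, originals, grads_c):
            p.copy_(w - lr * g)
```

The intent of such a step is that the dense weights are steered toward regions where applying the compression operator changes the loss only slightly, which is consistent with the one-shot pruning stability reported above.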