This paper investigates deep neural network (DNN) compression from the perspective of compactly representing and storing trained parameters. We explore the previously overlooked opportunity of cross-layer architecture-agnostic representation sharing for DNN parameters. To do this, we decouple feedforward parameters from DNN architectures and leverage additive quantization, an extreme lossy compression method invented for image descriptors, to compactly represent the parameters. The representations are then finetuned on task objectives to improve task accuracy. We conduct extensive experiments on MobileNet-v2, VGG-11, ResNet-50, Feature Pyramid Networks, and pruned DNNs trained for classification, detection, and segmentation tasks. The conceptually simple scheme consistently outperforms iterative unstructured pruning. Applied to ResNet-50 with 76.1% top-1 accuracy on the ILSVRC12 classification challenge, it achieves a $7.2\times$ compression ratio with no accuracy loss and a $15.3\times$ compression ratio at 74.79% accuracy. Further analyses suggest that representation sharing can frequently happen across network layers and that learning shared representations for an entire DNN can achieve better accuracy at the same compression ratio than compressing the model as multiple separate parts. We release PyTorch code to facilitate DNN deployment on resource-constrained devices and spur future research on efficient representations and storage of DNN parameters.
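To make the core idea concrete, below is a minimal, hypothetical sketch of additive-quantization-style coding of DNN parameters: weights from all layers are flattened into fixed-size sub-vectors, shared codebooks are learned over them, and each sub-vector is stored as a handful of codeword indices. This is a simplified greedy residual encoder for illustration only; the function names are invented here, full additive quantization typically refines codes with beam search, and the paper additionally fine-tunes the shared representations on the task objective.

```python
import torch

def build_codebooks(vectors, num_codebooks=4, num_codewords=256, iters=10):
    """Greedy residual k-means over parameter sub-vectors.
    A simplified stand-in for additive quantization (illustrative only)."""
    residual = vectors.clone()
    codebooks = []
    for _ in range(num_codebooks):
        # Random init, then a few Lloyd iterations on the current residual.
        init = torch.randint(0, residual.size(0), (num_codewords,))
        centers = residual[init].clone()
        for _ in range(iters):
            assign = torch.cdist(residual, centers).argmin(dim=1)
            for k in range(num_codewords):
                mask = assign == k
                if mask.any():
                    centers[k] = residual[mask].mean(dim=0)
        codebooks.append(centers)
        residual = residual - centers[assign]
    return codebooks

def encode(vectors, codebooks):
    """Return one codeword index per codebook for every sub-vector."""
    codes, residual = [], vectors.clone()
    for centers in codebooks:
        assign = torch.cdist(residual, centers).argmin(dim=1)
        codes.append(assign)
        residual = residual - centers[assign]
    return torch.stack(codes, dim=1)  # shape (N, num_codebooks)

def decode(codes, codebooks):
    """Reconstruct each sub-vector as a sum of its selected codewords."""
    return sum(centers[codes[:, m]] for m, centers in enumerate(codebooks))

# Usage sketch: sub-vectors pooled from all layers share the same codebooks,
# so only the codebooks plus per-vector index tuples need to be stored.
weights = torch.randn(10000, 8)        # e.g. 8-dim sub-vectors from many layers
cbs = build_codebooks(weights)
codes = encode(weights, cbs)
recon = decode(codes, cbs)
print((weights - recon).pow(2).mean())  # reconstruction error before fine-tuning
```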