Deep neural network (DNN) model compression for efficient on-device inference is becoming increasingly important to reduce memory requirements and keep user data on-device. To this end, we propose a novel differentiable k-means clustering layer (DKM) and its application to train-time weight-clustering-based DNN model compression. DKM casts k-means clustering as an attention problem and enables joint optimization of the DNN parameters and clustering centroids. Unlike prior works that rely on additional regularizers and parameters, DKM-based compression keeps the original loss function and model architecture fixed. We evaluated DKM-based compression on various DNN models for computer vision and natural language processing (NLP) tasks. Our results demonstrate that DKM delivers a superior compression and accuracy trade-off on the ImageNet1k and GLUE benchmarks. For example, DKM-based compression can offer 74.5% top-1 ImageNet1k accuracy on the ResNet50 model with a 3.3 MB model size (29.4x model compression factor). For MobileNet-v1, which is a challenging DNN to compress, DKM delivers 63.9% top-1 ImageNet1k accuracy with a 0.72 MB model size (22.4x model compression factor). This result is 6.8% higher top-1 accuracy and a 33% smaller model size than the current state-of-the-art DNN compression algorithms. Additionally, DKM enables compression of the DistilBERT model by 11.8x with minimal (1.1%) accuracy loss on the GLUE NLP benchmarks.
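The core idea of casting k-means as attention can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: it assumes Euclidean (absolute-difference) distance between scalar weights and centroids, a temperature `tau` controlling assignment softness, and a single DKM iteration that produces soft assignments, updated centroids, and soft-clustered weights for the forward pass. All function and variable names here are hypothetical.

```python
import numpy as np

def dkm_iteration(w, c, tau=0.05):
    """One differentiable k-means (attention) step, as a hedged sketch.

    w : (N, 1) array of flattened layer weights.
    c : (K, 1) array of clustering centroids.
    The attention matrix A = softmax(-distance / tau) softly assigns each
    weight to every centroid, so every operation is differentiable and
    gradients can flow to both the weights and the centroids.
    """
    d = np.abs(w - c.T)                            # (N, K) pairwise distances
    logits = -d / tau
    logits -= logits.max(axis=1, keepdims=True)    # for numerical stability
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)              # (N, K) soft assignments
    c_new = (a.T @ w) / a.sum(axis=0).reshape(-1, 1)  # attention-weighted centroid update
    w_soft = a @ c_new                             # soft-clustered weights for the forward pass
    return w_soft, c_new

# Toy usage: weights drawn around two modes; centroids migrate toward them.
rng = np.random.default_rng(0)
w = np.concatenate([rng.normal(-1.0, 0.01, 50),
                    rng.normal(1.0, 0.01, 50)]).reshape(-1, 1)
c = np.array([[-0.5], [0.5]])
for _ in range(10):
    w_soft, c = dkm_iteration(w, c)
```

As `tau` shrinks, the soft assignment approaches hard k-means; keeping it finite during training is what lets the task loss shape both the weights and the centroids jointly, with no extra regularizer.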