Deep neural network (DNN) model compression for efficient on-device inference is becoming increasingly important to reduce memory requirements and keep user data on-device. To this end, we propose a novel differentiable k-means clustering layer (DKM) and its application to train-time weight-clustering-based DNN model compression. DKM casts k-means clustering as an attention problem and enables joint optimization of the DNN parameters and clustering centroids. Unlike prior works that rely on additional regularizers and parameters, DKM-based compression keeps the original loss function and model architecture fixed. We evaluated DKM-based compression on various DNN models for computer vision and natural language processing (NLP) tasks. Our results demonstrate that DKM delivers a superior compression-and-accuracy trade-off on the ImageNet1k and GLUE benchmarks. For example, DKM-based compression can offer 74.5% top-1 ImageNet1k accuracy on the ResNet50 DNN model with a 3.3 MB model size (29.4x model compression factor). For MobileNet-v1, which is a challenging DNN to compress, DKM delivers 62.8% top-1 ImageNet1k accuracy with a 0.74 MB model size (22.4x model compression factor). This result is 6.8% higher top-1 accuracy with a 33% smaller model size than the current state-of-the-art DNN compression algorithms. Additionally, DKM enables compression of the DistilBERT model by 11.8x with minimal (1.1%) accuracy loss on the GLUE NLP benchmarks.
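To make the attention formulation above concrete, the following is a minimal PyTorch sketch of a differentiable soft-clustering step: each weight attends over the centroids through a softmax over negative distances, the centroids are updated as attention-weighted means, and the soft-clustered weights are returned for the forward pass. The function name dkm_cluster, the temperature tau, and the fixed iteration count are illustrative assumptions, not the paper's exact implementation or hyperparameters.

```python
import torch
import torch.nn.functional as F

def dkm_cluster(weights, centroids, tau=1e-3, iters=5):
    """Differentiable k-means sketch (illustrative, not the official DKM code).

    weights:   flattened layer weights, shape (N,)
    centroids: initial cluster centers, shape (k,)
    tau:       softmax temperature (assumed hyperparameter)
    iters:     fixed number of clustering iterations (assumed)
    """
    w = weights.reshape(-1, 1)                        # (N, 1)
    c = centroids.reshape(-1, 1)                      # (k, 1)
    for _ in range(iters):
        dist = torch.cdist(w, c)                      # (N, k) distances |w_i - c_j|
        attn = F.softmax(-dist / tau, dim=1)          # soft assignment (attention) matrix
        # centroid update: attention-weighted mean of the weights per cluster
        c = (attn.t() @ w) / attn.sum(dim=0, keepdim=True).t()
    # soft-clustered weights used in the forward pass
    return (attn @ c).reshape(weights.shape), c.reshape(centroids.shape)
```

Because every step is differentiable, gradients from the task loss flow back to both the original weights and the centroids, which is what permits joint optimization without auxiliary regularizers or extra trainable parameters.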