Deep neural network architectures have achieved remarkable improvements in scene understanding tasks. However, deploying them on resource-limited devices requires efficient models. Recently, several compression methods have been proposed to reduce the heavy computational burden and memory consumption. Among them, pruning and quantization methods suffer a notable drop in performance because they compress the model parameters, whereas knowledge distillation methods improve the performance of compact models by training lightweight networks under the supervision of cumbersome networks. In the proposed method, knowledge distillation is performed within the network by constructing multiple branches over the primary stream of the model, a strategy known as self-distillation. Accordingly, an ensemble of sub-networks is formed whose members transfer knowledge among themselves through knowledge distillation policies as well as an adversarial learning strategy: the ensemble of sub-models is trained adversarially against a discriminator model, and their knowledge is exchanged within the ensemble through four different loss functions. The proposed method is applied to both lightweight image classification and encoder-decoder architectures to boost the performance of small and compact models without incurring extra computational overhead at inference. Extensive experiments on the main challenging datasets show that the proposed network outperforms the primary model in accuracy at the same number of parameters and computational cost, and achieves significant improvements over earlier self-distillation methods. The effectiveness of the approach is also demonstrated on the encoder-decoder model.
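To make the overall idea concrete, the following is a minimal PyTorch sketch (not the authors' code) of self-distillation with an adversarial term: an auxiliary branch attached to a shared backbone is trained with cross-entropy on labels, KL divergence toward the deepest classifier's softened outputs, and a loss from a small discriminator that tries to tell branch features apart from the deepest features. All module names, the single-branch layout, and the hyper-parameters (temperature, loss weights) are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BranchyNet(nn.Module):
    """Backbone with one auxiliary (shallow) branch and a final (deep) classifier."""
    def __init__(self, num_classes=10, feat_dim=64):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                  nn.AdaptiveAvgPool2d(8))
        self.block_deep = nn.Sequential(nn.Conv2d(32, feat_dim, 3, padding=1), nn.ReLU(),
                                        nn.AdaptiveAvgPool2d(1))
        self.branch_shallow = nn.Sequential(nn.Conv2d(32, feat_dim, 1), nn.ReLU(),
                                            nn.AdaptiveAvgPool2d(1))
        self.head_deep = nn.Linear(feat_dim, num_classes)
        self.head_shallow = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        h = self.stem(x)
        f_deep = self.block_deep(h).flatten(1)         # deepest-exit ("teacher") features
        f_shallow = self.branch_shallow(h).flatten(1)  # branch ("student") features
        return (self.head_deep(f_deep), f_deep), (self.head_shallow(f_shallow), f_shallow)

# Discriminator tries to distinguish deep features (label 1) from branch features (label 0).
disc = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

model = BranchyNet()
opt_g = torch.optim.SGD(model.parameters(), lr=0.1)
opt_d = torch.optim.SGD(disc.parameters(), lr=0.1)
T = 4.0  # distillation temperature (assumed value)

x = torch.randn(8, 3, 32, 32)          # dummy batch
y = torch.randint(0, 10, (8,))

(logits_deep, f_deep), (logits_shallow, f_shallow) = model(x)

# 1) Update the discriminator on detached features.
d_loss = (F.binary_cross_entropy_with_logits(disc(f_deep.detach()), torch.ones(8, 1)) +
          F.binary_cross_entropy_with_logits(disc(f_shallow.detach()), torch.zeros(8, 1)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# 2) Update the network: cross-entropy on both exits, soft-label distillation from the
#    deep exit to the branch, and an adversarial term pushing branch features toward
#    the deep-feature distribution.
ce = F.cross_entropy(logits_deep, y) + F.cross_entropy(logits_shallow, y)
kd = F.kl_div(F.log_softmax(logits_shallow / T, dim=1),
              F.softmax(logits_deep.detach() / T, dim=1),
              reduction="batchmean") * (T * T)
adv = F.binary_cross_entropy_with_logits(disc(f_shallow), torch.ones(8, 1))
loss = ce + kd + 0.1 * adv             # 0.1 is an assumed adversarial weight
opt_g.zero_grad(); loss.backward(); opt_g.step()
```

In a full training loop these two updates alternate every batch; at inference only the primary stream (or its deepest exit) is kept, so the branches and the discriminator add no test-time cost.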