As one of the successful Transformer-based models in computer vision tasks, SegFormer demonstrates superior performance in semantic segmentation. Nevertheless, its high computational cost greatly challenges the deployment of SegFormer on edge devices. In this paper, we seek to design a lightweight SegFormer for efficient semantic segmentation. Based on the observation that neurons in SegFormer layers exhibit large variances across different images, we propose a dynamic gated linear layer, which prunes the most uninformative set of neurons based on the input instance. To improve the dynamically pruned SegFormer, we also introduce two-stage knowledge distillation to transfer the knowledge within the original teacher to the pruned student network. Experimental results show that our method can significantly reduce the computational overhead of SegFormer without an apparent performance drop. For instance, we can achieve 36.9% mIoU with only 3.3G FLOPs on ADE20K, saving more than 60% of the computation with a drop of only 0.5% in mIoU.
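To make the idea of instance-dependent pruning concrete, the following is a minimal PyTorch sketch of a gated linear layer in the spirit of the abstract: a lightweight gate scores each output neuron per input image, and only the top-scoring fraction survives. The class name `DynamicGatedLinear`, the `keep_ratio` parameter, and the hard top-k gate are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class DynamicGatedLinear(nn.Module):
    """Sketch of an input-dependent gated linear layer (assumed design).

    A small gate predictor scores each output neuron per input instance,
    and the lowest-scoring neurons are masked out, so different neurons
    are pruned for different images.
    """

    def __init__(self, in_features: int, out_features: int, keep_ratio: float = 0.4):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Lightweight gate: pooled token features -> one score per output neuron.
        self.gate = nn.Linear(in_features, out_features)
        self.keep_ratio = keep_ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, in_features), as in a Transformer MLP block.
        scores = self.gate(x.mean(dim=1))                 # (batch, out_features)
        k = max(1, int(self.keep_ratio * scores.size(-1)))
        topk = scores.topk(k, dim=-1).indices             # per-image neuron choice
        mask = torch.zeros_like(scores).scatter_(-1, topk, 1.0)  # hard 0/1 mask
        # Zero out the least informative neurons for this instance.
        return self.linear(x) * mask.unsqueeze(1)


# Usage: different images keep different subsets of neurons.
layer = DynamicGatedLinear(64, 256, keep_ratio=0.4)
out = layer(torch.randn(2, 196, 64))  # (2, 196, 256), 60% of neurons zeroed
```

Note that this sketch only masks outputs to illustrate the per-instance pruning decision; realizing the FLOPs savings reported above would require actually skipping the computation of the pruned neurons at inference, and training the gate end-to-end would require a differentiable relaxation of the hard top-k selection.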