The balance between high accuracy and high speed has always been a challenging problem in semantic image segmentation. Compact segmentation networks are widely used when resources are limited, but their performance is constrained. In this paper, motivated by residual learning and global aggregation, we propose a simple yet general and effective knowledge distillation framework called double similarity distillation (DSD) to improve the classification accuracy of any existing compact network by capturing similarity knowledge in the pixel and category dimensions, respectively. Specifically, we propose a pixel-wise similarity distillation (PSD) module that utilizes residual attention maps to capture more detailed spatial dependencies across multiple layers. Compared with existing methods, the PSD module greatly reduces the amount of computation and is easy to extend. Furthermore, considering the differences in characteristics between the semantic segmentation task and other computer vision tasks, we propose a category-wise similarity distillation (CSD) module, which helps the compact segmentation network strengthen global category correlation by constructing a correlation matrix. Combining these two modules, the DSD framework introduces no extra parameters and only a minimal increase in FLOPs. Extensive experiments on four challenging datasets, including Cityscapes, CamVid, ADE20K, and Pascal VOC 2012, show that DSD outperforms current state-of-the-art methods, proving its effectiveness and generality. The code and models will be publicly available.
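To make the two distillation terms concrete, the following is a minimal NumPy sketch of the general idea, not the authors' implementation: the pixel-wise term matches spatial attention maps (channel-wise sums of squared activations) between teacher and student, and the category-wise term matches C x C correlation matrices built from the category response maps. All function names, the normalization choices, and the equal weighting of the two terms are illustrative assumptions.

```python
import numpy as np

def category_similarity(logits):
    # logits: (C, N) — C categories, N = H*W spatial positions.
    # L2-normalize each category's response vector, then take pairwise
    # inner products to form a C x C category correlation matrix.
    norm = logits / (np.linalg.norm(logits, axis=1, keepdims=True) + 1e-8)
    return norm @ norm.T

def pixel_attention(feat):
    # feat: (C, H, W). Channel-wise sum of squared activations gives a
    # spatial attention map, flattened and L2-normalized.
    att = (feat ** 2).sum(axis=0).ravel()
    return att / (np.linalg.norm(att) + 1e-8)

def dsd_loss(t_feat, s_feat, t_logits, s_logits):
    # Pixel-wise term: mean squared error between attention maps.
    psd = np.mean((pixel_attention(t_feat) - pixel_attention(s_feat)) ** 2)
    # Category-wise term: mean squared error between correlation matrices.
    csd = np.mean((category_similarity(t_logits)
                   - category_similarity(s_logits)) ** 2)
    return psd + csd  # equal weighting is an assumption for illustration
```

Because both terms are computed from existing feature maps and logits, the sketch reflects the stated property that the framework adds no parameters to the student network: only a few extra matrix products are needed during training, and nothing at inference time.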