Commonly used backbones for semantic segmentation, such as ResNet and Swin-Transformer, have multiple stages for feature encoding. A common way to upsample the low-resolution high-level feature map is to refine it directly with high-resolution low-level feature maps from the early stages of the backbone. However, the representation power of low-level features is generally weaker than that of high-level features, so this refinement introduces "noise" into the upsampling. To address this issue, we propose the High-level Feature Guided Decoder (HFGD), which uses isolated high-level features to guide the low-level features and the upsampling process. Specifically, the guidance is realized through carefully designed stop-gradient operations and class kernels. The class kernels co-evolve only with the high-level features and are reused in the upsampling head to guide its training. HFGD is efficient and effective: it can upsample the feature maps to a previously unseen output stride (OS) of 2 and still obtain an accuracy gain. HFGD demonstrates state-of-the-art performance on several benchmark datasets (e.g., Pascal Context, COCOStuff164k and Cityscapes) at a low computational cost in FLOPs. The full code will be available at https://github.com/edwardyehuang/HFGD.git.
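The idea of class kernels that co-evolve only with the high-level features can be illustrated with a toy, single-pixel sketch. This is not the HFGD implementation: the function names (`ce_grads`, `train_step`), the plain gradient-descent update, and the exact placement of the stop gradient are assumptions made for illustration. The class kernels act as a 1x1 classifier; they receive gradients from the high-level branch only, while the upsampling branch reuses them as constants and updates only its own feature.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(kernels, feat):
    # 1x1 classification: one dot product per class kernel.
    return [sum(w * x for w, x in zip(k, feat)) for k in kernels]

def ce_grads(kernels, feat, label):
    """Cross-entropy loss and its gradients w.r.t. the kernels and the feature."""
    p = softmax(logits(kernels, feat))
    loss = -math.log(p[label])
    # dL/dlogits = softmax probabilities minus the one-hot label.
    d = [pi - (1.0 if c == label else 0.0) for c, pi in enumerate(p)]
    g_kernels = [[dc * x for x in feat] for dc in d]
    g_feat = [sum(d[c] * kernels[c][j] for c in range(len(kernels)))
              for j in range(len(feat))]
    return loss, g_kernels, g_feat

def train_step(kernels, high_feat, up_feat, label, lr=0.1):
    """One hypothetical update step (illustrative assumption, not the paper's code)."""
    # High-level branch: the class kernels are trained here.
    loss_h, g_kernels, _ = ce_grads(kernels, high_feat, label)
    # Upsampling branch: kernels are reused as constants (stop gradient),
    # so the kernel gradient from this branch is discarded.
    loss_u, _, g_up = ce_grads(kernels, up_feat, label)
    new_kernels = [[w - lr * g for w, g in zip(kr, gr)]
                   for kr, gr in zip(kernels, g_kernels)]
    # Stand-in for updating the upsampling head: move its feature directly.
    new_up = [x - lr * g for x, g in zip(up_feat, g_up)]
    return new_kernels, new_up, loss_h, loss_u

kernels = [[1.0, 0.0], [0.0, 1.0]]   # 2 classes, 2 channels
new_kernels, new_up, loss_h, loss_u = train_step(
    kernels, high_feat=[2.0, 0.0], up_feat=[0.0, 0.5], label=0)
```

In this sketch the updated kernels depend only on `high_feat`, whatever the upsampled feature looks like, while the upsampled feature is still pulled toward the class that the kernels define. In an autodiff framework the same effect would typically be obtained with a stop-gradient operation on the kernels inside the upsampling branch.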