Panoptic segmentation combines semantic segmentation and instance segmentation, dividing image content into two types: things and stuff. We present Panoptic SegFormer, a general framework for panoptic segmentation with transformers. It contains three innovative components: an efficient deeply-supervised mask decoder, a query decoupling strategy, and an improved post-processing method. We also use Deformable DETR, a fast and efficient variant of DETR, to process multi-scale features. Specifically, we supervise the attention modules in the mask decoder in a layer-wise manner. This deep supervision strategy lets the attention modules quickly focus on meaningful semantic regions; it improves performance and halves the number of required training epochs compared to Deformable DETR. Our query decoupling strategy separates the responsibilities of the query set and avoids mutual interference between things and stuff. In addition, our post-processing strategy improves performance without additional cost by jointly considering classification and segmentation quality to resolve conflicting mask overlaps. Our approach improves accuracy by 6.2\% PQ over the baseline DETR model. Panoptic SegFormer achieves state-of-the-art results on COCO test-dev with 56.2\% PQ and shows stronger zero-shot robustness than existing methods. The code is released at \url{https://github.com/zhiqi-li/Panoptic-SegFormer}.
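To make the layer-wise deep supervision idea concrete, below is a minimal PyTorch sketch that applies a mask loss to the attention maps of every decoder layer rather than only the last one. The function name \texttt{deep\_supervision\_loss}, the tensor shapes, and the use of binary cross-entropy on pre-sigmoid attention logits are illustrative assumptions, not the paper's exact formulation.

\begin{verbatim}
import torch
import torch.nn.functional as F

def deep_supervision_loss(attn_maps_per_layer, gt_masks):
    """Sketch of layer-wise supervision (shapes are assumptions).
    attn_maps_per_layer: list of (N, H, W) attention logits, one per layer.
    gt_masks: (N, H, W) binary ground-truth masks matched to the N queries.
    """
    total = 0.0
    for attn in attn_maps_per_layer:
        # Upsample each layer's attention map to ground-truth resolution.
        pred = F.interpolate(attn.unsqueeze(1), size=gt_masks.shape[-2:],
                             mode="bilinear", align_corners=False).squeeze(1)
        # Same mask loss at every layer, so each layer gets a direct
        # training signal and attention converges to semantic regions faster.
        total = total + F.binary_cross_entropy_with_logits(pred,
                                                           gt_masks.float())
    return total / len(attn_maps_per_layer)
\end{verbatim}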
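Similarly, the post-processing strategy can be illustrated with a short sketch: each prediction is scored jointly by its classification probability and a mask-quality term, and higher-confidence masks claim conflicting pixels first. The function \texttt{panoptic\_merge} and the mean-foreground-probability proxy for mask quality are assumptions for illustration, not the released implementation.

\begin{verbatim}
import numpy as np

def panoptic_merge(cls_probs, mask_probs, mask_thresh=0.5):
    """Sketch of mask-wise merging (names and score are assumptions).
    cls_probs: (N,) classification probability per query.
    mask_probs: (N, H, W) per-pixel mask probabilities.
    Returns an (H, W) panoptic id map (0 = void).
    """
    N, H, W = mask_probs.shape
    binary = mask_probs > mask_thresh
    # Mask quality: mean foreground probability (an assumed proxy).
    quality = np.array([
        mask_probs[i][binary[i]].mean() if binary[i].any() else 0.0
        for i in range(N)
    ])
    # Joint score combining classification and segmentation quality.
    confidence = cls_probs * quality
    canvas = np.zeros((H, W), dtype=np.int64)
    for i in np.argsort(-confidence):      # high confidence first
        free = binary[i] & (canvas == 0)   # only unclaimed pixels
        if free.sum() > 0:
            canvas[free] = i + 1           # segment id i + 1
    return canvas
\end{verbatim}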