MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers

We present MaX-DeepLab, the first end-to-end model for panoptic segmentation. Our approach simplifies the current pipeline, which depends heavily on surrogate sub-tasks and hand-designed components such as box detection, non-maximum suppression, and thing-stuff merging. Although these sub-tasks are tackled by area experts, they fail to comprehensively solve the target task. By contrast, MaX-DeepLab directly predicts class-labeled masks with a mask transformer, and is trained with a panoptic-quality-inspired loss via bipartite matching. Our mask transformer employs a dual-path architecture that introduces a global memory path in addition to a CNN path, allowing direct communication with any CNN layer. As a result, MaX-DeepLab shows a significant 7.1% PQ gain in the box-free regime on the challenging COCO dataset, closing the gap between box-based and box-free methods for the first time. A small variant of MaX-DeepLab improves on DETR by 3.0% PQ with similar parameters and M-Adds. Furthermore, MaX-DeepLab, without test-time augmentation, achieves a new state-of-the-art 51.3% PQ on the COCO test-dev set. Code is available at https://github.com/google-research/deeplab2.
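
The bipartite matching step mentioned in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation. It assumes a similarity equal to the predicted probability of the ground-truth class multiplied by a Dice overlap between masks, and it finds the best one-to-one assignment by exhaustive search over permutations, which is practical only for a handful of masks (a Hungarian solver would normally take its place). The names `dice`, `similarity`, and `match`, and the toy data, are hypothetical.

```python
from itertools import permutations


def dice(pred_mask, gt_mask):
    """Dice overlap between a soft predicted mask and a binary ground-truth mask.

    Both masks are flat lists of per-pixel values in [0, 1].
    """
    inter = sum(p * g for p, g in zip(pred_mask, gt_mask))
    total = sum(pred_mask) + sum(gt_mask)
    return 2.0 * inter / total if total > 0 else 0.0


def similarity(pred_mask, pred_probs, gt_mask, gt_class):
    # Assumption: similarity = P(correct class) * mask overlap.
    return pred_probs[gt_class] * dice(pred_mask, gt_mask)


def match(preds, gts):
    """Exhaustively search for the one-to-one assignment with the highest total similarity.

    preds: list of (mask, class_probs); gts: list of (mask, class_id).
    Requires len(preds) >= len(gts).
    Returns (best_total, assignment), where assignment[j] is the index of
    the prediction matched to gts[j]. Unmatched predictions are left out.
    """
    best_total, best_assignment = -1.0, ()
    for perm in permutations(range(len(preds)), len(gts)):
        total = sum(similarity(*preds[i], *gts[j]) for j, i in enumerate(perm))
        if total > best_total:
            best_total, best_assignment = total, perm
    return best_total, best_assignment


# Toy data: 4-pixel masks, 2 classes, 3 predictions, 2 ground-truth masks.
preds = [
    ([0.9, 0.8, 0.1, 0.0], [0.1, 0.9]),  # left pixels, mostly class 1
    ([0.0, 0.1, 0.9, 0.9], [0.8, 0.2]),  # right pixels, mostly class 0
    ([0.2, 0.2, 0.2, 0.2], [0.5, 0.5]),  # diffuse mask, should stay unmatched
]
gts = [
    ([0, 0, 1, 1], 0),  # right pixels, class 0
    ([1, 1, 0, 0], 1),  # left pixels, class 1
]

total, assignment = match(preds, gts)
print(assignment)  # (1, 0)
```

In this sketch the third prediction is left unmatched. In a trained model, unmatched predictions would typically be pushed toward a "no object" class, but that part of the loss is not shown here.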
