In this paper, we propose an end-to-end framework for instance segmentation. Built on the recently introduced DETR [1], our method, termed SOLQ, segments objects by learning unified queries. In SOLQ, each query represents one object and has multiple representations: class, location, and mask. The learned object queries perform classification, box regression, and mask encoding simultaneously in a unified vector form. During training, the encoded mask vectors are supervised by the compression coding of the raw spatial masks. At inference time, the predicted mask vectors can be transformed directly into spatial masks by the inverse of the compression coding. Experimental results show that SOLQ achieves state-of-the-art performance, surpassing most existing approaches. Moreover, jointly learning the unified query representation greatly improves the detection performance of the original DETR. We hope SOLQ can serve as a strong baseline for Transformer-based instance segmentation. Code is available at https://github.com/megvii-research/SOLQ.
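The abstract does not name the compression coding used to turn a spatial mask into a fixed-length vector and back. The sketch below is a minimal illustration under an assumed scheme: a 2D discrete cosine transform (DCT) of an N×N binary mask, keeping only the first K low-frequency coefficients in JPEG-style zig-zag order as the mask vector, and inverting by zero-filling the dropped coefficients, applying the inverse DCT, and thresholding at 0.5. The function names (`dct_matrix`, `zigzag_indices`, `encode_mask`, `decode_mask`) and the sizes N = 64 and K = 256 are hypothetical choices for illustration, not taken from the SOLQ code.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)
    c[1:, :] *= np.sqrt(2.0 / n)
    return c


def zigzag_indices(n: int) -> list:
    """JPEG-style zig-zag order over an n x n grid, low frequencies first."""
    cells = [(r, col) for r in range(n) for col in range(n)]
    return sorted(cells, key=lambda p: (p[0] + p[1], p[1] if (p[0] + p[1]) % 2 == 0 else p[0]))


def encode_mask(mask: np.ndarray, k: int) -> np.ndarray:
    """Assumed compression coding: keep k low-frequency 2D DCT coefficients of an n x n mask."""
    n = mask.shape[0]
    c = dct_matrix(n)
    coeffs = c @ mask.astype(np.float64) @ c.T  # forward 2D DCT-II
    order = zigzag_indices(n)[:k]
    return np.array([coeffs[r, col] for r, col in order])


def decode_mask(vec: np.ndarray, n: int, threshold: float = 0.5) -> np.ndarray:
    """Inverse process: scatter coefficients back, zero-fill the rest, inverse DCT, binarize."""
    coeffs = np.zeros((n, n))
    for value, (r, col) in zip(vec, zigzag_indices(n)):  # zip stops after len(vec) entries
        coeffs[r, col] = value
    c = dct_matrix(n)
    recon = c.T @ coeffs @ c  # inverse of the orthonormal 2D DCT
    return recon >= threshold


# Usage: compress a hypothetical 64 x 64 square mask into a 256-dim vector and decode it.
mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 16:48] = True
vec = encode_mask(mask, k=256)
recon = decode_mask(vec, n=64)
iou = (mask & recon).sum() / (mask | recon).sum()
print(f"vector length: {vec.shape[0]}, reconstruction IoU: {iou:.3f}")
```

Because the DCT basis is orthonormal, keeping all N×N coefficients reconstructs the mask exactly; truncating to K coefficients trades some boundary detail for a short vector that a query head could regress alongside the class and box outputs.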