In this paper we present Mask DINO, a unified object detection and segmentation framework. Mask DINO extends DINO (DETR with Improved Denoising Anchor Boxes) by adding a mask prediction branch that supports all image segmentation tasks (instance, panoptic, and semantic). It takes the dot product of DINO's query embeddings with a high-resolution pixel embedding map to predict a set of binary masks. Some key components in DINO are extended for segmentation through a shared architecture and training process. Mask DINO is simple, efficient, and scalable, and it benefits from joint large-scale detection and segmentation datasets. Our experiments show that Mask DINO significantly outperforms all existing specialized segmentation methods, both with a ResNet-50 backbone and with a pre-trained model using a SwinL backbone. Notably, Mask DINO establishes the best results to date on instance segmentation (54.5 AP on COCO), panoptic segmentation (59.4 PQ on COCO), and semantic segmentation (60.8 mIoU on ADE20K). Code will be available at \url{https://github.com/IDEACVR/MaskDINO}.
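The mask branch described above predicts one mask per query as a dot product between that query's embedding and the embedding at every pixel. The sketch below illustrates only that operation and does not reproduce the Mask DINO implementation. The helper names `sigmoid` and `predict_masks` are hypothetical. The tensor layout is an assumption: queries as N×C, pixel embeddings as C×H×W. Binarizing with a sigmoid and a fixed 0.5 threshold is also an assumption made for clarity. It uses plain Python so it runs without a deep-learning framework.

```python
import math
from typing import List

Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_masks(queries: Matrix, pixel_emb: List[Matrix],
                  threshold: float = 0.5) -> List[List[List[int]]]:
    """Hypothetical helper: one binary mask per query embedding.

    queries:   N x C list (one C-dim embedding per query)
    pixel_emb: C x H x W list (a C-dim embedding at each pixel)
    returns:   N x H x W list of 0/1 masks
    """
    channels = len(pixel_emb)
    height = len(pixel_emb[0])
    width = len(pixel_emb[0][0])
    masks = []
    for q in queries:
        if len(q) != channels:
            raise ValueError("query and pixel embeddings must share dimension C")
        mask = []
        for y in range(height):
            row = []
            for x in range(width):
                # Mask logit at (y, x) is the dot product <query, pixel embedding>.
                logit = sum(q[c] * pixel_emb[c][y][x] for c in range(channels))
                # Assumed binarization: sigmoid probability above a fixed threshold.
                row.append(1 if sigmoid(logit) > threshold else 0)
            mask.append(row)
        masks.append(mask)
    return masks


# Usage: two queries, C = 2 channels, a 2 x 2 pixel embedding map.
queries = [[1.0, 0.0], [0.0, 1.0]]
pixel_emb = [
    [[1.0, -1.0], [1.0, -1.0]],  # channel 0
    [[-1.0, 1.0], [-1.0, 1.0]],  # channel 1
]
print(predict_masks(queries, pixel_emb))
# → [[[1, 0], [1, 0]], [[0, 1], [0, 1]]]
```

With a 0.5 threshold, `sigmoid(logit) > 0.5` holds exactly when `logit > 0`, so in this sketch the sigmoid only matters if a different threshold is chosen. A real model would compute the same product as a batched matrix multiplication over learned features rather than Python loops.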