In this paper, we present a conceptually simple, strong, and efficient framework for fully- and weakly-supervised panoptic segmentation, called Panoptic FCN. Our approach aims to represent and predict foreground things and background stuff in a unified fully convolutional pipeline, which can be optimized with point-based full or weak supervision. In particular, Panoptic FCN encodes each object instance or stuff category with the proposed kernel generator and produces the prediction by convolving the high-resolution feature directly. With this approach, the instance-aware property of things and the semantically consistent property of stuff are both satisfied in a simple generate-kernel-then-segment workflow. Without extra boxes for localization or instance separation, the proposed approach outperforms previous box-based and box-free models with high efficiency. Furthermore, we propose a new form of point-based annotation for weakly-supervised panoptic segmentation. It requires only several random points for both things and stuff, which dramatically reduces the human annotation cost. Panoptic FCN also proves highly effective in this weakly-supervised setting, achieving 82% of its fully-supervised performance with only 20 randomly annotated points per instance. Extensive experiments demonstrate the effectiveness and efficiency of Panoptic FCN on the COCO, VOC 2012, Cityscapes, and Mapillary Vistas datasets, establishing a new state of the art for both fully- and weakly-supervised panoptic segmentation. Our code and models are publicly available at https://github.com/dvlab-research/PanopticFCN
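To make the generate-kernel-then-segment workflow concrete, the minimal PyTorch sketch below shows how a shared high-resolution feature can be segmented by convolving it with per-target kernels, so that every thing and stuff prediction comes from the same convolution over the same feature and no boxes are needed for localization or instance separation. The function name, tensor shapes, and toy inputs are illustrative assumptions for exposition, not the released implementation.

```python
import torch
import torch.nn.functional as F

def generate_then_segment(kernel_weights, high_res_feat):
    """Segment by convolving a shared high-resolution feature with
    per-instance / per-stuff-class kernels (illustrative sketch).

    kernel_weights: (N, C) -- one C-dim kernel per predicted thing/stuff,
                    as produced by a kernel generator (assumed given here).
    high_res_feat:  (1, C, H, W) -- shared encoded feature map.
    Returns:        (N, H, W) -- one mask logit map per kernel.
    """
    n, c = kernel_weights.shape
    # Treat each generated kernel as a 1x1 conv filter; a single
    # convolution over the shared feature yields all masks at once.
    masks = F.conv2d(high_res_feat, kernel_weights.view(n, c, 1, 1))
    return masks.squeeze(0)  # (N, H, W)

# Toy usage with random tensors standing in for real network outputs.
feat = torch.randn(1, 64, 200, 304)   # high-resolution feature
kernels = torch.randn(5, 64)          # 5 generated kernels
print(generate_then_segment(kernels, feat).shape)  # torch.Size([5, 200, 304])
```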
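Similarly, one plausible realization of the point-based weak supervision described above (a hedged sketch under our own assumptions, not necessarily the paper's exact loss) is to evaluate the mask loss only at the few annotated points, for example by bilinearly sampling the predicted logits at those locations:

```python
import torch
import torch.nn.functional as F

def point_supervised_loss(mask_logits, point_coords, point_labels):
    """Binary cross-entropy computed only at annotated points (sketch).

    mask_logits:  (N, H, W)    -- predicted mask logits, one map per target.
    point_coords: (N, K, 2)    -- (x, y) normalized to [-1, 1], K points each.
    point_labels: (N, K) float -- 1.0 if the point lies on the target, else 0.0.
    """
    # Sample logits at the annotated point locations via bilinear sampling.
    logits_at_pts = F.grid_sample(
        mask_logits.unsqueeze(1),    # (N, 1, H, W)
        point_coords.unsqueeze(1),   # (N, 1, K, 2)
        align_corners=False,
    ).view(point_labels.shape)       # (N, K)
    return F.binary_cross_entropy_with_logits(logits_at_pts, point_labels)

# Toy usage: 20 random points per target, matching the abstract's setting.
logits = torch.randn(5, 200, 304)
coords = torch.rand(5, 20, 2) * 2 - 1
labels = (torch.rand(5, 20) > 0.5).float()
print(point_supervised_loss(logits, coords, labels))
```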