Semantic, instance, and panoptic segmentation have been addressed with different, specialized frameworks despite their underlying connections. This paper presents a unified, simple, and effective framework for these essentially similar tasks. The framework, named K-Net, segments both instances and semantic categories consistently with a group of learnable kernels, where each kernel is responsible for generating a mask for either a potential instance or a stuff class. To remedy the difficulty of distinguishing between instances, we propose a kernel update strategy that makes each kernel dynamic and conditional on its meaningful group in the input image. K-Net can be trained end-to-end with bipartite matching, and its training and inference are naturally NMS-free and box-free. Without bells and whistles, K-Net surpasses all previously published state-of-the-art single-model results for panoptic segmentation on the MS COCO test-dev split and semantic segmentation on the ADE20K val split, with 55.2% PQ and 54.3% mIoU, respectively. Its instance segmentation performance is also on par with Cascade Mask R-CNN on MS COCO, with 60%-90% faster inference. Code and models will be released at https://github.com/ZwwWayne/K-Net/.
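To make the kernel idea concrete, the following is a minimal pure-Python sketch, not the released implementation. It assumes each kernel is a C-dimensional vector applied to a C x H x W feature map as a 1x1 convolution, and it assumes that a kernel's "meaningful group" is the set of pixels its current mask selects. The function names (`predict_masks`, `update_kernels`) and the additive update rule are illustrative assumptions; the abstract does not specify the update beyond it being dynamic and conditional on the group.

```python
import math

def predict_masks(kernels, features):
    """Apply each kernel (length C) to a C x H x W feature map.

    Returns N x H x W mask logits, one map per kernel.
    """
    C, H, W = len(features), len(features[0]), len(features[0][0])
    return [
        [[sum(k[c] * features[c][y][x] for c in range(C)) for x in range(W)]
         for y in range(H)]
        for k in kernels
    ]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def update_kernels(kernels, features, mask_logits):
    """Assumed, simplified update: add the mask-weighted mean feature
    of each kernel's group to that kernel."""
    C, H, W = len(features), len(features[0]), len(features[0][0])
    updated = []
    for k, logits in zip(kernels, mask_logits):
        weights = [[sigmoid(v) for v in row] for row in logits]  # soft mask
        total = sum(sum(row) for row in weights)
        group = [
            sum(weights[y][x] * features[c][y][x]
                for y in range(H) for x in range(W)) / total
            for c in range(C)
        ]
        updated.append([ki + gi for ki, gi in zip(k, group)])
    return updated

# Toy input: channel 0 is active on the left column, channel 1 on the right.
features = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
kernels = [[1.0, -1.0], [-1.0, 1.0]]  # one kernel per region

logits = predict_masks(kernels, features)
new_kernels = update_kernels(kernels, features, logits)
new_logits = predict_masks(new_kernels, features)
binary_masks = [[[v > 0 for v in row] for row in m] for m in new_logits]
```

In this toy case each kernel claims one column of the image, and the update pushes each kernel further toward the features of the pixels it already selects. Because every kernel yields exactly one mask, no boxes or NMS are involved in producing the output.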