Fully and Weakly Supervised Referring Expression Segmentation with End-to-End Learning

Referring Expression Segmentation (RES), which aims to localize and segment a target object according to a given language expression, has drawn increasing attention. Existing methods treat localization and segmentation jointly, relying on the same fused visual and linguistic features for both steps. We argue that the conflict between the goal of identifying an object and the goal of generating a mask limits RES performance. To solve this problem, we propose a parallel position-kernel-segmentation pipeline that better isolates the localization and segmentation steps and then lets them interact. In our pipeline, linguistic information does not directly contaminate the visual features used for segmentation. Specifically, the localization step locates the target object in the image based on the referring expression, and the visual kernel obtained from the localization step then guides the segmentation step. This pipeline also enables us to train RES in a weakly supervised way, where the pixel-level segmentation labels are replaced by click annotations on center and corner points. The position head is fully supervised, with the click annotations as supervision, and the segmentation head is trained with weakly supervised segmentation losses. To validate our framework in the weakly supervised setting, we annotated three RES benchmark datasets (RefCOCO, RefCOCO+ and RefCOCOg) with click annotations. Our method is simple but surprisingly effective, outperforming all previous state-of-the-art RES methods in both fully and weakly supervised settings by a large margin. The benchmark code and datasets will be released.
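The position-kernel-segmentation idea can be illustrated with a toy sketch. The abstract does not specify the network heads, how the kernel is generated, or the losses, so everything below is an assumption made for illustration: the position heatmap is taken as a given input (in the paper it would come from the language-conditioned position head), the kernel is simply the visual feature vector at the predicted position, and the mask is a thresholded sigmoid of the dot product between that kernel and every pixel's visual feature. The function names `localize`, `kernel_at`, and `segment` are hypothetical.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # H x W x C visual features (no language fused in)
Heatmap = List[List[float]]           # H x W position scores (assumed given)


def localize(heatmap: Heatmap) -> Tuple[int, int]:
    """Return the (row, col) of the highest score in the position heatmap."""
    coords = ((r, c) for r in range(len(heatmap)) for c in range(len(heatmap[0])))
    return max(coords, key=lambda rc: heatmap[rc[0]][rc[1]])


def kernel_at(features: FeatureMap, pos: Tuple[int, int]) -> List[float]:
    """Assumption: use the visual feature at the predicted position as the kernel."""
    r, c = pos
    return list(features[r][c])


def segment(features: FeatureMap, kernel: List[float], threshold: float = 0.5) -> List[List[int]]:
    """Apply the kernel as a 1x1 filter over the visual features and threshold the result."""
    mask = []
    for row in features:
        mask_row = []
        for vec in row:
            logit = sum(f * k for f, k in zip(vec, kernel))
            prob = 1.0 / (1.0 + math.exp(-logit))
            mask_row.append(1 if prob > threshold else 0)
        mask.append(mask_row)
    return mask


# Toy 3x3 image with 2-channel features: the object occupies the lower-right 2x2 block.
features = [
    [[-2.0, 0.0], [-2.0, 0.0], [-2.0, 0.0]],
    [[-2.0, 0.0], [2.0, 0.0], [2.0, 0.0]],
    [[-2.0, 0.0], [2.0, 0.0], [2.0, 0.0]],
]
# Hypothetical output of a position head for some referring expression.
heatmap = [
    [0.0, 0.1, 0.0],
    [0.1, 0.9, 0.3],
    [0.0, 0.3, 0.2],
]

position = localize(heatmap)
kernel = kernel_at(features, position)
mask = segment(features, kernel)
```

In this sketch the language expression influences only where the kernel is read from, while the mask itself is computed purely from visual features, which mirrors the separation the abstract describes.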
