We introduce DiscoBox, a novel framework that jointly learns instance segmentation and semantic correspondence using bounding box supervision. Specifically, we propose a self-ensembling framework in which instance segmentation and semantic correspondence are jointly guided by a structured teacher in addition to the bounding box supervision. The teacher is a structured energy model incorporating a pairwise potential and a cross-image potential to model pairwise pixel relationships both within and across boxes. Minimizing the teacher energy simultaneously yields refined object masks and dense correspondences between intra-class objects. These are taken as pseudo-labels to supervise the task network and provide positive/negative correspondence pairs for dense contrastive learning. We show a symbiotic relationship in which the two tasks mutually benefit from each other. Our best model achieves 37.9% AP on COCO instance segmentation, surpassing prior weakly supervised methods and competitive with supervised methods. We also obtain state-of-the-art weakly supervised results on PASCAL VOC12 and PF-PASCAL with real-time inference.
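The abstract says the teacher's dense correspondences supply positive/negative pairs for dense contrastive learning, but it does not give the loss. The sketch below assumes a standard InfoNCE objective with cosine similarity and a temperature `tau`. For each pixel of one object, the teacher-matched pixel in another object of the same class is the positive and every other pixel of that object is a negative. The function name `dense_info_nce`, its arguments, and the default temperature are hypothetical illustrations, not DiscoBox's actual implementation.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dense_info_nce(
    feats_a: Sequence[Sequence[float]],
    feats_b: Sequence[Sequence[float]],
    matches: Sequence[int],
    tau: float = 0.1,  # assumed temperature, not taken from the paper
) -> float:
    """Hypothetical InfoNCE loss over dense correspondences.

    feats_a[i] and feats_b[k] are per-pixel embeddings of two same-class
    objects. matches[i] is the index in feats_b that the (assumed) teacher
    pairs with pixel i: that pixel is the positive, all others are negatives.
    """
    if not matches:
        raise ValueError("need at least one correspondence pair")
    a = [_normalize(v) for v in feats_a]
    b = [_normalize(v) for v in feats_b]
    total = 0.0
    for i, j in enumerate(matches):
        # Temperature-scaled cosine similarity of pixel i to every pixel in b.
        logits = [sum(x * y for x, y in zip(a[i], bk)) / tau for bk in b]
        # Numerically stable log-sum-exp over the positive and all negatives.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability assigned to the positive match j.
        total += log_denom - logits[j]
    return total / len(matches)


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0]]
    print(dense_info_nce(feats, feats, [0, 1]))  # correct pairs: loss near 0
    print(dense_info_nce(feats, feats, [1, 0]))  # swapped pairs: loss near 10
```

The loss is small when each pixel's embedding is closest to its teacher-assigned match and large when the pairing contradicts the embeddings. This is the signal that, under this assumption, pulls corresponding pixels together across intra-class objects.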