Self-supervised, category-agnostic segmentation of real-world images is a challenging open problem in computer vision. Here, we show how to learn static grouping priors from motion self-supervision by building on the cognitive science concept of a Spelke Object: a set of physical stuff that moves together. We introduce the Excitatory-Inhibitory Segment Extraction Network (EISEN), which learns to extract pairwise affinity graphs for static scenes from motion-based training signals. EISEN then produces segments from affinities using a novel graph propagation and competition network. During training, objects that undergo correlated motion (such as robot arms and the objects they move) are decoupled by a bootstrapping process: EISEN explains away the motion of objects it has already learned to segment. We show that EISEN achieves a substantial improvement in the state of the art for self-supervised image segmentation on challenging synthetic and real-world robotics datasets.