The ability to localize and segment objects from unseen classes would open the door to new applications, such as autonomous object learning in active vision. Nonetheless, improving performance on unseen classes requires additional training data, and manually annotating objects of unseen classes can be labor-intensive and expensive. In this paper, we explore the use of unlabeled video sequences to automatically generate training data for objects of unseen classes. It is in principle possible to apply existing video segmentation methods to unlabeled videos and automatically obtain object masks, which can then be used as a training set even for classes with no manual labels available. However, our experiments show that these methods do not perform well enough for this purpose. We therefore introduce a Bayesian method that is specifically designed to automatically create such a training set: our method starts from a set of object proposals and relies on (non-realistic) analysis-by-synthesis to select the correct ones by performing an efficient optimization over all the frames simultaneously. Through extensive experiments, we show that our method can generate a high-quality training set that significantly boosts the performance of segmenting objects of unseen classes. We thus believe that our method could open the door to open-world instance segmentation using abundant Internet videos.