Humans can easily segment moving objects without knowing what they are. The possibility that objectness could emerge from continuous visual observation motivates us to model grouping and movement concurrently from unlabeled videos. Our premise is that a video contains different views of the same scene related by moving components, and that the right region segmentation and region flow would allow mutual view synthesis, which can be checked against the data itself without any external supervision. Our model starts with two separate pathways: an appearance pathway that outputs feature-based region segmentation for a single image, and a motion pathway that outputs motion features for a pair of images. It then binds them in a conjoint representation called segment flow, which pools flow offsets over each region and provides a gross characterization of moving regions for the entire scene. By training the model to minimize view synthesis errors based on segment flow, our appearance and motion pathways learn region segmentation and flow estimation automatically, without building them up from low-level edges or optical flow, respectively. Our model demonstrates the surprising emergence of objectness in the appearance pathway, surpassing prior work on zero-shot object segmentation from an image, moving object segmentation from a video with unsupervised test-time adaptation, and semantic image segmentation by supervised fine-tuning. Our work is the first truly end-to-end zero-shot object segmentation method learned from videos. It not only develops generic objectness for segmentation and tracking, but also outperforms prevalent image-based contrastive learning methods without augmentation engineering.
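To make the segment flow idea concrete, the following is a minimal sketch of pooling flow offsets over regions. It is an illustration under stated assumptions, not the model's actual implementation: the abstract does not specify the pooling operator, so a mask-weighted mean is assumed here, and the names `segment_flow`, `masks`, and `flow` are hypothetical. Pixels are flattened into a list for simplicity.

```python
def segment_flow(masks, flow):
    """Pool per-pixel flow offsets over soft regions (assumed weighted mean).

    masks: K lists of N non-negative weights; masks[k][i] is how strongly
           pixel i belongs to region k (hypothetical layout).
    flow:  N (dx, dy) offsets, one per pixel.
    Returns (region_flow, dense_flow):
      region_flow: K pooled (dx, dy) offsets, one per region.
      dense_flow:  N offsets rebuilt by spreading each region's pooled
                   offset back over its pixels.
    """
    region_flow = []
    for weights in masks:
        total = sum(weights)
        if total == 0:
            # Empty region: assume it does not move.
            region_flow.append((0.0, 0.0))
            continue
        dx = sum(w * f[0] for w, f in zip(weights, flow)) / total
        dy = sum(w * f[1] for w, f in zip(weights, flow)) / total
        region_flow.append((dx, dy))

    dense_flow = []
    for i in range(len(flow)):
        dx = sum(masks[k][i] * region_flow[k][0] for k in range(len(masks)))
        dy = sum(masks[k][i] * region_flow[k][1] for k in range(len(masks)))
        dense_flow.append((dx, dy))
    return region_flow, dense_flow


# Toy scene: 4 pixels, 2 hard regions moving in different directions.
masks = [[1, 1, 0, 0], [0, 0, 1, 1]]
flow = [(2.0, 0.0), (4.0, 0.0), (0.0, -1.0), (0.0, -3.0)]
region_flow, dense_flow = segment_flow(masks, flow)
print(region_flow)
print(dense_flow)
```

In this sketch every pixel in a region ends up with that region's single pooled offset, which is the "gross characterization of moving regions" that a view synthesis loss could then be computed against. How the real model warps one view into the other and measures the synthesis error is not stated in the abstract and is left out here.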