The Emergence of Objectness: Learning Zero-Shot Segmentation from Videos

Humans can easily segment moving objects without knowing what they are. That objectness could emerge from continuous visual observations motivates us to model grouping and movement concurrently from unlabeled videos. Our premise is that a video has different views of the same scene related by moving components, and the right region segmentation and region flow would allow mutual view synthesis which can be checked from the data itself without any external supervision. Our model starts with two separate pathways: an appearance pathway that outputs feature-based region segmentation for a single image, and a motion pathway that outputs motion features for a pair of images. It then binds them in a conjoint representation called segment flow that pools flow offsets over each region and provides a gross characterization of moving regions for the entire scene. By training the model to minimize view synthesis errors based on segment flow, our appearance and motion pathways learn region segmentation and flow estimation automatically without building them up from low-level edges or optical flows respectively. Our model demonstrates the surprising emergence of objectness in the appearance pathway, surpassing prior works on zero-shot object segmentation from an image, moving object segmentation from a video with unsupervised test-time adaptation, and semantic image segmentation by supervised fine-tuning. Our work is the first truly end-to-end zero-shot object segmentation from videos. It not only develops generic objectness for segmentation and tracking, but also outperforms prevalent image-based contrastive learning methods without augmentation engineering.
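The segment-flow idea described in the abstract (pooling flow offsets over each region, then using one gross motion per region for the whole scene) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `segment_flow`, the array shapes, and the choice of mask-weighted average pooling with a single translation per region are all assumptions made for illustration. The view-synthesis loss that trains both pathways is not shown.

```python
import numpy as np


def segment_flow(masks, flow, eps=1e-8):
    """Pool per-pixel flow offsets over soft regions (illustrative sketch only).

    masks: (K, H, W) soft region masks, e.g. a softmax over K regions per pixel.
    flow:  (H, W, 2) per-pixel flow offsets (dx, dy).
    Returns:
      region_flow: (K, 2) one pooled offset per region.
      dense_flow:  (H, W, 2) each pixel's offset rebuilt from its regions.
    """
    num_regions = masks.shape[0]
    weights = masks.reshape(num_regions, -1)   # (K, H*W)
    offsets = flow.reshape(-1, 2)              # (H*W, 2)

    # Mask-weighted average of the offsets inside each region
    # (assumed pooling rule; eps guards against empty regions).
    region_flow = (weights @ offsets) / (weights.sum(axis=1, keepdims=True) + eps)

    # Broadcast each region's single offset back to the pixels it covers.
    dense_flow = np.einsum("khw,kc->hwc", masks, region_flow)
    return region_flow, dense_flow
```

With hard masks that split an image into a left and a right half, each moving with its own constant offset, `region_flow` recovers those two offsets and `dense_flow` matches the input flow. With soft masks, the result is a smoothed, region-level summary of the motion.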

