We propose a simple yet powerful approach for unsupervised object segmentation in videos. We introduce an objective function whose minimum represents the mask of the main salient object over the input sequence. It relies only on independent image features and optical flow, both of which can be obtained using off-the-shelf self-supervised methods. It scales with the length of the sequence with no need for superpixels or sparsification, and it generalizes to different datasets without any specific training. This objective function can be derived from a form of spectral clustering applied to the entire video. Our method achieves performance on par with the state of the art on standard benchmarks (DAVIS2016, SegTrack-v2, FBMS59), while being conceptually and practically much simpler. Code is available at https://ponimatkin.github.io/ssl-vos.
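To make the link to spectral clustering concrete, the following is a minimal sketch of a generic two-way spectral partition on a small set of feature vectors. It is not the objective function proposed here: the function name `spectral_bipartition`, the Gaussian affinity, and the bandwidth `sigma` are assumptions chosen for illustration, and the dense affinity matrix it builds would not scale to a full video.

```python
import numpy as np


def spectral_bipartition(features, sigma=1.0):
    """Split the rows of `features` into two groups (hypothetical helper).

    Generic normalized spectral clustering with a Gaussian affinity;
    an illustration only, not the method described in the abstract.
    """
    # Pairwise squared Euclidean distances between feature vectors.
    sq_norms = np.sum(features ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values

    # Dense Gaussian affinity matrix (assumed choice of similarity).
    affinity = np.exp(-sq_dists / (2.0 * sigma ** 2))

    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    degree = affinity.sum(axis=1)
    inv_sqrt_deg = 1.0 / np.sqrt(degree)
    laplacian = np.eye(len(features)) - (
        inv_sqrt_deg[:, None] * affinity * inv_sqrt_deg[None, :]
    )

    # eigh returns eigenvalues in ascending order; column 1 is the
    # eigenvector of the second-smallest eigenvalue.
    _, eigvecs = np.linalg.eigh(laplacian)
    fiedler = eigvecs[:, 1] * inv_sqrt_deg  # map back to the generalized eigenvector

    # Threshold at zero to obtain a binary mask. The sign of an eigenvector
    # is arbitrary, so which group is labelled True is not determined.
    return fiedler > 0


# Toy input: two loose groups of 2-D points standing in for pixel features.
points = np.array(
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [3.0, 3.0], [3.1, 3.0], [3.0, 3.1]]
)
mask = spectral_bipartition(points, sigma=2.0)
```

In this toy example the first three points and the last three points receive opposite labels. Deciding which side corresponds to the salient object would need an additional cue that this sketch does not model.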