We present IMAS, a method that segments the primary objects in videos without manual annotation in either training or inference. Previous methods in unsupervised video object segmentation (UVOS) have demonstrated the effectiveness of motion as either input or supervision for segmentation. However, motion signals may be uninformative or even misleading in cases such as deformable objects and objects with reflections, leading to unsatisfactory segmentation. In contrast, IMAS achieves Improved UVOS with Motion-Appearance Synergy. Our method has two training stages: 1) a motion-supervised object discovery stage that handles motion-appearance conflicts through a learnable residual pathway; 2) a refinement stage with both low- and high-level appearance supervision to correct model misconceptions learned from misleading motion cues. Additionally, we propose motion-semantic alignment as a model-agnostic, annotation-free hyperparameter tuning method. We demonstrate its effectiveness in tuning critical hyperparameters previously tuned with human annotations or hand-crafted, hyperparameter-specific metrics. IMAS greatly improves the segmentation quality on several common UVOS benchmarks. For example, we surpass previous methods by 8.3% on the DAVIS16 benchmark with only a standard ResNet and convolutional heads. We intend to release our code for future research and applications.
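The first training stage mentions a learnable residual pathway that absorbs motion-appearance conflicts during motion-supervised object discovery. The snippet below is a minimal, hypothetical PyTorch sketch of how such a pathway could be wired, not the authors' released implementation; the module name `ResidualPathwayHead`, the feature dimension, and the two-branch layout are our assumptions. The idea illustrated is that the motion-based supervision is applied to the sum of the appearance-based mask logits and a residual term, while only the appearance-based mask is used as the segmentation output, so motion-specific discrepancies can be explained by the residual branch instead of corrupting the mask.

```python
import torch
import torch.nn as nn


class ResidualPathwayHead(nn.Module):
    """Hypothetical sketch of a segmentation head with a learnable residual pathway.

    - `mask_head` predicts the appearance-based object mask (used at inference).
    - `residual_head` predicts a residual term that, added to the mask logits,
      is what the motion-based loss supervises during training.
    """

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.mask_head = nn.Conv2d(feat_dim, 1, kernel_size=1)
        self.residual_head = nn.Conv2d(feat_dim, 1, kernel_size=1)

    def forward(self, feats: torch.Tensor):
        mask_logits = self.mask_head(feats)            # segmentation output
        residual_logits = self.residual_head(feats)    # absorbs motion-appearance conflicts
        motion_supervised_logits = mask_logits + residual_logits
        return mask_logits, motion_supervised_logits


if __name__ == "__main__":
    # Toy usage: features from a backbone (batch 2, 256 channels, 32x32 grid).
    feats = torch.randn(2, 256, 32, 32)
    head = ResidualPathwayHead(feat_dim=256)
    mask_logits, motion_logits = head(feats)
    print(mask_logits.shape, motion_logits.shape)  # torch.Size([2, 1, 32, 32]) each
```

Under this sketch, a motion-reconstruction or flow-based loss would be computed on `motion_supervised_logits`, while evaluation uses `mask_logits` alone; the exact losses and backbone follow the paper, not this illustration.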