Imposing consistency through proxy tasks has been shown to enhance data-driven learning and enable self-supervision in various tasks. This paper introduces novel and effective consistency strategies for optical flow estimation, a problem for which labels are very challenging to derive from real-world data. Specifically, we propose occlusion consistency and zero forcing as forms of self-supervised learning, and transformation consistency as a form of semi-supervised learning. We apply these consistency techniques so that the network model learns to describe pixel-level motion better while requiring no additional annotations. We demonstrate that our consistency strategies, applied to a strong baseline network model using the original datasets and labels, provide further improvements, attaining state-of-the-art results on the KITTI-2015 scene flow benchmark in the non-stereo category. Our method achieves the best foreground accuracy (4.33% in Fl-all) across both the stereo and non-stereo categories, despite using only monocular image inputs.