Occlusions pose a significant challenge to optical flow algorithms that rely on local evidence. We consider an occluded point to be one that is imaged in the first frame but not in the next, a slight overloading of the standard definition since it also includes points that move out of frame. Estimating the motion of these points is extremely difficult, particularly in the two-frame setting. Previous work either relies on CNNs to learn occlusions, without much success, or requires multiple frames to reason about occlusions using temporal smoothness. In this paper, we argue that the occlusion problem can be better solved in the two-frame case by modelling image self-similarities. We introduce a global motion aggregation module, a transformer-based approach that finds long-range dependencies between pixels in the first image and performs global aggregation on the corresponding motion features. We demonstrate that optical flow estimates in occluded regions can be significantly improved without degrading performance in non-occluded regions. This approach obtains new state-of-the-art results on the challenging Sintel dataset, improving the average end-point error by 13.6% on Sintel Final and 13.7% on Sintel Clean. At the time of submission, our method ranks first on these benchmarks among all published and unpublished approaches. Code is available at https://github.com/zacjiang/GMA.
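The module is described above only at a high level. As a rough illustration of the idea, the sketch below computes attention weights purely from first-image appearance features (self-similarity) and uses them to aggregate motion features across all pixels. It is a minimal NumPy sketch under stated assumptions, not the released GMA code: single-head scaled dot-product attention, random stand-in projection weights (`w_q`, `w_k`, `w_v`), no positional encoding, and a residual sum weighted by a scalar `alpha`. All function and parameter names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def global_motion_aggregation(context, motion, w_q, w_k, w_v, alpha):
    """Hypothetical sketch: aggregate motion using frame-1 self-similarity.

    context: (N, C) appearance (context) features of the first image, N = H * W.
    motion:  (N, D) local motion features for the same pixels.
    w_q, w_k: (C, Dk) assumed query/key projections of the context features.
    w_v:     (D, D) assumed value projection of the motion features.
    alpha:   assumed scalar weight on the aggregated term.
    """
    q = context @ w_q                       # (N, Dk) queries from appearance only
    k = context @ w_k                       # (N, Dk) keys from appearance only
    scores = q @ k.T / np.sqrt(q.shape[1])  # (N, N) pairwise pixel similarity
    attn = softmax(scores, axis=1)          # each row sums to 1 over all pixels
    aggregated = attn @ (motion @ w_v)      # (N, D) globally aggregated motion
    return motion + alpha * aggregated      # assumed residual combination


# Toy usage with random features on a 4 x 5 "image" (illustrative sizes only).
rng = np.random.default_rng(0)
H, W, C, D, Dk = 4, 5, 8, 6, 8
N = H * W
context = rng.standard_normal((N, C))
motion = rng.standard_normal((N, D))
w_q = rng.standard_normal((C, Dk))
w_k = rng.standard_normal((C, Dk))
w_v = rng.standard_normal((D, D))

out = global_motion_aggregation(context, motion, w_q, w_k, w_v, alpha=0.5)
print(out.shape)  # (20, 6)
```

The key design point the sketch tries to capture is that queries and keys come only from the first image, so a pixel whose own motion evidence is unreliable (for example, because it is occluded in the second frame) can borrow motion from pixels that look similar to it, wherever they are in the image.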