Occlusions pose a significant challenge to optical flow algorithms that rely on local evidence. We consider an occluded point to be one that is imaged in the first frame but not in the next, a slight overloading of the standard definition, since it also includes points that move out of frame. Estimating the motion of these points is extremely difficult, particularly in the two-frame setting. Previous work either relies on CNNs to learn occlusions, without much success, or requires multiple frames to reason about occlusions using temporal smoothness. In this paper, we argue that the occlusion problem can be better solved in the two-frame case by modelling image self-similarities. We introduce a global motion aggregation module, a transformer-based approach that finds long-range dependencies between pixels in the first image and performs global aggregation on the corresponding motion features. We demonstrate that the optical flow estimates in occluded regions can be significantly improved without degrading performance in non-occluded regions. This approach obtains new state-of-the-art results on the challenging Sintel dataset, improving the average end-point error by 13.6% on Sintel Final and 13.7% on Sintel Clean. At the time of submission, our method ranks first on these benchmarks among all published and unpublished approaches. Code is available at https://github.com/zacjiang/GMA
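As an illustration of the aggregation idea described above, the following is a minimal NumPy sketch, not the released implementation. It assumes a single attention head, flattened pixels, and a residual combination weighted by a scalar `alpha`; the names `global_motion_aggregation`, `w_q`, `w_k`, `w_v`, and `alpha` are hypothetical and chosen for this example only. Attention weights are computed from first-frame appearance features alone, so pixels that look alike share motion information even when one of them is occluded in the second frame.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def global_motion_aggregation(context, motion, w_q, w_k, w_v, alpha=1.0):
    """Aggregate motion features using self-similarity of frame-1 features.

    context: (N, Dc) appearance features of the N pixels in the first frame
    motion:  (N, Dm) motion features for the same pixels
    w_q, w_k: (Dc, D) projections for queries and keys (hypothetical names)
    w_v:      (Dm, Dm) projection for values (hypothetical name)
    alpha:    scalar weight on the aggregated term (an assumption of this sketch)
    """
    q = context @ w_q                         # (N, D) queries from appearance
    k = context @ w_k                         # (N, D) keys from appearance
    v = motion @ w_v                          # (N, Dm) values from motion
    scores = q @ k.T / np.sqrt(q.shape[1])    # (N, N) scaled pairwise similarity
    attn = softmax(scores, axis=-1)           # each row sums to 1
    aggregated = attn @ v                     # (N, Dm) globally aggregated motion
    return motion + alpha * aggregated, attn


# Toy usage: 6 pixels; pixels 0 and 1 have identical appearance features.
rng = np.random.default_rng(0)
context = rng.normal(size=(6, 8))
context[1] = context[0]
motion = rng.normal(size=(6, 4))
w_q = rng.normal(size=(8, 5))
w_k = rng.normal(size=(8, 5))
w_v = rng.normal(size=(4, 4))

out, attn = global_motion_aggregation(context, motion, w_q, w_k, w_v)
```

In this toy setup, pixels 0 and 1 receive the same aggregated term because their attention rows are identical, which is the mechanism by which an occluded pixel could borrow motion from a visually similar, non-occluded one.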