Learning-based optical flow estimation has been dominated by a pipeline that builds a cost volume and regresses flow with convolutions. This pipeline is inherently limited to local correlations and therefore struggles with the long-standing challenge of large displacements. To alleviate this, the state-of-the-art framework RAFT gradually improves its prediction quality through a large number of iterative refinements, achieving remarkable performance at the cost of inference time that grows linearly with the number of refinements. To achieve both high accuracy and efficiency, we completely revamp the dominant flow regression pipeline by reformulating optical flow as a global matching problem, which identifies correspondences by directly comparing feature similarities. Specifically, we propose the GMFlow framework, which consists of three main components: a customized Transformer for feature enhancement, a correlation and softmax layer for global feature matching, and a self-attention layer for flow propagation. We further introduce a refinement step that reuses GMFlow at a higher feature resolution for residual flow prediction. Our new framework outperforms RAFT with 31 refinements on the challenging Sintel benchmark while using only one refinement and running faster, suggesting a new paradigm for accurate and efficient optical flow estimation. Code is available at https://github.com/haofeixu/gmflow.
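
To make the correlation and softmax matching step concrete, the following is a minimal sketch in plain Python and is not taken from the GMFlow code base. It assumes that features are compared by a dot product scaled by the square root of the feature dimension, and that the softmax weights are used to take an expectation over pixel coordinates. The abstract states neither detail. The function name `global_matching_flow` and the toy features are hypothetical.

```python
import math


def global_matching_flow(feat1, feat2, height, width):
    """Sketch of global matching: compare every pixel of image 1 with every
    pixel of image 2, then turn the softmax weights into a flow vector.

    feat1, feat2: lists of height*width feature vectors in row-major order.
    Returns one (dx, dy) tuple per pixel of image 1.
    """
    dim = len(feat1[0])
    scale = 1.0 / math.sqrt(dim)  # assumed normalization, not from the abstract
    # Pixel coordinates (x, y) in the same row-major order as the features.
    grid = [(x, y) for y in range(height) for x in range(width)]

    flow = []
    for i, f1 in enumerate(feat1):
        # Correlation between this pixel and all pixels of the second image.
        scores = [scale * sum(a * b for a, b in zip(f1, f2)) for f2 in feat2]
        # Numerically stable softmax over all candidate positions.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Assumed readout: expected matching coordinate under the softmax.
        match_x = sum(p * gx for p, (gx, _) in zip(probs, grid))
        match_y = sum(p * gy for p, (_, gy) in zip(probs, grid))
        # Flow is the displacement from the pixel to its expected match.
        flow.append((match_x - grid[i][0], match_y - grid[i][1]))
    return flow


# Toy 1x3 image: strongly distinct features, second image shifted by one pixel
# to the right with wrap-around, so the last pixel matches position 0.
s = 10.0
feat1 = [[s, 0.0, 0.0], [0.0, s, 0.0], [0.0, 0.0, s]]
feat2 = [feat1[2], feat1[0], feat1[1]]
flow = global_matching_flow(feat1, feat2, height=1, width=3)
# flow[0] and flow[1] are approximately (1.0, 0.0); flow[2] is approximately (-2.0, 0.0).
```

Because every pixel is compared with every other pixel, the cost of this step grows quadratically with the number of pixels, which is one reason such matching would typically be run on downsampled feature maps rather than at full image resolution.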