We propose a novel framework for finding correspondences in images based on a deep neural network that, given two images and a query point in one of them, finds its correspondence in the other. By doing so, one has the option to query only the points of interest and retrieve sparse correspondences, or to query all points in an image and obtain dense mappings. Importantly, in order to capture both local and global priors, and to let our model relate image regions using the most relevant among said priors, we realize our network using a transformer. At inference time, we apply our correspondence network by recursively zooming in around the estimates, yielding a multiscale pipeline able to provide highly accurate correspondences. Our method significantly outperforms the state of the art on both sparse and dense correspondence problems on multiple datasets and tasks, ranging from wide-baseline stereo to optical flow, without any retraining for a specific dataset. We commit to releasing data, code, and all the tools necessary to train from scratch and ensure reproducibility.
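The recursive zoom-in inference described above can be sketched as a simple loop: query the network at full scale, then repeatedly crop a window around the current estimate in each image, re-query on the crops, and map the local estimate back to global coordinates. The sketch below is a minimal illustration of that control flow only; `toy_model` is a hypothetical stand-in for the transformer network (here an identity mapping in normalized coordinates), and the window-shrinking factor `zoom` is an assumed parameter, not a value from the paper.

```python
def toy_model(query_xy):
    # Hypothetical stand-in for the transformer correspondence network:
    # returns the same normalized coordinate in the second crop.
    return query_xy

def recursive_zoom(query_xy, model=toy_model, levels=3, zoom=0.5):
    """Multiscale inference sketch: query at full scale, then repeatedly
    crop a window around the current estimate and re-query, mapping the
    local estimate back to global image coordinates."""
    # Axis-aligned windows in normalized [0, 1]^2 coordinates.
    ax0, ay0, ax1, ay1 = 0.0, 0.0, 1.0, 1.0  # window in image A
    bx0, by0, bx1, by1 = 0.0, 0.0, 1.0, 1.0  # window in image B
    qx, qy = query_xy
    cx, cy = qx, qy
    for _ in range(levels):
        # Express the query point in local window-A coordinates.
        lqx = (qx - ax0) / (ax1 - ax0)
        lqy = (qy - ay0) / (ay1 - ay0)
        # Network call on the current pair of crops (crops omitted here).
        lcx, lcy = model((lqx, lqy))
        # Map the local estimate back to global coordinates.
        cx = bx0 + lcx * (bx1 - bx0)
        cy = by0 + lcy * (by1 - by0)
        # Shrink both windows around the query point and the estimate.
        haw = zoom * (ax1 - ax0) / 2
        hah = zoom * (ay1 - ay0) / 2
        hbw = zoom * (bx1 - bx0) / 2
        hbh = zoom * (by1 - by0) / 2
        ax0, ay0, ax1, ay1 = qx - haw, qy - hah, qx + haw, qy + hah
        bx0, by0, bx1, by1 = cx - hbw, cy - hbh, cx + hbw, cy + hbh
    return cx, cy
```

With the identity stand-in, the estimate stays fixed at the query point across levels; with a real network, each level refines the estimate on a higher-resolution crop. Dense output amounts to running this loop for every pixel, sparse output for the query points of interest only.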