Data association is at the core of many computer vision tasks, e.g., multiple object tracking, image matching, and point cloud registration. Existing methods usually solve the data association problem with network flow optimization, bipartite matching, or direct end-to-end learning. Despite their popularity, we identify two shortcomings in current solutions. First, they mostly ignore intra-view context information. Second, they either train deep association models end-to-end and make little use of optimization-based assignment methods, or use an off-the-shelf neural network only to extract features. In this paper, we propose a general learnable graph matching method to address these issues. Specifically, we model the intra-view relationships as an undirected graph, so that data association becomes a general graph matching problem between two graphs. Furthermore, to make the optimization end-to-end differentiable, we relax the original graph matching problem into a continuous quadratic program and integrate it into the training of a deep graph neural network using the KKT conditions and the implicit function theorem. On the multiple object tracking (MOT) task, our method achieves state-of-the-art performance on several MOT datasets. For image matching, our method outperforms state-of-the-art methods on ScanNet, a popular indoor dataset, while using half the training data and half the training iterations. Code will be available at https://github.com/jiaweihe1996/GMTracker.
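The abstract mentions differentiating through a relaxed quadratic program using the KKT conditions and the implicit function theorem. As a minimal sketch of that general idea, and not the paper's implementation, the NumPy example below treats only the equality-constrained case: it solves a small convex QP through its KKT linear system and then reuses the same KKT matrix to obtain the Jacobian of the solution with respect to the linear cost. The function names `solve_eq_qp` and `jacobian_wrt_c`, the identity matrix used as the quadratic term, and the toy 2x2 affinities are all assumptions made for illustration; non-negativity constraints, which a full relaxation of graph matching would also need, are left out.

```python
import numpy as np

def solve_eq_qp(Q, c, A, b):
    """Solve min_x 0.5 x^T Q x + c^T x  s.t.  A x = b through its KKT system.

    Stationarity: Q x + c + A^T lam = 0;  feasibility: A x = b.
    Returns the primal solution, the multipliers, and the KKT matrix.
    """
    n, m = Q.shape[0], A.shape[0]
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-c, b]))
    return sol[:n], sol[n:], K

def jacobian_wrt_c(K, n):
    """dx*/dc from the implicit function theorem: K [dx; dlam] = [-I; 0]."""
    m = K.shape[0] - n
    rhs = np.vstack([-np.eye(n), np.zeros((m, n))])
    return np.linalg.solve(K, rhs)[:n]

# Toy association between two views with two nodes each (hypothetical numbers).
# x is the row-major flattening of a 2x2 soft assignment matrix X.
affinity = np.array([0.9, 0.1, 0.2, 0.8])  # assumed pairwise similarities
Q = np.eye(4)                              # assumed convex quadratic term
c = -affinity                              # maximizing affinity = minimizing -affinity
# Row sums equal 1, plus one column sum; the second column sum is implied,
# so it is dropped to keep A full row rank and the KKT matrix invertible.
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0, 0.0]])
b = np.ones(3)

x, lam, K = solve_eq_qp(Q, c, A, b)
J = jacobian_wrt_c(K, n=4)

print(x.reshape(2, 2))  # relaxed (soft) assignment matrix
print(J[:, 0])          # sensitivity of the assignment to the first cost entry
```

For this toy problem the relaxed assignment is approximately `[[0.85, 0.15], [0.15, 0.85]]`, so the omitted non-negativity constraints happen to be inactive. In a learned setting, `J` is the quantity that would let gradients of a loss on `x` flow back into whatever network produced the affinities.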