Estimating the 6-DoF pose of a rigid object from a single RGB image is a crucial yet challenging task. Recent studies have shown the great potential of dense correspondence-based solutions, but further improvements are needed before they reach practical deployment. In this paper, we propose a novel pose estimation algorithm named CheckerPose, which improves on prior work in three main aspects. First, CheckerPose densely samples 3D keypoints from the surface of the 3D object and progressively finds their 2D correspondences in the image. Compared to previous solutions that conduct dense sampling in the image space, our strategy enables correspondence search over a 2D grid (i.e., pixel coordinates). Second, for the 3D-to-2D correspondences, we design a compact binary code representation for 2D image locations. This representation not only allows for progressive correspondence refinement but also converts correspondence regression into a more efficient classification problem. Third, we adopt a graph neural network to explicitly model the interactions among the sampled 3D keypoints, further boosting the reliability and accuracy of the correspondences. Together, these novel components make CheckerPose a strong pose estimation algorithm. When evaluated on the popular Linemod, Linemod-O, and YCB-V object pose estimation benchmarks, CheckerPose clearly boosts the accuracy of correspondence-based methods and achieves state-of-the-art performance.
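To illustrate why a binary code over 2D image locations supports progressive refinement and turns coordinate regression into per-bit classification, the following minimal sketch encodes a pixel by repeatedly bisecting the image along each axis. It is an illustration under stated assumptions, not the exact encoding used by CheckerPose: the function names `encode_location` and `decode_location`, the square 256-pixel image, and the 8 bits per axis are all hypothetical choices.

```python
def encode_location(x, y, size=256, num_bits=8):
    """Encode pixel (x, y) as a coarse-to-fine list of (x_bit, y_bit) pairs.

    Each level halves the current cell along both axes; a bit of 1 means the
    point lies in the upper half of the cell along that axis.
    (Hypothetical helper for illustration only.)
    """
    x_lo, x_hi = 0.0, float(size)
    y_lo, y_hi = 0.0, float(size)
    code = []
    for _ in range(num_bits):
        x_mid = (x_lo + x_hi) / 2
        y_mid = (y_lo + y_hi) / 2
        x_bit = int(x >= x_mid)
        y_bit = int(y >= y_mid)
        code.append((x_bit, y_bit))
        # Shrink the cell to the half that contains the point.
        x_lo, x_hi = (x_mid, x_hi) if x_bit else (x_lo, x_mid)
        y_lo, y_hi = (y_mid, y_hi) if y_bit else (y_lo, y_mid)
    return code


def decode_location(code, size=256):
    """Return the center of the cell addressed by a (possibly truncated) code."""
    x_lo, x_hi = 0.0, float(size)
    y_lo, y_hi = 0.0, float(size)
    for x_bit, y_bit in code:
        x_mid = (x_lo + x_hi) / 2
        y_mid = (y_lo + y_hi) / 2
        x_lo, x_hi = (x_mid, x_hi) if x_bit else (x_lo, x_mid)
        y_lo, y_hi = (y_mid, y_hi) if y_bit else (y_lo, y_mid)
    return ((x_lo + x_hi) / 2, (y_lo + y_hi) / 2)


code = encode_location(100, 37)
coarse = decode_location(code[:2])  # estimate from the first two levels only
fine = decode_location(code)        # estimate from all eight levels
```

Decoding a prefix of the code yields a coarse cell center, and each additional level halves the cell along both axes. In a learned setting, a network would predict each bit as a binary classification target instead of regressing the coordinates directly.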