6D pose estimation from a single RGB image is a fundamental task in computer vision. The current top-performing deep learning-based methods rely on an indirect strategy: they first establish 2D-3D correspondences between coordinates in the image plane and in the object coordinate system, and then apply a variant of the P$n$P/RANSAC algorithm. However, this two-stage pipeline is not end-to-end trainable and is therefore hard to employ in the many tasks that require differentiable poses. On the other hand, methods based on direct regression currently perform worse than geometry-based methods. In this work, we perform an in-depth investigation of both direct and indirect methods, and propose a simple yet effective Geometry-guided Direct Regression Network (GDR-Net) that learns the 6D pose in an end-to-end manner from dense correspondence-based intermediate geometric representations. Extensive experiments show that our approach remarkably outperforms state-of-the-art methods on the LM, LM-O and YCB-V datasets. Code is available at https://git.io/GDR-Net.
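To make the indirect strategy concrete, the following is a minimal sketch of the step a P$n$P/RANSAC-style solver repeats for each pose hypothesis: project the 3D object points with a candidate rotation and translation, then count how many 2D-3D correspondences reproject within a pixel threshold. It is not the GDR-Net implementation. The function names, the camera intrinsics, the sample points and the 2-pixel threshold are all illustrative assumptions.

```python
import math

def rotation_z(angle_rad):
    """3x3 rotation about the camera z-axis (a simple stand-in for a full 3-DoF rotation)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def project(point_obj, R, t, fx, fy, cx, cy):
    """Map a 3D point from object coordinates to pixel coordinates (pinhole model)."""
    # Rigid transform into the camera frame: X_cam = R * X_obj + t
    xc = [sum(R[i][j] * point_obj[j] for j in range(3)) + t[i] for i in range(3)]
    # Perspective division followed by the intrinsics
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy)

def count_inliers(points_obj, points_img, R, t, intrinsics, threshold_px=2.0):
    """Count correspondences whose reprojection error is within threshold_px."""
    fx, fy, cx, cy = intrinsics
    inliers = 0
    for X, (u_obs, v_obs) in zip(points_obj, points_img):
        u, v = project(X, R, t, fx, fy, cx, cy)
        if math.hypot(u - u_obs, v - v_obs) <= threshold_px:
            inliers += 1
    return inliers

# Hypothetical data: five object points (metres) and assumed intrinsics (pixels).
intrinsics = (500.0, 500.0, 320.0, 240.0)
points_obj = [(0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, 0.1),
              (0.1, 0.1, 0.0), (-0.1, 0.05, 0.02)]
R_true, t_true = rotation_z(0.3), [0.0, 0.0, 1.0]

# Simulate the predicted 2D-3D correspondences, then corrupt one to act as an outlier.
points_img = [project(X, R_true, t_true, *intrinsics) for X in points_obj]
points_img[0] = (points_img[0][0] + 50.0, points_img[0][1])

inliers_true = count_inliers(points_obj, points_img, R_true, t_true, intrinsics)
inliers_wrong = count_inliers(points_obj, points_img, R_true, [0.2, 0.0, 1.0], intrinsics)
print(inliers_true, inliers_wrong)
```

The inlier count is a discrete, thresholded score, and a solver of this kind keeps the hypothesis with the highest count. That hard selection is one reason such a pipeline offers no straightforward gradient path from the final pose back to the correspondence network, whereas a direct regressor outputs the rotation and translation as ordinary differentiable network outputs.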