Current RGB-based 6D object pose estimation methods have achieved notable performance on benchmark datasets and in real-world applications. However, predicting the 6D pose from single-view 2D image features is susceptible to disturbances from environmental changes and from textureless or visually similar object surfaces. Hence, RGB-based methods generally achieve less competitive results than RGBD-based methods, which exploit both image features and 3D structural features. To narrow this performance gap, this paper proposes a framework for 6D object pose estimation that learns implicit 3D information from two RGB images. By combining the learned 3D information with 2D image features, we establish more stable correspondences between the scene and the object models. To determine how best to utilize 3D information from RGB inputs, we investigate three different approaches: Early-Fusion, Mid-Fusion, and Late-Fusion. We find that Mid-Fusion recovers the most precise 3D keypoints for object pose estimation. Experiments show that our method outperforms state-of-the-art RGB-based methods and achieves results comparable to those of RGBD-based methods.
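The three fusion strategies named above differ only in where the features of the two RGB views are merged. The following minimal NumPy sketch illustrates that distinction; the feature dimensions and the `stage` function are placeholders standing in for learned network layers, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def stage(x, out_dim):
    """Stand-in for a learned layer: a fixed random linear map + ReLU."""
    w = rng.standard_normal((x.shape[-1], out_dim)) * 0.1
    return np.maximum(x @ w, 0.0)

# Placeholder 2D feature vectors extracted from the two RGB views.
img_a = rng.standard_normal((1, 64))
img_b = rng.standard_normal((1, 64))

# Early-Fusion: concatenate raw features, then run the whole network jointly.
early = stage(stage(np.concatenate([img_a, img_b], axis=-1), 128), 32)

# Mid-Fusion: process each view partway separately, merge, then finish jointly.
mid = stage(np.concatenate([stage(img_a, 128), stage(img_b, 128)], axis=-1), 32)

# Late-Fusion: run full per-view networks and merge only their outputs.
late = np.concatenate([stage(stage(img_a, 128), 32),
                       stage(stage(img_b, 128), 32)], axis=-1)

print(early.shape, mid.shape, late.shape)
```

The trade-off is that earlier fusion lets the network correlate the two views at a lower level, while later fusion keeps per-view processing independent; the paper's finding is that merging at an intermediate stage works best for recovering 3D keypoints.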