Estimating the 3D pose of an object is a challenging task that arises in augmented reality and robotics applications. In this paper, we propose a novel approach to 6 DoF object pose estimation from a single RGB-D image. We adopt a hybrid two-stage pipeline: a data-driven stage followed by a geometric one. The data-driven stage consists of a classification CNN that estimates the object's 2D location in the image from local patches, followed by a regression CNN trained to predict the 3D locations of a set of keypoints in the camera coordinate system. To extract the pose, the geometric stage aligns the 3D points in the camera coordinate system with the corresponding 3D points in the world coordinate system by minimizing a registration error, thereby computing the pose. Our experiments on the standard LineMod dataset show that our approach is more robust and accurate than state-of-the-art methods. The approach is further validated on a 6 DoF positioning task performed by visual servoing.
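As a minimal sketch of the geometric stage (not the authors' implementation), the rigid transform that aligns the model keypoints in the world frame with the CNN-predicted keypoints in the camera frame can be obtained in closed form with the Kabsch/Horn method; the function name and NumPy usage below are illustrative assumptions.

```python
import numpy as np

def align_3d_3d(pts_world, pts_cam):
    """Closed-form rigid registration (Kabsch/Horn).

    pts_world, pts_cam: (N, 3) arrays of corresponding keypoints in the
    object/world frame and in the camera frame (the regressed 3D points).
    Returns the 4x4 object pose T such that pts_cam ~= R @ pts_world + t.
    """
    # Center both point sets on their centroids.
    mu_w = pts_world.mean(axis=0)
    mu_c = pts_cam.mean(axis=0)
    Xw = pts_world - mu_w
    Xc = pts_cam - mu_c

    # Cross-covariance and SVD yield the rotation minimizing the registration error.
    H = Xw.T @ Xc
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])          # guard against reflections
    R = Vt.T @ D @ U.T
    t = mu_c - R @ mu_w

    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T
```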