We present a novel technique for estimating the 6D pose of an object from a single image when the object's 3D geometry is known only approximately rather than as a precise 3D model. To achieve this, we employ a dense 2D-to-3D correspondence predictor that regresses 3D model coordinates for every pixel. In addition to the 3D coordinates, our model estimates the pixel-wise coordinate error, which lets us discard correspondences that are likely wrong. This allows us to generate multiple 6D pose hypotheses of the object, which we then refine iteratively using a highly efficient region-based approach. We also introduce a novel pixel-wise posterior formulation with which we estimate the probability of each hypothesis and select the most likely one. As our experiments show, the approach copes with extreme visual conditions, including overexposure, high contrast, and low signal-to-noise ratio. This makes it a powerful technique for the particularly challenging task of estimating the pose of tumbling satellites for in-orbit robotic applications. Our method achieves state-of-the-art performance on the SPEED+ dataset and won the SPEC2021 post-mortem competition.
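As a rough illustration of the correspondence-filtering step described above, the following minimal sketch discards pixels whose predicted coordinate error is too large and then draws random subsets of the survivors, each of which could seed one pose hypothesis (for example via a PnP solver, which is not shown). The function names, the fixed threshold `max_error`, the random subset sampling, and the toy data are all assumptions made for illustration; the abstract does not specify how correspondences are selected or how hypotheses are generated.

```python
import numpy as np


def filter_correspondences(pixels, coords_3d, pred_error, max_error):
    """Keep correspondences whose predicted 3D coordinate error is below max_error.

    pixels:     (N, 2) pixel positions
    coords_3d:  (N, 3) regressed 3D model coordinates, one per pixel
    pred_error: (N,)   predicted per-pixel coordinate error
    max_error:  hypothetical rejection threshold (an assumption of this sketch)
    """
    keep = pred_error < max_error
    return pixels[keep], coords_3d[keep]


def sample_hypothesis_sets(n_points, n_hypotheses, subset_size, rng):
    """Draw index subsets without replacement; each would seed one pose hypothesis."""
    return [
        rng.choice(n_points, size=subset_size, replace=False)
        for _ in range(n_hypotheses)
    ]


# Toy data standing in for the output of a dense correspondence predictor.
rng = np.random.default_rng(0)
height, width = 4, 5
ys, xs = np.mgrid[0:height, 0:width]
pixels = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
coords_3d = rng.normal(size=(height * width, 3))
pred_error = np.linspace(0.0, 1.0, height * width)

kept_pixels, kept_coords = filter_correspondences(
    pixels, coords_3d, pred_error, max_error=0.5
)
subsets = sample_hypothesis_sets(
    len(kept_pixels), n_hypotheses=3, subset_size=4, rng=rng
)
```

Each index subset selects matching rows of `kept_pixels` and `kept_coords`. In a full pipeline these 2D-3D pairs would be passed to a pose solver, and the resulting hypotheses would then be refined and scored.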