Most recent 6D object pose estimation methods first use object detection to obtain 2D bounding boxes and only then regress the pose. However, the general-purpose object detectors they rely on are ill-suited to cluttered scenes, thus providing poor initializations for the subsequent pose network. To address this, we propose a rigidity-aware detection method that exploits the fact that, in 6D pose estimation, the target objects are rigid. This lets us sample positive object regions from the entire visible object area during training, instead of naively drawing samples around the bounding-box center, where the object might be occluded. As such, every visible object part can contribute to the final bounding-box prediction, yielding better detection robustness. Key to the success of our approach is a visibility map, which we propose to build using the minimum barrier distance between every pixel in the bounding box and the box boundary. Our results on seven challenging 6D pose estimation datasets show that our method outperforms general detection frameworks by a large margin. Furthermore, combined with a pose regression network, we achieve state-of-the-art pose estimation results on the challenging BOP benchmark.
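To make the visibility-map idea concrete, below is a minimal sketch of the minimum barrier distance (MBD) from the box boundary to every pixel. The barrier cost of a path is the maximum minus the minimum intensity along it, and the MBD is the smallest such cost over all paths reaching the boundary. This sketch uses a Dijkstra-style propagation, a common approximation for MBD (exact MBD lacks the optimal-substructure property Dijkstra assumes); the function name, plain-list grid representation, and grayscale input are illustrative assumptions, not the paper's implementation.

```python
import heapq

def minimum_barrier_distance(img):
    """Approximate MBD from the bounding-box boundary to every pixel.

    img: 2D list of grayscale intensities (the crop inside the box).
    Returns a 2D list of distances; low values indicate pixels whose
    appearance smoothly connects to the box boundary (likely background),
    high values indicate high-contrast interior regions (likely object).
    """
    h, w = len(img), len(img[0])
    INF = float("inf")
    dist = [[INF] * w for _ in range(h)]
    # hi/lo track the running max/min intensity along the best path so far.
    hi = [[0.0] * w for _ in range(h)]
    lo = [[0.0] * w for _ in range(h)]
    heap = []
    # Seed every boundary pixel of the box with distance 0.
    for y in range(h):
        for x in range(w):
            if y in (0, h - 1) or x in (0, w - 1):
                dist[y][x] = 0.0
                hi[y][x] = lo[y][x] = img[y][x]
                heapq.heappush(heap, (0.0, y, x))
    while heap:
        d, y, x = heapq.heappop(heap)
        if d > dist[y][x]:  # stale heap entry
            continue
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                nhi = max(hi[y][x], img[ny][nx])
                nlo = min(lo[y][x], img[ny][nx])
                nd = nhi - nlo  # barrier cost of the extended path
                if nd < dist[ny][nx]:
                    dist[ny][nx] = nd
                    hi[ny][nx], lo[ny][nx] = nhi, nlo
                    heapq.heappush(heap, (nd, ny, nx))
    return dist
```

For example, on a 3×3 crop that is all zeros except a bright center pixel, the center receives a high MBD (it contrasts with every path to the boundary) while the boundary pixels receive 0; thresholding such a map yields the visible-object regions from which positive samples can be drawn.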