We propose DPODv2 (Dense Pose Object Detector), a three-stage 6 DoF object detection method that relies on dense correspondences. We combine a 2D object detector with a dense correspondence estimation network and a multi-view pose refinement method to estimate a full 6 DoF pose. Unlike other deep learning methods, which are typically restricted to monocular RGB images, we propose a unified deep learning network that allows different imaging modalities to be used (RGB or depth). Moreover, we propose a novel pose refinement method based on differentiable rendering. The main concept is to compare predicted and rendered correspondences in multiple views to obtain a pose that is consistent with the predicted correspondences in all views. Our proposed method is evaluated rigorously on different data modalities and types of training data in a controlled setup. The main conclusion is that RGB excels in correspondence estimation, while depth contributes to pose accuracy if good 3D-3D correspondences are available. Naturally, their combination achieves the best overall performance. We perform an extensive evaluation and an ablation study to analyze and validate the results on several challenging datasets. DPODv2 achieves excellent results on all of them while remaining fast and scalable, independent of the data modality and the type of training data used.
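To illustrate why good 3D-3D correspondences are enough to pin down a pose, the following is a minimal sketch of a least-squares rigid alignment using the Kabsch algorithm. The abstract does not say which solver DPODv2 uses, so the choice of Kabsch, the function name `rigid_transform_from_3d_correspondences`, and its parameters are assumptions made purely for illustration.

```python
import numpy as np


def rigid_transform_from_3d_correspondences(model_pts, scene_pts):
    """Return (R, t) minimizing sum ||R @ m_i + t - s_i||^2 (Kabsch algorithm).

    Hypothetical helper for illustration only; not taken from DPODv2.
    model_pts, scene_pts: (N, 3) arrays of matched 3D points, N >= 3.
    """
    model_pts = np.asarray(model_pts, dtype=float)
    scene_pts = np.asarray(scene_pts, dtype=float)

    # Center both point sets so that only the rotation remains to be solved.
    mu_model = model_pts.mean(axis=0)
    mu_scene = scene_pts.mean(axis=0)

    # 3x3 cross-covariance between the centered model and scene points.
    H = (model_pts - mu_model).T @ (scene_pts - mu_scene)

    # SVD of the cross-covariance gives the optimal orthogonal matrix.
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Translation maps the rotated model centroid onto the scene centroid.
    t = mu_scene - R @ mu_model
    return R, t
```

In such a setup, `model_pts` would be the object-space coordinates predicted for each pixel and `scene_pts` the corresponding points back-projected from the depth map. Real correspondences contain outliers, so a robust wrapper such as RANSAC would normally be placed around this step.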