In RGB-D-based 6D pose estimation, direct regression approaches predict the 3D rotation and translation directly from RGB-D data, which allows quick deployment and efficient inference. However, directly regressing the absolute translation of the pose suffers from the gap between the object translation distributions of the training and testing datasets, a gap usually caused by the diversity of object poses in 3D physical space. To address this, we generalize the pinhole camera projection model to a residual-based projection model and propose the projective residual regression (Res6D) mechanism. Given a reference point for each object in an RGB-D image, Res6D regresses the residual between the target and the reference point, which reduces the distribution gap and shrinks the regression target to a small range. It also aligns the output residual with its input so that both follow the projection equation between the 2D image plane and 3D space. By plugging Res6D into the latest direct regression methods, we achieve state-of-the-art overall results on the Occlusion LineMOD (ADD(S): 79.7%), LineMOD (ADD(S): 99.5%), and YCB-Video (AUC of ADD(S): 95.4%) datasets.
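The abstract does not give Res6D's exact residual parameterization. The sketch below illustrates the general idea under one assumption: the translation is expressed in pinhole image-plus-depth coordinates (u, v, Z), and the residual is the difference between the target's (u, v, Z) and a reference point's pixel location and depth. The function names `project`, `backproject`, `encode_residual`, and `decode_residual`, as well as the intrinsics and reference values used, are hypothetical illustrations and are not taken from the paper.

```python
import numpy as np


def project(t, K):
    """Pinhole projection of a 3D translation t = (X, Y, Z) to (u, v, Z)."""
    x, y, z = t
    u = K[0, 0] * x / z + K[0, 2]  # u = fx * X / Z + cx
    v = K[1, 1] * y / z + K[1, 2]  # v = fy * Y / Z + cy
    return np.array([u, v, z])


def backproject(uvz, K):
    """Inverse of project: recover (X, Y, Z) from pixel (u, v) and depth Z."""
    u, v, z = uvz
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.array([x, y, z])


def encode_residual(t, ref_uvz, K):
    """Hypothetical regression target: (du, dv, dZ) relative to a reference point."""
    return project(t, K) - ref_uvz


def decode_residual(res, ref_uvz, K):
    """Add the predicted residual to the reference point, then back-project to 3D."""
    return backproject(ref_uvz + res, K)


# Example intrinsics (assumed values, not from any dataset).
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
t = np.array([0.1, -0.05, 0.8])        # ground-truth translation in metres
ref = np.array([330.0, 200.0, 0.78])   # reference pixel (u, v) and its depth
res = encode_residual(t, ref, K)       # approximately (65.0, 2.5, 0.02)
print(res, decode_residual(res, ref, K))
```

Because the residual is measured relative to a per-object reference point, its range depends on how far the object lies from that point rather than on where the object sits in the camera frustum. This is the intuition behind the claim that the residual target narrows the train/test distribution gap.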