Unlike indirect methods, which usually require time-consuming post-processing, recent deep learning-based direct methods for 6D pose estimation predict the 3D rotation and 3D translation directly from RGB-D data. However, because direct methods regress the absolute translation of the pose, they suffer when the object translation distribution differs between training and test data, a gap that usually arises in practice because data collection and annotation are expensive. To address this, we propose a 5D anchor mechanism, in which an anchor is defined by 3D coordinates in physical space and 2D coordinates in the image plane. Inspired by anchor-based object detection methods, the 5D anchor regresses the offset between the target and the anchor, which eliminates the distribution gap and confines the regression target to a small range. Regressing an offset, however, introduces a mismatch between the absolute input and the relative output. We therefore build an anchor-based projection model that replaces the absolute input with a relative one, which further improves performance. When the 5D anchor is plugged into the latest direct methods, Uni6Dv2 and ES6D obtain improvements of 38.7% and 3.5%, respectively. In particular, Uni6Dv2 with the 5D anchor, dubbed Uni6Dv3, achieves state-of-the-art overall results on Occlusion LineMOD (79.3%), LineMOD (99.5%), and YCB-Video (91.5%), and requires only 10% of the training data to reach performance comparable to training on the full data.
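The offset idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the camera intrinsics `K`, and the choice of anchor point are all assumptions made for illustration, and the abstract does not say how anchors are selected. The sketch assumes a standard pinhole camera, builds an anchor from a 3D point and its 2D projection, encodes an absolute translation as a small offset from the anchor, and expresses pixel coordinates relative to the anchor.

```python
import numpy as np

def make_anchor(xyz, K):
    """Hypothetical 5D anchor: a 3D point (X, Y, Z) plus its pinhole projection (u, v)."""
    xyz = np.asarray(xyz, dtype=float)
    uvw = K @ xyz                      # homogeneous image coordinates
    uv = uvw[:2] / uvw[2]              # perspective division
    return np.concatenate([xyz, uv])   # [X, Y, Z, u, v]

def encode_translation(t_abs, anchor):
    """Regression target: offset of the absolute translation from the anchor's 3D part."""
    return np.asarray(t_abs, dtype=float) - anchor[:3]

def decode_translation(offset, anchor):
    """Recover the absolute translation from a predicted offset."""
    return np.asarray(offset, dtype=float) + anchor[:3]

def relative_pixels(uv_abs, anchor):
    """Relative input: pixel coordinates expressed with respect to the anchor's 2D part."""
    return np.asarray(uv_abs, dtype=float) - anchor[3:]

# Assumed intrinsics (fx = fy = 600, principal point at (320, 240)); illustrative only.
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

anchor = make_anchor([0.10, -0.05, 1.00], K)   # assumed anchor location in metres
t_abs = np.array([0.12, -0.02, 1.10])          # assumed ground-truth translation
offset = encode_translation(t_abs, anchor)     # small-range regression target
t_rec = decode_translation(offset, anchor)     # round trip back to absolute translation
uv_rel = relative_pixels([[400.0, 220.0]], anchor)
```

Under these assumptions, the network would regress `offset`, which stays in a small range when the anchor lies near the object regardless of where the object sits in the camera frame, and would receive `uv_rel` as input so that input and output are both relative to the same anchor.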