Few prior 6D pose estimation methods use a backbone network to extract features from both RGB and depth images; Uni6D is the pioneer in doing so. We find that the primary causes of Uni6D's performance limitation are Instance-Outside and Instance-Inside noise. Because of its inherently straightforward pipeline design, Uni6D inevitably introduces Instance-Outside noise from background pixels in the receptive field, and it ignores the Instance-Inside noise in the input depth data. In this work, we propose a two-step denoising method to handle both kinds of noise in Uni6D. In the first step, an instance segmentation network is used to crop and mask the instance, removing noise from non-instance regions. In the second step, a lightweight depth denoising module calibrates the depth feature before it is fed into the pose regression network. Extensive experiments show that our method, called Uni6Dv2, eliminates the noise effectively and robustly, outperforming Uni6D without sacrificing much inference efficiency. It also reduces the need for annotated real data, which is costly to label.
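The first denoising step (crop and mask the instance) can be illustrated with a minimal NumPy sketch. This is not the Uni6Dv2 implementation: the function name `crop_and_mask`, the array layouts, and the choice of zero as the fill value for non-instance pixels are all assumptions made for illustration, and the binary mask is assumed to come from some instance segmentation network.

```python
import numpy as np


def crop_and_mask(rgb, depth, mask):
    """Crop RGB-D input to the instance's bounding box and zero out background.

    Hypothetical helper, assuming:
      rgb   : (H, W, 3) array
      depth : (H, W) array
      mask  : (H, W) array, nonzero where the instance is (e.g. predicted
              by an instance segmentation network)
    """
    mask = np.asarray(mask).astype(bool)
    if not mask.any():
        raise ValueError("mask contains no instance pixels")

    # Tight bounding box around the instance pixels.
    ys, xs = np.nonzero(mask)
    y0, y1 = int(ys.min()), int(ys.max()) + 1
    x0, x1 = int(xs.min()), int(xs.max()) + 1

    # Crop, then suppress the background pixels that remain inside the box.
    m = mask[y0:y1, x0:x1]
    rgb_crop = np.where(m[..., None], rgb[y0:y1, x0:x1], 0)
    depth_crop = np.where(m, depth[y0:y1, x0:x1], 0)
    return rgb_crop, depth_crop, (x0, y0, x1, y1)
```

Cropping alone would still leave background pixels inside the bounding box, which is why the sketch also applies the mask after cropping. The second step, the learned depth denoising module, is not sketched here because the abstract does not describe its architecture.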