Learning-based 6D object pose estimation methods rely on computing large intermediate pose representations and/or iteratively refining an initial estimate with a slow render-and-compare pipeline. This paper introduces a novel method we call Cascaded Pose Refinement Transformers, or CRT-6D. We replace the commonly used dense intermediate representation with a sparse set of features sampled from the feature pyramid, which we call OSKFs (Object Surface Keypoint Features), where each element corresponds to an object keypoint. We employ lightweight deformable transformers and chain them together to iteratively refine proposed poses over the sampled OSKFs. We achieve inference runtimes 2x faster than the closest real-time state-of-the-art methods while supporting up to 21 objects in a single model. We demonstrate the effectiveness of CRT-6D through extensive experiments on the LM-O and YCB-V datasets. Compared to real-time methods, we achieve state-of-the-art results on LM-O and YCB-V, falling only slightly behind methods whose inference runtimes are one order of magnitude higher. The source code is available at: https://github.com/PedroCastro/CRT-6D
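For intuition only, below is a minimal, hypothetical PyTorch sketch of the cascaded refinement idea described above: per-keypoint features are sampled from a feature map at projected keypoint locations (standing in for OSKFs) and fed to a per-stage head that regresses a pose update, with several stages chained together. A small MLP replaces the paper's deformable-transformer blocks, and the pose parametrization and keypoint re-projection between stages are omitted; all class and function names here are illustrative, not the authors' API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RefinementStage(nn.Module):
    """One refinement stage: sample sparse per-keypoint features at the
    projected keypoint locations, then regress a pose update.
    A small MLP stands in for the paper's deformable-transformer block."""

    def __init__(self, feat_dim=256, num_kpts=32):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(num_kpts * feat_dim, 512), nn.ReLU(),
            nn.Linear(512, 6 + 3),  # e.g. a 6D rotation parametrization + translation update
        )

    def forward(self, feat_map, kpt_uv):
        # feat_map: (B, C, H, W) one feature-pyramid level
        # kpt_uv:   (B, K, 2) keypoint projections in normalized [-1, 1] coordinates
        sampled = F.grid_sample(
            feat_map, kpt_uv.unsqueeze(2), align_corners=False
        )                                           # (B, C, K, 1): OSKF-like sparse features
        sampled = sampled.squeeze(-1).flatten(1)    # (B, C*K)
        return self.head(sampled)                   # (B, 9) pose update


def cascade_refine(feat_map, kpt_uv, stages):
    """Chain stages so each one refines the pose from sampled keypoint features.
    Illustrative only: re-projecting keypoints with the updated pose between
    stages is omitted here."""
    updates = []
    for stage in stages:
        updates.append(stage(feat_map, kpt_uv))
    return updates


if __name__ == "__main__":
    B, C, H, W, K = 2, 256, 32, 32, 32
    feats = torch.randn(B, C, H, W)
    kpts = torch.rand(B, K, 2) * 2 - 1              # fake normalized keypoint projections
    stages = nn.ModuleList(RefinementStage(C, K) for _ in range(3))
    for i, upd in enumerate(cascade_refine(feats, kpts, stages)):
        print(f"stage {i}: pose update shape {tuple(upd.shape)}")
```

The key design point the sketch tries to convey is that each stage operates on a handful of sampled keypoint features rather than a dense intermediate representation, which is what keeps the per-stage refinement lightweight.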