We consider the problem of category-level 6D pose estimation from a single RGB image. Our approach represents an object category as a cuboid mesh and learns a generative model of the neural feature activations at each mesh vertex to perform pose estimation through differentiable rendering. A common problem with rendering-based approaches is that they rely on bounding box proposals, which convey no information about the 3D rotation of the object and are unreliable when objects are partially occluded. Instead, we introduce a coarse-to-fine optimization strategy that uses the rendering process to estimate a sparse set of 6D object proposals, which are subsequently refined with gradient-based optimization. The key to making our approach converge is a neural feature representation that is trained with contrastive learning to be invariant to scale and rotation. Our experiments demonstrate improved category-level 6D pose estimation performance compared to prior work, particularly under strong partial occlusion.
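The coarse-to-fine strategy can be illustrated with a minimal Python sketch under strong simplifying assumptions. The abstract does not specify how proposals are sampled or scored, so every name here is hypothetical. `toy_feature_loss` stands in for the render-and-compare loss between features rendered from the cuboid mesh and features extracted from the image. The pose is reduced to three rotation angles, with no translation or scale. Proposals come from a uniform angle grid (`bins`, `top_k`), and gradients are taken by finite differences rather than through a differentiable renderer.

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical pose parameterization for this sketch: (azimuth, elevation,
# in-plane rotation) in radians. Translation and scale are omitted.
Pose = Tuple[float, float, float]
LossFn = Callable[[Pose], float]


def toy_feature_loss(pose: Pose, target: Pose) -> float:
    """Hypothetical stand-in for a render-and-compare feature loss.

    Each angular error d contributes (1 - cos d) + (1 - cos 3d), which has a
    global minimum at d = 0 and spurious local minima near d = +/-1.99 rad.
    """
    loss = 0.0
    for x, t in zip(pose, target):
        d = x - t
        loss += (1.0 - math.cos(d)) + (1.0 - math.cos(3.0 * d))
    return loss


def wrap_angle(x: float) -> float:
    """Map an angle to [-pi, pi]."""
    return math.remainder(x, 2.0 * math.pi)


def numerical_grad(loss_fn: LossFn, pose: Pose, eps: float = 1e-5) -> List[float]:
    """Central finite differences (a differentiable renderer would use autodiff)."""
    grad = []
    for i in range(len(pose)):
        up, down = list(pose), list(pose)
        up[i] += eps
        down[i] -= eps
        grad.append((loss_fn(tuple(up)) - loss_fn(tuple(down))) / (2.0 * eps))
    return grad


def refine(loss_fn: LossFn, init: Sequence[float],
           lr: float = 0.05, steps: int = 200) -> Pose:
    """Fine stage: plain gradient descent from a single initial pose."""
    pose = list(init)
    for _ in range(steps):
        grad = numerical_grad(loss_fn, tuple(pose))
        pose = [p - lr * g for p, g in zip(pose, grad)]
    return tuple(wrap_angle(p) for p in pose)


def coarse_to_fine(loss_fn: LossFn, bins: int = 8, top_k: int = 3) -> Pose:
    """Coarse stage: score a sparse grid of pose proposals and keep the best
    top_k. Fine stage: refine each one and return the lowest-loss result."""
    grid = [-math.pi + 2.0 * math.pi * i / bins for i in range(bins)]
    proposals = sorted(itertools.product(grid, repeat=3), key=loss_fn)[:top_k]
    refined = [refine(loss_fn, p) for p in proposals]
    return min(refined, key=loss_fn)


if __name__ == "__main__":
    target: Pose = (0.4, -0.2, 1.0)

    def loss(pose: Pose) -> float:
        return toy_feature_loss(pose, target)

    best = coarse_to_fine(loss)
    # Gradient descent alone, started 2 rad away in every angle, stalls in a
    # local minimum of the toy loss.
    naive = refine(loss, [t + 2.0 for t in target])
    print("coarse-to-fine:", [round(a, 3) for a in best], "loss", round(loss(best), 6))
    print("naive refine:  ", [round(a, 3) for a in naive], "loss", round(loss(naive), 3))
```

The toy loss is deliberately non-convex. Gradient descent started far from the solution settles in a spurious local minimum, whereas scoring a sparse set of proposals first places at least one initialization inside the basin of the global minimum. This mirrors, in a much simplified form, the motivation for estimating 6D proposals before gradient-based refinement.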